Closed. Erythroxylum closed this issue 2 months ago
It might be easier to tell downstream tools (HaplotypeCaller) to ignore the duplicate flag in the BAMs. I'll look into this today.
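For anyone following along, here is a rough sketch of what "the duplicate flag" is. In the SAM format the FLAG field is a bitmask, and bit 0x400 marks a read as a PCR or optical duplicate. The helper names below (`is_duplicate`, `clear_duplicate_flag`) are made up for illustration and are not part of snpArcher or GATK. As far as I know, GATK tools drop such reads through a read filter called NotDuplicateReadFilter, but check the GATK docs before relying on that.

```python
# Minimal sketch: inspect and clear the SAM duplicate bit on a text SAM record.
# Helper names are hypothetical; only the 0x400 bit comes from the SAM spec.

DUPLICATE_FLAG = 0x400  # SAM FLAG bit: "PCR or optical duplicate"


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def clear_duplicate_flag(sam_line: str) -> str:
    """Return the SAM record with the duplicate bit cleared.

    Header lines (starting with '@') are returned unchanged.
    A trailing newline on an alignment line is dropped.
    """
    if sam_line.startswith("@"):
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    fields[1] = str(int(fields[1]) & ~DUPLICATE_FLAG)  # FLAG is column 2
    return "\t".join(fields)


record = "read1\t1123\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tFFFF"
cleaned = clear_duplicate_flag(record)
```

Here 1123 is 99 (a properly paired first-in-pair read with the mate on the reverse strand) plus 1024, so clearing the bit leaves a FLAG of 99. On real BAMs you would do this with an established tool and not by hand.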
Do you know how this usually gets handled in GBS pipelines? If the only change needed is to skip the mark duplicates step, it would be easy-ish to add that as an option. In the short term, you could also just replace this line (https://github.com/harvardinformatics/snpArcher/blob/730475ffdb1c0c4729b24e26fb4eb82a54ba3635/workflow/rules/fastq2bam.smk#L49) with something like cp {input.bam} {output.dedupBam} 2> {log}
Then your BAM file wouldn't have any duplicates marked.
GATK is not a usual approach for GBS pipelines, but my understanding is that typically no deduplication is done because there is no good method to detect dups.
Working on adding this.
As a temp solution, if you're not using GATK and just need the BAMS w/o duplicates marked, you can try:
snakemake --until dedup --no-temp <... other options>
This will be added in #173, feel free to pull the branch to test it out
Hi @cademirch , Wow, thanks for your prompt attention! I have cloned the branch. Can you please explain what flags I need to use to keep dups?
@Erythroxylum You'll need to set mark_duplicates to False in the config.yaml. Also, when you run snakemake, add the --notemp option.
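A hedged sketch of how a switch like this could gate the dedup step. This is not the code from #173: the function name `dedup_command_template`, the default of True when the key is missing, and the placeholder for the duplicate-marking command are all assumptions for illustration. Only the key name mark_duplicates and the cp fallback come from this thread.

```python
# Hypothetical sketch of a config-gated dedup step; not snpArcher's implementation.


def dedup_command_template(config: dict) -> str:
    """Pick a shell command template for the dedup step.

    Assumption: duplicates are marked unless mark_duplicates is explicitly False.
    """
    if config.get("mark_duplicates", True):
        # Placeholder only: the real duplicate-marking command is not shown in this thread.
        return "<duplicate-marking command> {input.bam} {output.dedupBam} 2> {log}"
    # Fallback from earlier in the thread: copy the BAM through untouched.
    return "cp {input.bam} {output.dedupBam} 2> {log}"


# Snakemake exposes the parsed config.yaml as a plain dict, so a dict stands in here.
skip_marking = dedup_command_template({"mark_duplicates": False})
default_behavior = dedup_command_template({})
```

With mark_duplicates set to False the sketch returns the cp command, and with the key absent it falls back to marking duplicates. Whether the pipeline itself tolerates a missing key is a separate question, so setting it explicitly in config.yaml is the safer route.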
merged with #173
Hi Cade et al.: FYI, the mark_duplicates flag is absent from the snpArcher/config/config.yaml. I can, however, see it in the .test/ecoli/config/config.yaml. I just appended it to my main config/config.yaml, hoping that will work.
Oops. Thanks for pointing that out, I’ll fix soon.
Hello, I am interested in using this pipeline with GBS data, but the identical start and end sequences make most reads get flagged as duplicates. Is there a simple way to disable mark duplicates or would it require significant alterations to downstream code? Thanks!