tfwillems / HipSTR

Genotype and phase short tandem repeats using Illumina whole-genome sequencing data
GNU General Public License v2.0

adapter trimming issue #76

Closed dinovski closed 4 years ago

dinovski commented 4 years ago

Individual genotyping (--def-stutter-model --min-reads 10) of 101 CEPH BAMs works fine (v0.6.2), but joint genotyping of the same BAMs with a comma-delimited list and de novo stutter estimation fails with the following:

HipSTR: src/adapter_trimmer.cpp:172: void AdapterTrimmer::trim_adapters(BamAlignment&): Assertion `false' failed.
Aborted (core dumped)

I see the same issue when I run HipSTR on the platinum genome CRAMs for NA12891 and NA12892 downloaded from https://www.internationalgenome.org/data-portal/sample/NA12891, except that it appears to genotype several regions successfully before crashing:

Processing region chr13 82148024 82148068
HipSTR: src/adapter_trimmer.cpp:172: void AdapterTrimmer::trim_adapters(BamAlignment&): Assertion `false' failed.
Aborted (core dumped)

tfwillems commented 4 years ago

Hey @dinovski,

Thanks for reporting this issue! So if I understand correctly, you're able to reproduce this when only running on CRAMs from NA12891 and NA12892 in joint genotyping mode? If you run them individually, do you observe the issue?

Could you send me the full command you used for these 2 samples, as well as a link to your region BED file? I'll try to reproduce it and see what may be driving this.

Thanks! Thomas

dinovski commented 4 years ago

Thanks Thomas!

The CEPH trio hg38 CRAMs were downloaded from:

https://www.internationalgenome.org/data-portal/sample/NA12891
https://www.internationalgenome.org/data-portal/sample/NA12892

The hg38 FASTA was also downloaded from 1KG.

The regions file is here: http://teamerlich.org/refstr/

I want to jointly genotype all CEPH samples but these two are problematic. If I run the following (v0.6.2):

HipSTR --bams NA12891.alt_bwamem_GRCh38DH.20150706.CEU.illumina_platinum_ped.cram \
    --regions hg38.markers.bed --fasta GRCh38_full_analysis_set_plus_decoy_hla.fa \
    --min-reads 10 --haploid-chrs chrY --def-stutter-model --max-str-len 200 \
    --str-vcf NA12891.alt_bwamem_GRCh38DH.20150706.CEU.illumina_platinum_ped.hipstr.vcf.gz

it throws this error:

Processing region chr13 82148024 82148068
HipSTR: src/adapter_trimmer.cpp:172: void AdapterTrimmer::trim_adapters(BamAlignment&): Assertion `false' failed.
Aborted (core dumped)

I'm not sure why it's crashing at this region.

tfwillems commented 4 years ago

Hi @dinovski ,

Thanks again for reporting this issue! I managed to track it down - essentially, these BAMs occasionally contained unpaired reads that were marked as neither the 1st nor the 2nd read in the SAM flag. I had assumed that reads would always fall into one of these two categories, but apparently that isn't the case for unpaired reads.
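
For readers unfamiliar with the SAM flag, here is a minimal Python sketch of the situation described above. It is not HipSTR's code: the helper name `mate_category` is hypothetical, and only the bit values (0x1 paired, 0x40 first in pair, 0x80 second in pair) come from the SAM specification. The point it illustrates is that a read can legitimately have neither the 0x40 nor the 0x80 bit set, so code that handles only the "first" and "second" cases needs a third branch.

```python
# SAM flag bits, as defined in the SAM specification.
FLAG_PAIRED = 0x1   # template has multiple segments (read is paired)
FLAG_READ1 = 0x40   # first segment in the template
FLAG_READ2 = 0x80   # last segment in the template


def mate_category(flag: int) -> str:
    """Classify a read by its SAM flag.

    Hypothetical helper for illustration only; not part of HipSTR.
    """
    is_first = bool(flag & FLAG_READ1)
    is_second = bool(flag & FLAG_READ2)
    if is_first and not is_second:
        return "first"
    if is_second and not is_first:
        return "second"
    if not is_first and not is_second:
        # Typical for unpaired reads: neither mate bit is set.
        return "neither"
    return "both"


# 99 and 147 are common flags for a properly paired first/second mate;
# 0 and 16 are unpaired reads on the forward/reverse strand.
for flag in (99, 147, 0, 16):
    print(flag, bool(flag & FLAG_PAIRED), mate_category(flag))
```

In this sketch the "neither" case is returned as an ordinary value rather than treated as impossible, which is the kind of handling an unpaired read requires.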

I've just released HipSTR v0.7 to address this issue.

Let me know if you have any additional questions or issues!

Best, Thomas