Closed by mjafin 8 years ago.
Hello @mjafin, can you please provide the command you used to generate the BAM file (i.e., your alignment command)? Thanks!
Here is the whole command:
bwa mem -c 250 -M -t 8 -R '@RG\tID:DNA38_S4_umi\tPL:illumina\tPU:4_2016-05-10_bcbio_align\tSM:DNA38_S4_umi' -v 1 /ngs/reference_data/genomes/Hsapiens/hg38/bwa/hg38.fa /ngs/oncology/datasets/NextSeq500/TS_UK_0013_Nugen/Unalign/DNA38_S4_umi_R1_001.fastq.gz /ngs/oncology/datasets/NextSeq500/TS_UK_0013_Nugen/Unalign/DNA38_S4_umi_R2_001.fastq.gz | /apps/bcbio-nextgen/0.9.7/rhel6-x64/anaconda/share/bwakit-0.7.12-0/k8 /apps/bcbio-nextgen/0.9.7/rhel6-x64/anaconda/share/bwakit-0.7.12-0/bwa-postalt.js -p /ngs/oncology/analysis/translation/TS_UK_0013_Nugen/bcbio_align/work/align/DNA38_S4_umi/hla/DNA38_S4_umi-sort.bam.hla /ngs/reference_data/genomes/Hsapiens/hg38/bwa/hg38.fa.alt | /apps/bcbio-nextgen/0.9.7/rhel6-x64/galaxy/../anaconda/bin/samtools sort -@ 8 -m 1G -T /ngs/oncology/analysis/translation/TS_UK_0013_Nugen/bcbio_align/work/align/DNA38_S4_umi/tx/tmp7QZZvY/DNA38_S4_umi-sort-sorttmp -o /ngs/oncology/analysis/translation/TS_UK_0013_Nugen/bcbio_align/work/align/DNA38_S4_umi/tx/tmp7QZZvY/DNA38_S4_umi-sort.bam /dev/stdin
Note that I also tried hg19, so this is not a problem caused by bwakit/hg38.
I managed to rewrite the script using pysam (instead of calling samtools via the shell), so I'm all set for now, but you may still want to look into this.
If you want, I can make a pull request to add the pysam version to your repo.
The main issue is that you are allowing multi-mapped reads. The tool does not accept reads that are secondary alignments. The read you posted above has flag 369, which marks it as a secondary alignment. You can either change your alignment command to use -c 1 (for unique alignments only, I believe) or filter your BAM to remove all secondary alignments. I'm curious, though: what did you change to make it work?
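To see why flag 369 counts as secondary: 369 = 256 + 64 + 32 + 16 + 1, and 256 (0x100) is the "secondary alignment" bit in the SAM FLAG field. In the command above, `bwa mem -M` marks shorter split hits as secondary rather than supplementary, which is one way such records end up in the BAM. The sketch below is a minimal, dependency-free illustration (not the tool's or the pysam rewrite's actual code) that decodes FLAG bits and drops secondary records from SAM text. The names `decode_flag` and `drop_secondary` are hypothetical. On the command line, `samtools view -h -F 256` performs the same filtering in one step.

```python
from typing import Iterable, Iterator, List

# FLAG bit values as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

SECONDARY = 0x100


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def drop_secondary(lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM lines, skipping alignment records with the secondary bit set.

    Header lines (starting with '@') pass through unchanged; blank lines are skipped.
    """
    for line in lines:
        if not line.strip():
            continue
        if line.startswith("@"):
            yield line
            continue
        fields = line.split("\t")
        # FLAG is the second mandatory SAM column (index 1).
        if int(fields[1]) & SECONDARY:
            continue
        yield line
```

For example, `decode_flag(369)` returns `["paired", "reverse", "mate_reverse", "read1", "secondary"]`. To filter a whole file, the output of `samtools view -h in.bam` could be passed through `drop_secondary(sys.stdin)`. Whether supplementary records (0x800) must also be removed depends on the tool and is not settled in this thread.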
Hi there, thanks for releasing this code; much appreciated. The code works OK for single-end reads, but for paired-end reads I get the following error right away:
It usually happens once it has been through a few thousand reads.
In the output, the last read is this:
In the input, the read following the above is:
I don't know if there is anything special about these two reads.