philres / ngmlr

NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) reads to a reference genome, with a focus on reads that span structural variations.
MIT License

Issue when converting a SAM file into bam file after mapping with ngmlr-0.2.7 #46

Closed ediezben closed 6 years ago

ediezben commented 6 years ago

Hi Phil,

I have been trying to convert a SAM generated by NGMLR in order to use the Sniffles software afterwards, but I keep getting an error from samtools whenever I try to convert to BAM format.

I generated the SAM file using the command:

ngmlr -t 4 --bam-fix --rg-id test --rg-sm tb -r MTB-h37rv_asm19595v2-eg18.fa -q corrected.fastq -o test.sam

And then tried to convert using:

samtools view -bS test.sam

[samopen] SAM header is present: 1 sequences.
Parse error at line 5: missing colon in auxiliary data

The first lines of the SAM file look like this:

@HD VN:1.0 SO:unsorted @SQ SN:Chromosome LN:4411532 @PG ID:ngmlr PN:nextgenmap-lr VN:0.2.7 CL:ngmlr -t 4 --bam-fix --rg-id test --rg-sm tb -r MTB-h37rv_asm19595v2-eg18.fa -q corrected.fastq -o test.sam @RG ID:test SM:tb m151107_175218_42220_c100812592550000001823179610291585_s1_X0103477550_17130/8157_8943 16 Chromosome 533951 60 5S227M1I289M1I153M3D110M * 0 0 CGTAGTAGGCCAGTTCGATGCACTGCCGCTGCGTGTCGGTCAACGCCTTGAGGCACTCGGTCACCCGGCGCCGCTCATCACCGGCGATCGCCAGGTCGGCGACGACGTCACTCGCGGGATCGACGTTGGCCGCACCATAGCGCACTTCCCGCTGGTTGCCGGCTTGCTCGCAACGGACTCGGTCGACAGCGCGCCGGTGGGCCATGGTCAAAAGCCAGGCCAACGCGGAACCTTTTGGCGGAGTCAAACTCCGACGCGTTCCGCCACACCTCAAGATAGATCTCCTGGGTGGTTTCTTCGCTGTAGCCGGTATCACGCAGCACCCGCATCACCAGTCCATACACCCGCGACTTGGTGTGGTCGTAGAATTCGGCGAATGCGGCCTGGTCGTGACCAGCGACCCGGCGCAACAGGGCGTCCAGGTCGCTGCTCAGCCGTGGCGGTCCGGTCATCGATGGGTAGCCTATCGCCAGCCGGCGCCGAGATGGTCAAGCCGGTCATCACCGACGCGCCGATCGCGGTGGGCCGGGGCACGAAATAGGCTGTTCGCCTTTGATATTCGGCGAAACCGGGGCGACCCTTCAGGTATCTCTCAGTCAGCCGGGCTCCGCTGACGTCCACCAGCAGGTAGGTCATCAGCAGCGGCGAACCCACCGTGGCCAGCGGCGCCCAGTCGATCGTGATCAACCACAACCCCCACCAGACACAGGCATCGCCGAAGTAGTTGGGGTGACGCGTCCAGGCCCACAGGCCGCGGTCCATGATGACCCCGCGATTGGCCGGGTC 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 AS:i:1526 NM:i:6 XI:f:0.9923 XS:i:0 XE:i:1526 XR:i:781 
MD:Z:476T192^TTG110 SV:i:2 QS:i:5 QE:i:786CV:f:99.363869

Thanks beforehand for any help.

Regards,

Ernest

fritzsedlazeck commented 6 years ago

Hi Ernest, this looks like it might be related to #43. The RG tag is not incorporated when using multiple threads. The only workaround I can offer right now is to use a single thread. @philres is looking into this bug.

Thanks Fritz

ediezben commented 6 years ago

Hi Fritz,

thanks for the quick reply. I have tried the single-threaded option as you mentioned, and it does print the RG:Z: tag on each line, but I don't think that is the issue, as samtools still throws the same error.

Regards,

Ernest

fritzsedlazeck commented 6 years ago

I see. The only thing I can make out from what you posted is the "QE:i:786CV:f:99.363869". Is that a copy-paste artifact, or is there really no tab in between? Just for testing, can you run it without the RG?

Thanks Fritz
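A quick way to answer the tab question without relying on copy and paste is to check every optional field against the TAG:TYPE:VALUE grammar from the SAM specification. The following is a minimal sketch, not what samtools does internally; the function names `bad_optional_fields` and `check_sam` are made up for illustration.

```python
import re

# Value patterns per field type, following the optional-field grammar
# in the SAM specification. This is an illustrative check, not the
# parser that samtools itself uses.
_FLOAT = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"
VALUE_PATTERNS = {
    "A": r"[!-~]",
    "i": r"[-+]?[0-9]+",
    "f": _FLOAT,
    "Z": r"[ !-~]*",
    "H": r"(?:[0-9A-F]{2})*",
    "B": r"[cCsSiIf](?:," + _FLOAT + r")*",
}
TAG_RE = re.compile(r"^([A-Za-z][A-Za-z0-9]):([AifZHB]):(.*)$")


def bad_optional_fields(sam_line):
    """Return the optional fields of one alignment line that are not valid TAG:TYPE:VALUE."""
    fields = sam_line.rstrip("\r\n").split("\t")
    bad = []
    for field in fields[11:]:  # the first 11 columns are mandatory
        match = TAG_RE.match(field)
        if match is None or not re.fullmatch(VALUE_PATTERNS[match.group(2)], match.group(3)):
            bad.append(field)
    return bad


def check_sam(path):
    """Yield (line_number, bad_fields) for each alignment line with a malformed field."""
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith("@"):  # skip header lines
                continue
            bad = bad_optional_fields(line)
            if bad:
                yield number, bad
```

If the tab between QE and CV were really missing in the file, `bad_optional_fields` would report the fused field "QE:i:786CV:f:99.363869", because the value after `QE:i:` is not an integer. An empty result from `check_sam("test.sam")` would suggest the fields are well-formed and the problem lies elsewhere.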

ediezben commented 6 years ago

Hi Fritz,

it is the copy-paste that is missing the tabs; the tabs are there in the file. I'm not sure what the issue might be. I have tried without the RG, and with --bam-fix both on and off, but still no luck:

ngmlr -r MTB-h37rv_asm19595v2-eg18.fa -q corrected.fastq.gz -x pacbio > test.sam

The problem persists. Would it be helpful if I sent you the files I used? The mapping takes no longer than a couple of minutes.

Regards,

Ernest

philres commented 6 years ago

Hi Ernest,

thank you for reporting this. It would be very helpful if you could send us the files you used!

Thanks, Philipp

ediezben commented 6 years ago

Hi Philipp,

attached are the files I used.

Regards,

Ernest

corrected.fastq.gz
MTB-h37rv_asm19595v2-eg18.fa.gz

philres commented 6 years ago

That's perfect, thank you very much. I'll have a look at it today in the evening and will get back to you as soon as possible!

On Wed, 1 Aug 2018, 11:30 ediezben, notifications@github.com wrote:

Hi Philipp,

attached the files used .

Regards,

Ernest

corrected.fastq.gz https://github.com/philres/ngmlr/files/2249052/corrected.fastq.gz
MTB-h37rv_asm19595v2-eg18.fa.gz https://github.com/philres/ngmlr/files/2249053/MTB-h37rv_asm19595v2-eg18.fa.gz


philres commented 6 years ago

Hi!

Could you please try running your data with https://github.com/philres/ngmlr/files/2254018/ngmlr-0.2.8-dev.tar.gz and check if the problem still exists?

Thanks, Philipp

ediezben commented 6 years ago

Hi Philipp,

unfortunately the problem persists. Have you been able to replicate the error? I am wondering whether it has something to do with the samtools version; I am using samtools version 0.1.19.

Regards,

Ernest

philres commented 6 years ago

No, unfortunately I haven't been able to reproduce it. I ran it a couple of times in a loop to make sure that it is not something that only happens intermittently.

Although I would usually not think that the samtools version should be an issue, 0.1.19 is indeed quite old. Would it be possible for you to upgrade and see whether this solves the problem?

In case you don't have root access I would recommend using https://bioconda.github.io/ to install packages. It is very convenient and doesn't require root.

Best, Philipp
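For anyone wanting to check which samtools is on their PATH before converting, here is a small sketch. It assumes that samtools 1.x prints a first line like "samtools 1.9" for `samtools --version`, and that older 0.1.x builds print a "Version: ..." line in the usage text shown when run without arguments. Both output formats are assumptions, and the helper names are hypothetical.

```python
import re
import subprocess


def parse_samtools_version(text):
    """Extract a version tuple such as (0, 1, 19) or (1, 9) from samtools output."""
    match = re.search(r"(?:^samtools|Version:)\s+(\d+(?:\.\d+)*)", text, flags=re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_samtools_version():
    """Try `samtools --version`, then bare `samtools`; return None if nothing parses."""
    for args in (["samtools", "--version"], ["samtools"]):
        try:
            result = subprocess.run(args, capture_output=True, text=True)
        except FileNotFoundError:
            return None  # samtools is not on PATH
        # Older builds print their usage text (including the version) to stderr.
        version = parse_samtools_version(result.stdout + "\n" + result.stderr)
        if version is not None:
            return version
    return None
```

Version tuples compare element by element, so `installed_samtools_version() < (1, 0)` is enough to flag a 0.1.x installation such as 0.1.19, provided the call did not return None.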

ediezben commented 6 years ago

Hi Philipp,

updating samtools to version 1.9 fixed the issue. Sorry for the trouble.

Regards,

Ernest
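With samtools 1.x the conversion can be combined with coordinate sorting and indexing, which downstream tools such as Sniffles typically expect. This is a sketch under that assumption, not a command taken from this thread; the output file name and the helper `sorted_bam_commands` are hypothetical.

```python
import subprocess


def sorted_bam_commands(sam_path, bam_path):
    """Build the samtools 1.x commands that turn a SAM file into a sorted, indexed BAM."""
    return [
        # `samtools sort` reads SAM directly and picks BAM output from the .bam extension.
        ["samtools", "sort", "-o", bam_path, sam_path],
        ["samtools", "index", bam_path],
    ]


def run_commands(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_commands(sorted_bam_commands("test.sam", "test.sorted.bam"))` would then produce test.sorted.bam and its index, assuming samtools 1.x is installed.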

philres commented 6 years ago

No worries, I'm happy that the problem is resolved!

Let us know in case you have any other feedback.

Best, Philipp

philres commented 6 years ago

Hi Ernest,

I just figured out what caused the problem with samtools 0.1.19 and fixed it. Starting from the next version, ngmlr will be compatible with samtools <= 0.1.19 again.

Thanks again for reporting this, Philipp