jdidion / atropos

An NGS read trimming tool that is specific, sensitive, and speedy. (production)

gzipped file created by atropos cannot be parsed by pysam. #40

Closed cokelaer closed 6 years ago

cokelaer commented 6 years ago

I have experienced a problem with the gzip file output by atropos (version 1.0.23). The output fastq.gz file is correct; however, when it is parsed with the pysam library, it looks like the gzip file is somehow corrupted and the iteration stops (without error). I am not sure whether this is a pysam issue or an atropos issue (or both). Here is the code used to scan the fastq.gz file created by atropos in multithreaded mode.

>>> import pysam
>>> fastq = pysam.FastxFile(self.filename)
>>> for i, record in enumerate(fastq):
...     pass
...
>>> print(i)
985

but the input fastq file has a million reads. Then, I decompressed and recompressed the file and everything seems fine. I have posted this issue in the atropos repository (not in pysam yet) to find out whether others have experienced it; I understand this may not be an atropos issue. I was using zlib 1.2.11 and pysam 0.11.2.2.
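
The decompress-and-recompress workaround mentioned above can be sketched with the Python standard library alone. This is a minimal sketch, not something taken from atropos or pysam: the function name `recompress` and the file paths are hypothetical. It relies on the `gzip` module reading the input to the end and writing the output as one new gzip stream.

```python
import gzip
import shutil


def recompress(src, dst):
    """Decompress the gzip file `src` and write its content to a new gzip file `dst`.

    Hypothetical helper for illustration; `src` and `dst` are paths chosen
    by the caller.
    """
    # gzip.open reads through every gzip member in the input, and the
    # output handle writes one fresh gzip stream.
    with gzip.open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

For example, `recompress("trimmed.fastq.gz", "trimmed.recompressed.fastq.gz")` (both names are placeholders) would produce a file that could then be scanned again with the pysam snippet above.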

cokelaer commented 6 years ago

FYI, using another reader, such as the one from atropos, there is no issue parsing the fastq.gz file, so I believe this is a pysam-related issue. Therefore, we can close this issue.
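
An independent read count, made without pysam or atropos, is one way to cross-check the number reported by a given reader. The sketch below uses only the standard-library `gzip` module and assumes the common four-lines-per-record FASTQ layout (no wrapped sequence lines); `count_fastq_records` is a hypothetical helper name.

```python
import gzip


def count_fastq_records(path):
    """Count FASTQ records in a gzip-compressed file.

    Assumes exactly four lines per record, which holds for most
    sequencer output but not for FASTQ files with wrapped sequences.
    """
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```

If this count matches the expected number of reads while the pysam loop stops early, the difference lies in how the file is read rather than in its decompressed content.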

jdidion commented 6 years ago

It looks to me like pysam is using htslib's bgzf implementation under the hood, so it's unclear whether the issue is with pysam or htslib or atropos. I'm not ready to disregard the possibility that it's the latter.
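
To narrow this down, it may help to look at how the file is laid out on disk. The sketch below checks two things: whether the first block carries the BGZF extra subfield (`BC`, as described in the SAM/BAM specification), and how many gzip members the file contains. Whether atropos writes several concatenated members in multithreaded mode is an assumption here, not something established in this thread, and both function names are hypothetical.

```python
import struct
import zlib


def is_bgzf(path):
    """Return True if the first gzip member has the BGZF 'BC' extra subfield."""
    with open(path, "rb") as fh:
        header = fh.read(12)
        # Need the gzip magic bytes and the FEXTRA flag (bit 2 of FLG).
        if len(header) < 12 or header[:2] != b"\x1f\x8b" or not header[3] & 0x04:
            return False
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = fh.read(xlen)
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[pos:pos + 4])
        if (si1, si2, slen) == (66, 67, 2):  # 'B', 'C', 2-byte payload
            return True
        pos += 4 + slen
    return False


def count_gzip_members(path, chunk_size=1 << 20):
    """Count concatenated gzip members by decompressing the whole file."""
    members = 0
    with open(path, "rb") as fh:
        buf = fh.read(chunk_size)
        while buf:
            decomp = zlib.decompressobj(31)  # 31 = expect a gzip header and trailer
            while True:
                decomp.decompress(buf)  # decompressed output is discarded
                if decomp.eof:
                    break
                buf = fh.read(chunk_size)
                if not buf:
                    raise EOFError("truncated gzip member")
            members += 1
            # Bytes after the end of this member belong to the next one.
            buf = decomp.unused_data or fh.read(chunk_size)
    return members
```

A file for which `count_gzip_members` returns more than one and `is_bgzf` returns False would be a plain multi-member gzip file, which would be one concrete thing to test against the reader. Trailing non-gzip bytes would make `count_gzip_members` raise `zlib.error` rather than be silently ignored.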

Can you provide an example file that exhibits the behavior you describe so I can dig into this further?

cokelaer commented 6 years ago

I've got an 80 MB example. I can send you the file. Do you have a preferred email address? I won't send the file by email but via an FTP service.

jdidion commented 6 years ago

Thanks. Please send it to github@didion.net.

cokelaer commented 6 years ago

Okay, I will send the example file tomorrow.

jdidion commented 6 years ago

Thanks!

cokelaer commented 6 years ago

I've sent a 75 MB file that pysam fails to parse with the code above. I've seen that if we unzip and re-zip the file, it works fine.

jdidion commented 6 years ago

Great, thanks!

jdidion commented 6 years ago

I can't reproduce this on my system. When I run your example code on the file you sent me, it works fine. For the record, I'm running:

macOS 10.12.6
python 3.6.1
pysam 0.12.0.1

I'm going to close as 'works for me' but feel free to reopen if you still think this is an atropos issue.