lmrodriguezr / nonpareil

Estimate metagenomic coverage and sequence diversity
http://enve-omics.ce.gatech.edu/nonpareil/

Error: "The file provided does not have the proper fastq format" or hanging when supplying gzipped files #66

Closed: jfy133 closed this issue 3 weeks ago

jfy133 commented 1 month ago

I wanted to test nonpareil with the new gzip functionality, however I'm encountering a variety of errors.

When using the Bioconda recipe (identical to the previous one, apart from the new code and an added zlib dependency), I get the following error on every file I test:

$ nonpareil -s ERX5474932_ERR5766176_1.fastq.gz -T kmer -f fastq -b output
Nonpareil v3.5.1
 [      0.0]   The file ERX5474932_ERR5766176_1.fastq.gz.enve-tmp.158173 was created
 [      0.0]  reading ERX5474932_ERR5766176_1.fastq.gz.enve-tmp.158173
 [      0.0]  Picking 10000 random sequences
 [      0.0]  Started counting
Fatal error:
The file provided does not have the proper fastq format
 [      0.0] Fatal error: The file provided does not have the proper fastq format

So I tried the precompiled binary attached to the GitHub release:

$ wget https://github.com/lmrodriguezr/nonpareil/releases/download/v3.5.1/nonpareil-3.5.1-Linux_x86_64
$ chmod +x nonpareil-3.5.1-Linux_x86_64

While it works if I decompress the file first (decompressed to 'test.fastq'):

$ ./nonpareil-3.5.1-Linux_x86_64 -s test.fastq -T kmer -f fastq -b output
Nonpareil v3.5.1
 [      0.0]  reading test.fastq
 [      0.0]  Picking 10000 random sequences
 [      0.0]  Started counting
 [      0.1]  Read file with 632060 sequences
 [      0.1]  Average read length is 151.000000bp
 [      0.1]          Worker 0 @start_samples.
 [      0.1]  Sub-sampling library
 [      0.2]          Worker 0 @start_checkings.                      
 [      0.2]  Evaluating consistency                              
 [      0.2]  Everything seems correct
 [      0.2]          Worker 0 @exit.

It hangs indefinitely on the gzipped file, printing nothing after the version line:

$ ./nonpareil-3.5.1-Linux_x86_64 -s ERX5474932_ERR5766176_1.fastq.gz -T kmer -f fastq -b output
Nonpareil v3.5.1
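Until the gzip code path works, one possible workaround is to do what made the 'test.fastq' run succeed: decompress first and pass nonpareil a plain FASTQ. The following is a minimal bash sketch, assuming `gzip` and `nonpareil` are on `PATH`; `run_nonpareil_on_gz` is a hypothetical helper, not part of nonpareil, and it reuses exactly the options from the report.

```shell
#!/usr/bin/env bash
# Hypothetical workaround (not part of nonpareil): decompress a .fastq.gz to a
# temporary plain FASTQ, run nonpareil on it, then remove the temporary file.
run_nonpareil_on_gz() {
  local gz="$1" prefix="$2"
  local tmp status=0
  tmp=$(mktemp "${TMPDIR:-/tmp}/nonpareil-input.XXXXXX") || return 1
  # Stop early if decompression fails or yields an empty file
  # (the failing runs left 0-byte temporary files behind).
  if ! gzip -dc -- "$gz" > "$tmp" || [ ! -s "$tmp" ]; then
    echo "could not decompress $gz (or the result is empty)" >&2
    rm -f -- "$tmp"
    return 1
  fi
  # Same options as in the report; only the input path changes.
  nonpareil -s "$tmp" -T kmer -f fastq -b "$prefix" || status=$?
  rm -f -- "$tmp"
  return "$status"
}
```

Usage, for example: `run_nonpareil_on_gz ERX5474932_ERR5766176_1.fastq.gz output`. The trade-off is temporary disk space equal to the uncompressed file size.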

I've placed the two test files I tried on Dropbox here; they are valid FASTQ files, which I use with a variety of other pipelines.
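A quick structural check can back up the claim that the inputs are valid before blaming the tool. The sketch below is a hypothetical bash helper (`check_fastq_gz` is not part of nonpareil). It assumes unwrapped four-line FASTQ records with newline-terminated lines, and it checks only layout (gzip integrity, line count, `@`/`+` markers, sequence/quality length agreement), not base or quality-score content.

```shell
#!/usr/bin/env bash
# Hypothetical helper: basic structural check of a gzipped, unwrapped FASTQ file.
check_fastq_gz() {
  local f="$1" n
  # 1. The gzip stream itself must be intact.
  gzip -t -- "$f" || return 1
  # 2. Four-line records imply a non-zero multiple of 4 lines.
  n=$(gzip -dc -- "$f" | wc -l | tr -d ' ')
  if [ "$n" -eq 0 ] || [ $((n % 4)) -ne 0 ]; then
    echo "$f: $n lines (expected a non-zero multiple of 4)" >&2
    return 1
  fi
  # 3. Headers start with '@', separators with '+', and each quality
  #    line is as long as its sequence line. awk's exit status is returned.
  gzip -dc -- "$f" | awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }
    NR % 4 == 0 && length($0) != seqlen { bad = 1 }
    END { exit bad + 0 }'
}
```

A zero exit status (e.g. `check_fastq_gz ERX5474932_ERR5766176_1.fastq.gz && echo ok`) means the file passed these checks.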

Note that in all cases an empty temporary file is generated, e.g. for the three most recent (failed) tests:

-rw-rw-r-- 1 james james    0 Jun 24 10:12 ERX5474932_ERR5766176_1.fastq.gz.enve-tmp.160124
-rw-rw-r-- 1 james james    0 Jun 24 10:16 ERX5474932_ERR5766176_1.fastq.gz.enve-tmp.161707
-rw-rw-r-- 1 james james    0 Jun 24 10:17 ERX5474932_ERR5766176_1.fastq.gz.enve-tmp.162390
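Each failed run leaves one of these zero-byte `*.enve-tmp.*` files behind. The following is a small, hypothetical cleanup sketch (`cleanup_empty_enve_tmp` is not part of nonpareil) using `find` with `-empty` and `-delete`, which GNU and BSD `find` both support. It lists by default and deletes only when asked. It deliberately skips non-empty temp files, and it should not be run while a nonpareil job is still active in that directory, since a fresh temp file may also be empty at first.

```shell
#!/usr/bin/env bash
# Hypothetical cleanup helper: list (default) or delete zero-byte
# "*.enve-tmp.*" files left in a directory by failed runs.
cleanup_empty_enve_tmp() {
  local dir="${1:-.}" mode="${2:-list}"
  if [ "$mode" = "delete" ]; then
    # -print before -delete so each removed file is reported.
    find "$dir" -maxdepth 1 -type f -name '*.enve-tmp.*' -empty -print -delete
  else
    find "$dir" -maxdepth 1 -type f -name '*.enve-tmp.*' -empty -print
  fi
}
```

For example, `cleanup_empty_enve_tmp .` previews the files, and `cleanup_empty_enve_tmp . delete` removes them.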

lmrodriguezr commented 3 weeks ago

Thank you for all the testing, @jfy133, and hopefully this is the last of the faulty releases. Apologies for that. Please feel free to reopen if the new version doesn't work for you (I'll be creating a release very soon).

jfy133 commented 3 weeks ago

That appears to work better now :D I noticed something else (I'll open a separate issue), but it now runs without errors. I'll look into updating Bioconda.

jfy133 commented 3 weeks ago

Actually never mind, @martin-g beat me to it :D

https://github.com/bioconda/bioconda-recipes/pull/48783