Closed hannesbecher closed 3 years ago
Hi Hannes,
Thank you for using RESPECT and for your feedback. I have added support for gzipped FASTQ and FASTA files. Please try it and let me know if there are any issues.
Best, Shahab.
Hi Shahab,
Thanks for implementing this! I just tried it and I get an error. It looks like you might be calling my installed version of gzip, which does not take the same parameters as the one on your system. I'm using a server with Scientific Linux 7.8. Perhaps using the gzip Python library would be possible as an alternative?
Error:
(RESPECT) bash-4.2$ time respect -i E001fw.fq.gz -o . --threads 20
2021-02-16 14:29:54,778 INFO:Processing E001fw.fq.gz...
gzip: invalid option -- 'k'
Try `gzip --help' for more information.
2021-02-16 14:30:26,974 INFO:compute_kmer_histogram finished in 0.3333916664123535 seconds
2021-02-16 14:30:26,974 ERROR:Error occurred when processing /disk2/hbecher_tmp/RESPECTanalyses/E001fw.fq.gz; it's skipped
Traceback (most recent call last):
File "/localdisk/home/hbecher/miniconda2/envs/RESPECT/lib/python3.8/site-packages/respect-0.0.1-py3.8.egg/respect/respect_functions.py", line 245, in run_respect
parameter_estimator.set_kmer_histogram(args.threads)
File "/localdisk/home/hbecher/miniconda2/envs/RESPECT/lib/python3.8/site-packages/respect-0.0.1-py3.8.egg/respect/paramter_estimator.py", line 212, in set_kmer_histogram
self.compute_kmer_histogram(n_threads)
File "/localdisk/home/hbecher/miniconda2/envs/RESPECT/lib/python3.8/site-packages/respect-0.0.1-py3.8.egg/respect/timer.py", line 68, in wrapper_timer
return func(*args, **kwargs)
File "/localdisk/home/hbecher/miniconda2/envs/RESPECT/lib/python3.8/site-packages/respect-0.0.1-py3.8.egg/respect/paramter_estimator.py", line 171, in compute_kmer_histogram
profiler_output = kmer_profiler(self.input_file, self.sequence_type, self.output_name, self.tmp_dir,
File "/localdisk/home/hbecher/miniconda2/envs/RESPECT/lib/python3.8/site-packages/respect-0.0.1-py3.8.egg/respect/profiling.py", line 91, in kmer_profiler
os.remove(input_file.rsplit('.gz', 1)[0])
FileNotFoundError: [Errno 2] No such file or directory: '/disk2/hbecher_tmp/RESPECTanalyses/E001fw.fq'
ValueError: Number of processes must be at least 1
Thanks v much, Hannes
A quick fix for me would be in profiling.py line 78, to set cmd to something that does zcat [input file.gz] > [input file]. But I don't know if zcat is present on all systems, so using a Python library might still be better.
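The quick fix suggested above could look roughly like the sketch below. This is not RESPECT's actual code: the function name decompress_keeping_original is hypothetical, and it assumes a gzip executable is on the PATH. It calls gzip -dc, which does the same job as zcat and relies only on the -d and -c options rather than -k.

```python
import subprocess


def decompress_keeping_original(gz_path):
    """Write a decompressed copy next to gz_path and return its path.

    Hypothetical helper for illustration; assumes `gzip` is installed.
    """
    if not gz_path.endswith(".gz"):
        raise ValueError(f"expected a .gz file, got {gz_path!r}")
    out_path = gz_path[: -len(".gz")]
    with open(out_path, "wb") as out:
        # -d decompresses; -c writes to stdout and leaves the input file
        # in place, so the -k option is not needed.
        subprocess.run(["gzip", "-dc", gz_path], stdout=out, check=True)
    return out_path
```

With check=True a failing gzip call raises subprocess.CalledProcessError straight away, instead of surfacing later as a missing-file error during cleanup.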
Hi Hannes,
There is a new release in which you can use the --decomp option to specify a Python library (zlib or gzip) for decompression instead of the system gzip. Both are part of the Python standard library, but I prefer the zlib implementation because it reads the input in chunks and does not load the entire file into memory, which is not possible when using the gzip library. Still, I would expect command-line gzip to be the most efficient option. It seems that you need version 1.6 or later to use it with the -k option.
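For readers curious what chunked decompression with zlib can look like, here is a minimal sketch. It is not taken from RESPECT: the function name gunzip_in_chunks and the 1 MiB chunk size are assumptions made for illustration. It reads the compressed file piece by piece, so the compressed input is never held in memory as a whole.

```python
import zlib

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def gunzip_in_chunks(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Decompress the gzip file src_path into dst_path, chunk by chunk."""
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    gzip_wbits = zlib.MAX_WBITS | 16
    decomp = zlib.decompressobj(gzip_wbits)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            while chunk:
                dst.write(decomp.decompress(chunk))
                if decomp.eof:
                    # A gzip file may contain several concatenated members;
                    # feed the leftover bytes to a fresh decompressor.
                    chunk = decomp.unused_data
                    decomp = zlib.decompressobj(gzip_wbits)
                else:
                    chunk = b""
        dst.write(decomp.flush())
```

The input file is only read, never modified, so the original .gz stays in place, which is the behaviour that gzip -k provides on the command line.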
Hi Shahab,
Thanks, --decomp zlib works for me! I was not aware of the zlib library, good point about the memory.
This is done as far as I am concerned.
Cheers, Hannes
Hi,
Thanks for making this tool available! Would it be possible to make it run on gz-compressed files? Input via a subshell à la
respect -i <(zcat reads.fq.gz)
does not seem to work.
Many thanks,
Hannes