Adoni5 closed this issue 11 months ago
No, you have to separately decompress the file first. This has been requested before but I decided not to implement it for the following reason. The assembly runs on a large, expensive machine with a large number of CPUs and a lot of memory. It does not make economic sense to tie up that machine for a long time just to do a decompression, an essentially sequential step that can instead run on a much less expensive machine.
You could use shasta/scripts/FastqGzToFasta.py to decompress the fastq.gz file and convert it to fasta in one step, writing the result to disk. The smaller size of the fasta file compared to the uncompressed fastq means that the decompression process has to do less I/O and so runs faster. In addition, the smaller size of the fasta file means that shasta will be able to read it faster. Finally, less disk space is required. The size of the uncompressed fasta is usually comparable to the size of the compressed fastq.gz.
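For anyone curious what such a one-step conversion looks like, here is a minimal sketch. It is not the actual FastqGzToFasta.py from the Shasta repository; the function names open_compressed and fastq_to_fasta are made up for illustration, and it assumes the common layout of exactly four lines per FASTQ record (no multi-line sequences). It also accepts .xz input through Python's lzma module.

```python
import gzip
import lzma
from pathlib import Path


def open_compressed(path):
    """Open a .gz or .xz file (or an uncompressed file) for reading as text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    if path.suffix == ".xz":
        return lzma.open(path, "rt")
    return open(path, "rt")


def fastq_to_fasta(fastq_path, fasta_path):
    """Stream FASTQ records into a FASTA file, dropping base qualities.

    Assumes four lines per record: header, sequence, '+', qualities.
    Returns the number of reads written.
    """
    count = 0
    with open_compressed(fastq_path) as src, open(fasta_path, "w") as dst:
        while True:
            header = src.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between or after records
            if not header.startswith("@"):
                raise ValueError(f"Expected '@' header, got: {header!r}")
            sequence = src.readline()
            src.readline()  # '+' separator line, ignored
            src.readline()  # quality line, ignored
            dst.write(">" + header[1:].rstrip("\n") + "\n")
            dst.write(sequence.rstrip("\n") + "\n")
            count += 1
    return count
```

Calling fastq_to_fasta("reads.fastq.gz", "reads.fasta") on an inexpensive machine would produce a FASTA file that can then be used as the assembly input. The records are streamed one at a time, so memory use stays small regardless of file size.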
Interesting, I do see your logic there! I'm assuming that if I can input FASTA, Shasta doesn't factor the FASTQ qualities into the assembly?
In which case I will definitely do that to save space. Thanks!
Even if you use a fastq file as input, Shasta does not use base qualities in the assembly. So the presence of the base qualities makes no difference in the assembly results.
Brilliant, thanks very much.
Hi @paoloshasta - thanks for the great work improving Shasta.
I was wondering if there is currently a way to pass compressed input (.gz or .xz in this case) into Shasta? I've tried directly assembling compressed FASTQ and I cannae get it to work.
Thanks, Rory