tzcoolman / FACS-OLD

Support for .fastq.gz #11

Closed brainstorm closed 11 years ago

brainstorm commented 12 years ago

Feature:

Our pipeline (and many others) writes FASTQ files compressed with gzip (http://en.wikipedia.org/wiki/Gzip). Add support in DRASS for querying those files transparently:

./simple_check -m 1 -q tests/data/ecoli_dummy.fastq.gz -r tests/data/ecoli.bloom

brainstorm commented 12 years ago

I tried adding external support via FIFO pipes, as noted in:

http://seqanswers.com/forums/archive/index.php/t-16540.html

In commit:

https://github.com/brainstorm/DRASS/commit/18b989e4c6142c1d4217c99298eba93a4eaf1574

But apparently simple_check cannot allocate memory when the query file is a UNIX FIFO:

Checking contamination against gz fifo file...
./simple_check -m 1 -q tests/data/ecoli_dummy.fastq.gz.fifo -r tests/data/ecoli.bloom
mmap source : Cannot allocate memory
make: *** [tests] Error 1

tzcoolman commented 12 years ago

You want to use a FIFO to read the file without unzipping it?

brainstorm commented 12 years ago

Yes, but that was just a proof of concept. Gzip support should preferably be built into DRASS, as in other bioinformatics programs. And no, uncompressing the whole file first is not an option; reading/loading the file should be transparent.
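One way to make loading transparent is to decide from the file contents, not the file extension, whether decompression is needed. The sketch below checks for the two gzip magic bytes (0x1f 0x8b) at the start of the file. The name looks_like_gzip is hypothetical, and this is only an illustration of the idea, not how DRASS does it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the file starts with the gzip magic
 * bytes 0x1f 0x8b, 0 if it does not, and -1 if it cannot be opened. */
int looks_like_gzip(const char *path)
{
    unsigned char magic[2];
    size_t n;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    n = fread(magic, 1, sizeof magic, fp);
    fclose(fp);
    return n == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
}
```

With such a check, a plain FASTQ file could keep using the existing code path while a compressed one is routed to a streaming reader.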

tzcoolman commented 12 years ago

The reason to use FIFO is that people want to automatically unzip it and then process it. Is that correct?

brainstorm commented 12 years ago

Yes, that is correct: unzip it as a stream. Here are two examples:

http://windrealm.org/tutorials/decompress-gzip-stream.php
http://stackoverflow.com/questions/1838699/how-can-i-decompress-a-gzip-stream-with-zlib

brainstorm commented 11 years ago

That one is pretty interesting too, it hijacks "open()" to transparently decompress the files.

http://www.zlibc.linux.lu/download.html

arvestad commented 11 years ago

I used this (or an equivalent) several years ago. Just tested it actually, but it is a nice solution. It won't work with memory mapping though, since (IIRC) it uses a subprocess for the decompression.

L

brainstorm commented 11 years ago

Thanks Enze for implementing this. I've just added support for it in the wrapper and it works, although it still segfaults due to the '@' issue (that character can appear at the beginning of a quality line).
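The '@' problem arises because '@' is a valid quality character, so a line starting with '@' is not necessarily a record header. A common way around it, sketched below under the assumption of unwrapped four-line FASTQ records, is to track the position within the record and not scan for '@' at line starts. The name count_fastq_records is hypothetical, and this is not the fix that went into the DRASS code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: counts records in a four-line-per-record FASTQ
 * stream. Only line 1 of a record must start with '@' and line 3 with
 * '+'; the quality line may start with any character, including '@'.
 * Returns -1 if a record is malformed or truncated. */
long count_fastq_records(FILE *fp)
{
    char *line = NULL;
    size_t cap = 0;
    long records = 0;
    int field = 0;   /* 0 header, 1 sequence, 2 separator, 3 quality */
    int ok = 1;

    assert(fp != NULL);
    while (ok && getline(&line, &cap, fp) != -1) {
        if ((field == 0 && line[0] != '@') ||
            (field == 2 && line[0] != '+')) {
            ok = 0;
        } else if (++field == 4) {
            records++;
            field = 0;
        }
    }
    free(line);
    return (ok && field == 0) ? records : -1;
}
```

getline() is POSIX and grows its buffer as needed, so long reads do not overflow a fixed-size array. Trailing blank lines are treated as malformed input in this sketch.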

Another thought I had is that we could just merge that code (big_query.c) into the regular simple_check_1_ge.c. But it works nicely as it is now.

Thanks again!