Closed mmiladi closed 4 years ago
Hi @mmiladi, just to confirm: you are talking about the NanopolishComp Eventalign_Collapse output, aren't you? Not the raw nanopolish eventalign output (which, by the way, is much bigger).
Yes. For the nanopolish eventalign output, it's not a problem because one can pipe it to gzip and then zcat it to NanopolishComp, or directly pipe the two tools.
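To illustrate why sequential input is easy to keep compressed, here is a minimal Python sketch that streams a gzip file line by line without unpacking it to disk. The file name, the three-column layout and the `count_data_lines` helper are made up for the example; they are not the real nanopolish eventalign format or any NanopolishComp API.

```python
import gzip
import os
import tempfile


def count_data_lines(gz_path):
    """Stream a gzip-compressed text file and count non-header lines."""
    n = 0
    # "rt" decompresses on the fly and yields text lines one at a time.
    with gzip.open(gz_path, "rt") as fh:
        for line in fh:
            if not line.startswith("#"):
                n += 1
    return n


# Build a tiny compressed file with a hypothetical layout for the demo.
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "eventalign.tsv.gz")
with gzip.open(gz_path, "wt") as fh:
    fh.write("#header\n")
    fh.write("contig_1\t0\tAAAAA\n")
    fh.write("contig_1\t1\tAAAAC\n")

n_lines = count_data_lines(gz_path)
```

This works because the consumer only ever reads forward, which is the access pattern gzip is designed for.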
Yes, that was my thinking when I developed the tool. The reason we keep the NanopolishComp Eventalign_Collapse output uncompressed is to have random access to the raw data when running NanoCompore, thanks to the index file. Random access is also possible with gzip, but it is extremely inefficient in terms of I/O.
I guess the bzip2 format might be a good compromise, but I am sorry to say that this is not on our immediate priority list at the moment.
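As a rough illustration of the random-access point, the Python sketch below writes a few text blocks to an uncompressed file, records each block's byte offset and length in an index, and then reads one block back with a single seek. The record layout, the `write_with_index` and `read_record` helpers, and the in-memory index are assumptions for the example only; they do not reproduce the actual Eventalign_Collapse file or index format.

```python
import os
import tempfile


def write_with_index(path, records):
    """Write records as plain text; return {key: (byte_offset, byte_len)}."""
    index = {}
    with open(path, "wb") as fh:
        for key, text in records.items():
            data = text.encode("utf-8")
            # Record where this block starts and how many bytes it spans.
            index[key] = (fh.tell(), len(data))
            fh.write(data)
    return index


def read_record(path, index, key):
    """Jump straight to one record using its indexed offset and length."""
    offset, length = index[key]
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.read(length).decode("utf-8")


# Hypothetical per-read blocks, only for demonstration.
records = {
    "read_1": "#read_1\n0\tAAAAA\t0.1\n1\tAAAAC\t0.2\n",
    "read_2": "#read_2\n0\tCCCCC\t0.3\n",
    "read_3": "#read_3\n0\tGGGGG\t0.4\n1\tGGGGT\t0.5\n",
}
tmp_dir = tempfile.mkdtemp()
data_path = os.path.join(tmp_dir, "collapsed.tsv")
index = write_with_index(data_path, records)
block = read_record(data_path, index, "read_2")
```

With a plain gzip stream, the same lookup would generally require decompressing everything before the requested offset, which is where the extra I/O comes from.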
DictZip might be an option as well.
We would gladly accept a PR to both NanopolishComp and Nanocompore if you want to have a go at it :D
Thanks for the info. And also thanks for the nice and well-documented work! Unfortunately, I don't know much about the underlying algorithm, so I am trying to contribute in other ways :) : https://github.com/bioconda/bioconda-recipes/pull/21747
Thanks
Hi,
Requiring plain-text .tsv files as input for sampcomp takes a lot of storage, which becomes a limiting factor in the analysis capacity. It would be nice if you could support tsv.gz as input.
Best,