scottgigante / picopore

A tool to reduce the size of Oxford Nanopore Technologies' datasets without losing information
GNU General Public License v3.0
31 stars 5 forks

Runtime ok? #3

Closed colindaven closed 7 years ago

colindaven commented 7 years ago

Hi,

I used picopore successfully to a) perform lossless compression on 0.5 TB of input, and b) then perform deep-lossless compression on the results of a).

Now I am trying to get rid of all Metrichor basecalls to do an Albacore test. To do this I am running raw compression on the results of b).

This has already taken 60 hours on 8 cores; is this to be expected? Or would you recommend I start again from the completely untouched raw data?

Thanks, Colin

scottgigante commented 7 years ago

Hi Colin,

One tradeoff of the deep-lossless compression mode is that the files are in a format that is no longer readable by other nanopore analysis tools, including other methods of picopore compression. This is because deep-lossless is designed only for the long-term storage of event data, and files can only be re-analysed after running `picopore --revert --mode deep-lossless`.

I recommend either reverting the deep-lossless compression or simply starting from scratch. I'll add a check in the next update to verify that files are in the expected format before performing raw compression.

If running as expected, raw compression should be significantly faster than lossless and deep-lossless. Please let me know if you have any further issues.

Cheers, Scott

colindaven commented 7 years ago

Thanks, I thought something had gone drastically wrong.

Interestingly, it did "complete" after ~4 days.

```
Complete. Original size: 264230682772 Compressed size: 218833244816
rcug@hpc01:/lager2/rcug/2017/minion_public/2D$
[1]+  Done    picopore --mode raw -t 8 -y *  (wd: /working2/rcug/scratch/MinIOn/lambda_control/reads/downloads)
```
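For context, the two byte counts reported above work out to roughly a 17% reduction. This is just arithmetic on the numbers from the output, not anything picopore-specific:

```python
# Byte counts taken from picopore's "Complete" message above
original_size = 264_230_682_772
compressed_size = 218_833_244_816

saved_bytes = original_size - compressed_size
percent_saved = 100 * (1 - compressed_size / original_size)

print(f"Saved {saved_bytes / 1e9:.1f} GB ({percent_saved:.1f}% reduction)")
# → Saved 45.4 GB (17.2% reduction)
```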

scottgigante commented 7 years ago

Hi Colin,

Do you mind sending me a sample of your files so I can check what's going on? I haven't seen raw compression ratios in that range, so I suspect something unusual has happened.

Cheers, Scott

colindaven commented 7 years ago

Sorry, these are someone else's data, so I can't make them available. I think I messed them up good and proper anyway; trying to revert the lossless -> deep-lossless -> raw data back to the original data threw a weird error. But thanks for a nice package, and for saving us all some disk space.

Colin

scottgigante commented 7 years ago

No worries, Colin. I'll make sure to add a check that prevents that error in the next release.

Cheers, Scott