google / deepconsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
BSD 3-Clause "New" or "Revised" License

pbmm2 using too much memory #3

Closed zhoudreames closed 3 years ago

zhoudreames commented 3 years ago

My machine goes down when I run pbmm2 alignment. The machine has 500 GB of memory and the HiFi data is 130 GB. How can I solve this?

armintoepfer commented 3 years ago

You can't use a full chip's worth of HiFi reads as references. DeepConsensus is at the proof-of-concept stage, not meant for production yet. You can try this new tool, but there's no official support for it yet: https://github.com/PacificBiosciences/align-clr-to-ccs

AndrewCarroll commented 3 years ago

Hi @zhoudreames

As @armintoepfer indicates, DeepConsensus isn't yet scalable enough to run a full SMRT cell on an external machine. The v0.1 release still uses many systems that are efficient within Google's distributed infrastructure but very inefficient on a single machine. We plan to improve this in future releases, but for now it will be too slow for a full SMRT cell. If possible, using a targeted subset of a sequencing run would work better.

For the error you are encountering, I assume this is happening in the pre-processing step? Thank you for pointing this out. This is one of the areas we will improve in the next release (potentially by using the tool that Armin linked). For now, to map the reads, you will need to use fewer reads as input.

Thank you, Andrew
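
As an illustration of the "use fewer reads as input" suggestion, here is a minimal sketch that splits a FASTQ of HiFi reads into fixed-size chunks so each alignment run only sees a fraction of the data. All filenames and the chunk size are hypothetical, and the tiny synthetic FASTQ is only there to make the example self-contained; a real BAM input would need `samtools` rather than `split`, which is not shown here.

```shell
# Build a tiny synthetic FASTQ of 6 reads (placeholder for the real HiFi data).
for i in 1 2 3 4 5 6; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > hifi.fastq

# Each FASTQ record is exactly 4 lines, so split on multiples of 4.
reads_per_chunk=2
split -l $((reads_per_chunk * 4)) hifi.fastq chunk_

ls chunk_*        # chunk_aa chunk_ab chunk_ac
wc -l < chunk_aa  # 8
```

Each `chunk_*` file can then be aligned in its own pbmm2 run, keeping the per-run memory footprint proportional to the chunk size rather than the full 130 GB input.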

zhoudreames commented 3 years ago

> You can't use a full chip worth of Hifi reads as references. DeepConsensus is in the proof of concept stage, not meant for production yet. You can try this new tool, but there's no official support for it yet: https://github.com/PacificBiosciences/align-clr-to-ccs

thanks for your help

zhoudreames commented 3 years ago

> Hi @zhoudreames
>
> As @armintoepfer indicates, DeepConsensus isn't yet scalable to run a full SMRT cell on an external machine. The v0.1 release still uses many systems that are efficient within Google's distributed systems, but are very inefficient when run on a single machine. We plan to improve this in future releases, but for now, it will be too slow for a full SMRT cell. If there is something you can use a targeted part of a sequence run, that might be better.
>
> For the error you are encountering, I assume that this is in the pre-processing step? Thank you for pointing this out. This is one of the areas we will improve in the next release (potentially by using the tool that Armin linked). For now, to map the reads, you would need to use fewer reads for your input.
>
> Thank you, Andrew

When will the next version be released? How much do I need to split the reads to fit in 500 GB of memory? thanks~

AndrewCarroll commented 3 years ago

Hi @zhoudreames

I cannot give you an estimate for when the next version will be ready. With reasonable confidence, I can say it will likely be more than 1 month and less than 6 months from now.

The changes in the next release will probably alter memory use, so I cannot yet give you an estimate of memory requirements at that time.