microbiomedata / metaMAGs

Workflow for metagenome assembled genomes generation.

Speedup packing #45

Closed: scanon closed this 1 month ago

scanon commented 1 month ago

These changes are actually from me.

The packaging step was very inefficient, especially for metagenomes with a lot of MAGs. The process was serialized, and each MAG had to read through the input files over and over. This change opens and reads each input file only once and multiplexes the output to the various output files. It also parallelizes the tar file generation.
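A minimal Python sketch of the single-pass, multiplexed approach, assuming the input is a contig FASTA file plus a hypothetical `contig_to_bin` mapping from contig ID to MAG/bin name. `demultiplex_fasta` is an illustrative name, not the code in this PR. The file is read once and each record is routed to its bin's output handle, instead of rescanning the input once per MAG.

```python
import os
from contextlib import ExitStack


def demultiplex_fasta(fasta_path, contig_to_bin, out_dir):
    """Split one FASTA file into per-bin FASTA files in a single pass.

    contig_to_bin (hypothetical) maps a contig ID, the first word of the
    header, to a bin name. Unbinned contigs are skipped. Returns the number
    of records written per bin.
    """
    os.makedirs(out_dir, exist_ok=True)
    counts = {}
    with ExitStack() as stack:  # closes every per-bin handle on exit
        handles = {}  # bin name -> output handle, opened on first use
        current = None  # handle for the record being copied, None = skip
        fin = stack.enter_context(open(fasta_path))
        for line in fin:
            if line.startswith(">"):
                contig_id = (line[1:].split() or [""])[0]
                bin_name = contig_to_bin.get(contig_id)
                if bin_name is None:
                    current = None  # drop this record's sequence lines too
                    continue
                if bin_name not in handles:
                    path = os.path.join(out_dir, f"{bin_name}.fa")
                    handles[bin_name] = stack.enter_context(open(path, "w"))
                current = handles[bin_name]
                counts[bin_name] = counts.get(bin_name, 0) + 1
            if current is not None:
                current.write(line)  # header or sequence line of a kept record
    return counts
```

Because every bin's handle stays open for the whole pass, a run with thousands of MAGs can approach the per-process open-file limit (see `ulimit -n`), which may need raising.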

For one test case, the previous approach took 6-7 hours; it now runs in a few minutes.
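The parallel tar generation could look roughly like this, assuming one directory per MAG and one `.tar.gz` per directory. `tar_one_bin` and `tar_bins_parallel` are hypothetical names; the PR may use a different pool type or archive layout.

```python
import os
import tarfile
from concurrent.futures import ThreadPoolExecutor


def tar_one_bin(bin_dir, out_dir):
    """Create <out_dir>/<bin name>.tar.gz containing one per-bin directory."""
    name = os.path.basename(os.path.normpath(bin_dir))
    tar_path = os.path.join(out_dir, f"{name}.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(bin_dir, arcname=name)  # recursive; stored under "<name>/"
    return tar_path


def tar_bins_parallel(bin_dirs, out_dir, workers=4):
    """Build one tarball per bin concurrently; returns paths in input order."""
    os.makedirs(out_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(tar_one_bin, bin_dirs, [out_dir] * len(bin_dirs)))
```

A thread pool keeps the sketch simple: CPython's `zlib` releases the GIL while compressing, so threads can overlap much of the gzip work. A process pool is an alternative if profiling shows the threads are still contending.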

scanon commented 1 month ago

I think all the comments have been addressed.

chienchi commented 1 month ago

I think we need to update the version to 0.6.0 in Dockerfile_vis and rebuild the image. For this merge, we also need to update the Docker image version string in the main WDL file (mbin_nmdc.wdl, line 25).

scanon commented 1 month ago

Let's merge this; then we can rebuild the image and update the WDL as @chienchi suggested.