metagentools / MetaCoAG

🚦🧬 Binning Metagenomic Contigs via Composition, Coverage and Assembly Graphs
https://metacoag.readthedocs.io/en/stable/
GNU General Public License v3.0

Speed up writing final bins #4

Closed chassenr closed 2 years ago

chassenr commented 2 years ago

Hi @Vini2,

thanks for this really great binning tool. It performs very well on my data and is much faster than other tools (which also perform worse). I have a suggestion for the last step in the workflow: I noticed that the awk command that writes the fasta files for each bin was taking almost as long as the preceding binning process. To speed things up, I implemented this step outside of MetaCoAG using BBMap's filterbyname.sh and GNU parallel:

cd output_dir/bins
ls -1 *.txt | sed 's/\.txt$//' | parallel -j60 'filterbyname.sh in=path/to/contigs.fasta out={}.fasta names={}.txt include=t'

Maybe this (or something similar) is something you could consider integrating?
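
For setups without BBMap or GNU parallel, "something similar" could be a single pass over the contigs file in plain Python. The following is only a minimal sketch, not the code MetaCoAG uses: the function name `write_bins` is hypothetical, and it assumes that every `.txt` file in the bins folder lists one contig ID per line and that each ID matches the FASTA header up to the first whitespace.

```python
from contextlib import ExitStack
from pathlib import Path


def write_bins(contigs_fasta, bins_dir):
    """Write <bin>.fasta beside each <bin>.txt, reading the contigs file once."""
    bins_dir = Path(bins_dir)

    # Assumption: every .txt file lists one contig ID per line.
    contig_to_bin = {}
    for names_file in sorted(bins_dir.glob("*.txt")):
        with open(names_file) as handle:
            for line in handle:
                contig_id = line.strip()
                if contig_id:
                    contig_to_bin[contig_id] = names_file.stem

    counts = {}  # bin name -> number of contigs written
    with ExitStack() as stack, open(contigs_fasta) as fasta:
        outputs = {}    # bin name -> open output handle
        current = None  # handle of the record being copied; None means skip
        for line in fasta:
            if line.startswith(">"):
                # Assumption: the contig ID is the header up to the first whitespace.
                fields = line[1:].split()
                bin_name = contig_to_bin.get(fields[0]) if fields else None
                if bin_name is None:
                    current = None
                else:
                    if bin_name not in outputs:
                        out_path = bins_dir / f"{bin_name}.fasta"
                        outputs[bin_name] = stack.enter_context(open(out_path, "w"))
                    current = outputs[bin_name]
                    counts[bin_name] = counts.get(bin_name, 0) + 1
            # Copy header and sequence lines of binned contigs unchanged.
            if current is not None:
                current.write(line)
    return counts
```

Calling `write_bins("path/to/contigs.fasta", "output_dir/bins")` would then produce the same `{}.fasta` files as the command above, with the contigs file read once rather than once per bin. The sketch keeps one output file open per bin, so a run with a very large number of bins could hit the operating system's open-file limit.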

Thanks!

Cheers, Christiane

Vini2 commented 2 years ago

Hello @chassenr,

Thank you very much for your comments. This will be a great way to speed up the final step of writing the sequences of each bin to fasta files. I will definitely integrate this into the code.

Thank you very much for your suggestion.

Best regards, Vijini

Vini2 commented 2 years ago

Hello @chassenr,

I have added a fix to speed up the step of writing the final bins. If you have time, please pull the latest changes from the repo and give it a try. Let me know how it goes.

Thank you!

Best regards, Vijini

chassenr commented 2 years ago

@Vini2 Thanks a lot for implementing the speed-up. Works perfectly :)

Vini2 commented 2 years ago

Closing issue after fixing.