Benchmarks on mock communities

sterrettJD commented 1 month ago

Could try pulling from the synthetic samples generated in the Pereira-Marques low biomass paper, which can be found here.

sterrettJD commented 1 month ago

Was originally pulling with fasterq-dump, but it has a bug that causes issues on the beegfs filesystem, according to biofrontiers IT. Documenting their email here for future reference.

Good morning,

I wanted to reach out regarding a concern over the fasterq-dump workload on fijinode-02 (see job IDs below).

9590776 short fasterq jost9358 R 4:27 1 fijinode-02
9590774 short fasterq jost9358 R 4:30 1 fijinode-02
9590775 short simulate jost9358 R 4:30 1 fijinode-02
9590772 short fasterq jost9358 R 4:33 1 fijinode-02
9590773 short fasterq jost9358 R 4:33 1 fijinode-02
9590771 short fasterq jost9358 R 4:36 1 fijinode-02

My understanding is that you were utilizing your scratch directory to run this which we normally would recommend. We have an ongoing issue due to a bug in fasterq-dump that can hang the server and it happens when the input or output uses a beegfs filesystem (which is /scratch). There are a few options moving forward:

a. Submit a Slurm job and load your data into the /localscratch on that fijinode and use your data located there for processing
b. Avoid using fasterq-dump and instead switch to using fastq-dump and this should resolve the issues.

Please let me know if you have any questions or concerns.

Best, Jes (they/them)

sterrettJD commented 1 month ago

Have converted to fastq-dump for this reason. Works successfully now.
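
For future reference, here is a minimal sketch of what the switch might look like when the download is driven from Python. This is not the actual HoMi code: the function names and the `SRR0000000` accession are placeholders, and the flags shown (`--split-3`, `--gzip`, `--outdir`) are just one reasonable choice for paired-end data.

```python
import subprocess
from pathlib import Path


def build_fastq_dump_cmd(accession: str, outdir: str) -> list[str]:
    """Build a fastq-dump command line for one SRA run accession.

    Uses fastq-dump rather than fasterq-dump, since the latter is the
    tool reported to hang on the beegfs-backed /scratch filesystem.
    """
    return [
        "fastq-dump",
        "--split-3",   # paired reads go to separate _1/_2 files
        "--gzip",      # compress the FASTQ output
        "--outdir", outdir,
        accession,     # placeholder accessions look like "SRR0000000"
    ]


def download_run(accession: str, outdir: str) -> None:
    """Run fastq-dump for one accession; raises if the tool exits non-zero."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_fastq_dump_cmd(accession, outdir), check=True)
```

Calling `download_run(accession, outdir)` once per accession should be enough for a small benchmark set. Keeping the command construction separate from the call makes it easy to check the arguments without touching the network.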

sterrettJD commented 3 weeks ago

Basic Pereira benchmarks implemented in #76

sterrettJD / HoMi

Benchmarks on mock communities #72