linnarsson-lab / loompy

Python implementation of the Loom file format - http://loompy.org
BSD 2-Clause "Simplified" License

refactor `create_from_fastq` / `fromfq` to leverage kallisto max memory options #128

Open jfx319 opened 4 years ago

jfx319 commented 4 years ago

A refactoring of `fromfq` and `create_from_fastq` is perhaps overdue, given the updates to kallisto, bustools, and the kb wrapper.

As the kb ecosystem improves, it will probably be easier to just leverage their maintained `--workflow` option, since their goals are aligned with improving performance. However, the loompy parsing offers an additional perspective: it shows other useful metadata that could be parsed out and how to store it in the loompy object, which the kb tutorials don't always pay as close attention to.

jfx319 commented 4 years ago

A comparison of the `loompy fromfq` wrapper and the `kb count` wrapper (same dataset, same reference index files, same hardware, same kallisto and bustools binary versions). kallisto didn't seem able to leverage the extra CPUs, so for the `kb count` run I reduced it to 8 threads. Since kallisto doesn't use that much memory, I also added a max memory parameter of 20G (it was run on the same node, though).

[Figure: CPU usage on a 380GB, 16-core node. Left: `loompy fromfq --threads 16`. Right: `kb count -t 8 -m 20G`.]

[Figure: RAM usage on a 380GB, 16-core node. Left: `loompy fromfq --threads 16`. Right: `kb count -t 8 -m 20G`.]
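
For anyone wanting to reproduce the `kb count` side of this comparison from Python, here is a minimal sketch of how the command line could be assembled. It is not part of loompy: the helper name `build_kb_count_cmd`, the file names, and the `10xv3` technology string are placeholders chosen for illustration. Only the thread count (`-t 8`) and memory cap (`-m 20G`) come from the run described above. The sketch only builds the argument list; actually running it (for example with `subprocess.run`) assumes kb-python is installed and on the `PATH`.

```python
import shlex
from typing import List, Sequence


def build_kb_count_cmd(
    index: str,
    t2g: str,
    technology: str,
    out_dir: str,
    fastqs: Sequence[str],
    threads: int = 8,
    memory: str = "20G",
    loom: bool = True,
) -> List[str]:
    """Assemble an argument list for `kb count` (hypothetical helper)."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    if not fastqs:
        raise ValueError("at least one FASTQ file is required")

    cmd = [
        "kb", "count",
        "-i", index,          # kallisto index
        "-g", t2g,            # transcript-to-gene mapping
        "-x", technology,     # single-cell technology string
        "-o", out_dir,        # output directory
        "-t", str(threads),   # number of threads
        "-m", memory,         # maximum memory, e.g. "20G"
    ]
    if loom:
        cmd.append("--loom")  # ask kb to also write a Loom file
    cmd.extend(fastqs)
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute your own index, mapping, and reads.
    cmd = build_kb_count_cmd(
        "index.idx", "t2g.txt", "10xv3", "kb_out",
        ["R1.fastq.gz", "R2.fastq.gz"],
    )
    print(shlex.join(cmd))
    # To execute: subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting problems when paths contain spaces, and makes the thread and memory settings easy to vary when benchmarking.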