Closed donovan-h-parks closed 2 years ago
Hi @dparks1134
Not directly, unfortunately. One could use a ramfs to speed up the database loading (#92), something like:
mkdir ~/memorymap
sudo mount -t ramfs none ~/memorymap
cp ganon_database.* ~/memorymap/
ganon classify --db-prefix ~/memorymap/ganon_database ...
This should speed up loading times, and it works with the current version of ganon (v1.0.0).
I have already heard about this problem from other users and think it would be nice to have it integrated, so I am marking this as an enhancement. However, this may be non-trivial to implement with the use of hierarchical databases.
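For several databases, the copy step above can be wrapped in a small helper. This is only a sketch: `stage_db` is a hypothetical name and not part of ganon, and it assumes that all files of a database share the prefix followed by a dot, as in `ganon_database.*` above.

```shell
# Hypothetical helper (not part of ganon): copy every file that shares a
# database prefix into a target directory, e.g. a mounted ramfs, and print
# the prefix to pass to ganon afterwards.
stage_db() {
    local prefix="$1" target="$2"
    mkdir -p "$target" || return 1
    # Same pattern as "cp ganon_database.* ~/memorymap/" above
    cp "$prefix".* "$target"/ || return 1
    printf '%s/%s\n' "${target%/}" "$(basename "$prefix")"
}

# Example (mount the ramfs first, as shown above):
#   ganon classify --db-prefix "$(stage_db ganon_database ~/memorymap)" ...
```

The function prints the new prefix only if the copy succeeded, so a failed copy leaves the command substitution empty rather than pointing at a partially copied database.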
Thanks - I'll give it a try. Feel free to close this unless it is helpful to keep it as an open enhancement.
Hi @pirovc I gave this a try and got Error code: -9 using v1.1.1
conda activate python3.7_environment
sudo mkdir /memorymap
sudo mount -t ramfs none /memorymap
sudo cp /media/ubuntu/Elements/reference_genomes/ganon/ARC_refseq_ALL_db/ARC_refseq_ALL_db.* /memorymap/ &
sudo cp /media/ubuntu/Elements/reference_genomes/ganon/BAC_refseq_ALL_db/BAC_refseq_ALL_db.* /memorymap/ &
sudo cp /media/ubuntu/Elements/reference_genomes/ganon/EUK_refseq_CG_db/EUK_refseq_CG_db.* /memorymap/ &
sudo cp /media/ubuntu/Elements/reference_genomes/ganon/VIRAL_refseq_ALL_db/VIRAL_refseq_ALL_db.* /memorymap/ &
for i in *_1_val_1.fq.gz; do
  b=${i%%_1_val_1.fq.gz}
  ganon classify -d /memorymap/ARC_refseq_ALL_db \
    /memorymap/BAC_refseq_ALL_db \
    /memorymap/EUK_refseq_CG_db \
    /memorymap/VIRAL_refseq_ALL_db \
    -p "$b"_1_val_1.fq.gz "$b"_2_val_2.fq.gz \
    -o "$b"_ganon_results --output-lca --output-unclassified -t 28
done &
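One detail in the script above may be worth checking: each `cp` is sent to the background with `&`, so the `for` loop can start while the databases are still being copied. Whether this is related to the reported error is not established in this thread. A minimal sketch of waiting for the copies first, using the shell's `wait` builtin (`copy_all_then_wait` is a hypothetical helper name):

```shell
# Hypothetical helper: start one background copy per database prefix,
# then block until every copy has finished.
copy_all_then_wait() {
    local target="$1"
    shift
    for prefix in "$@"; do
        cp "$prefix".* "$target"/ &
    done
    wait   # returns once all background jobs of this shell have exited
}

# Example: copy_all_then_wait /memorymap /path/to/ARC_db /path/to/BAC_db
# (run it with sufficient permissions for the target directory)
```

Calling this before the classification loop keeps the parallel copies but ensures that ganon only sees complete files.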
Hi @rjsorr. Does the same command work with the database files on a "normal" disk, without using ramfs? I tested ganon classify with multiple databases in the ramfs and it works just fine for me.
This works, if that is what you mean?
cd /media/ubuntu/Elements/NEWPIPELINE_MetaAIR/RAW_DATA/neg_pos/TG_out
conda activate python3.7_environment
for i in *_1_val_1.fq.gz; do
  b=${i%%_1_val_1.fq.gz}
  ganon classify -d /media/ubuntu/Elements/reference_genomes/ganon/ARC_refseq_ALL_db/ARC_refseq_ALL_db \
    /media/ubuntu/Elements/reference_genomes/ganon/BAC_refseq_ALL_db/BAC_refseq_ALL_db \
    /media/ubuntu/Elements/reference_genomes/ganon/EUK_refseq_CG_db/EUK_refseq_CG_db \
    /media/ubuntu/Elements/reference_genomes/ganon/VIRAL_refseq_ALL_db/VIRAL_refseq_ALL_db \
    -p "$b"_1_val_1.fq.gz "$b"_2_val_2.fq.gz \
    -o "$b"_ganon_results --output-lca -t 28 --verbose > "$b"_ganon_classify.log 2>&1
done &
Unfortunately, this seems to be a problem on your side. I re-created the scenario here with several databases in the ramfs and the same parameters (with version 1.1.1), and it just works. Sequential execution of ganon commands with the same set of databases is automatically faster on modern systems due to caching, but loading still takes some time. In any case, this is only meant as a workaround. Soon there will be an integrated batch execution function for ganon classify, but I cannot guarantee when it will be available.
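The caching mentioned above is the operating system's file cache: files that were read recently tend to stay in otherwise unused memory. As a rough sketch, the database files can be read once before the first run so that later reads are more likely to be served from memory. `warm_cache` is a hypothetical helper name, and whether the files stay cached depends on how much free memory the machine has.

```shell
# Hypothetical helper: read all files of a database prefix once and discard
# the output, which lets the operating system keep them in its file cache.
warm_cache() {
    cat "$1".* > /dev/null
}

# Example: warm_cache /path/to/ganon_database
```

Unlike the ramfs approach, this does not pin the files in memory, so it needs no mount and no extra copy of the database.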
Hi.
Is there a way to keep the Ganon DB in memory between running the classify method on different samples? At least for my use case, the majority of the time is spent loading the DB into memory. I appreciate I could combine all my samples into a single file, but this makes for a rather awkward workflow and a lot of extra post-processing of results. Thanks.