muellan / metacache

memory efficient, fast & precise taxonomic classification system for metagenomic read mapping
GNU General Public License v3.0

Provide memory mapping option? #43

Open ChillarAnand opened 1 month ago

ChillarAnand commented 1 month ago

Several classification tools provide a "memory-mapping" option.

When running a huge number of samples, a "memory-mapping" option would allow the database to be preloaded into RAM once and reused for classification across all samples, instead of being loaded into memory every time. This improves run time by a huge margin.

muellan commented 1 month ago

There's the "interactive query mode". If you run metacache query <database_name> without any read input files, the database will be loaded into memory and you can then run as many queries as you like. The easiest way to do this is to pipe query strings into metacache from a script, as in the example below:

#!/bin/bash
database="mydatabasename"
queries=""
# add query
queries="${queries} myreads.fq -out myoutfile.txt\n"
# add query
queries="${queries} reads1.fa reads2.fa -pairfiles -out myoutfile.txt\n"
# ... add more queries ....
# finally: load database and run all queries
echo -e "${queries}" | ./metacache query "${database}"
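
For a large number of samples, the query lines could be generated in a loop rather than written out by hand. The following is only a minimal sketch, assuming each sample is a single-end read file with a .fq extension in one directory and that paths contain no spaces. The build_queries helper and the samples and results directory names are hypothetical; only the interactive query mode and the -out option are taken from the example above.

```shell
#!/bin/bash
# Hypothetical helper (not part of metacache): print one query line per .fq file.
# Assumes single-end reads and paths without spaces; output naming is illustrative.
build_queries() {
    local sample_dir="$1" out_dir="$2" reads name
    for reads in "${sample_dir}"/*.fq; do
        [ -e "${reads}" ] || continue            # skip if the glob matched nothing
        name="$(basename "${reads}" .fq)"        # e.g. samples/a.fq -> a
        printf '%s -out %s\n' "${reads}" "${out_dir}/${name}.txt"
    done
}

# Usage: the database is loaded once, then each generated line is run as a query.
# build_queries samples results | ./metacache query mydatabasename
```

Each sample gets its own output file here, so results from different queries do not end up in the same file.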