phylo42 / IPK

Inference of phylo-k-mers
MIT License

Option to save both the "full" and "merge-branches" database files #4

Open blinard-BIOINFO opened 2 years ago

blinard-BIOINFO commented 2 years ago

Currently, with "--merge-branches", the only output is the merged database, which keeps just the highest-probability branch per k-mer and is written as .rps. Getting both the merged and unmerged versions therefore takes two consecutive runs, which involves unnecessary recomputation of phylo-k-mers.

I suggest using the following extensions to differentiate them: .mps ("m"erged) and .rps (the current default behaviour, used for placement in "r"appas2).

I need an xpas option to get either i) only the .mps, or ii) both the .mps and the .rps in a single run.

Looking at the code, it seems that only small changes in step 2 (filtering) are needed:

blinard-BIOINFO commented 2 years ago

The mechanic that will break is that xpas::build() returns a single _phylo_kmer_db. Here it will have to return two. Note that for this particular application, temporary duplication of the _phylo_kmer_db in memory should not be an issue: the merged version will be much smaller than the full version (on top of that, for the amino acid application k is small, around 5 to 7).
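
To make the idea concrete, here is a minimal C++17 sketch of a build step that produces both databases from a single computation. It is not the xpas code: `scored_branch`, `kmer_db`, `build_result`, `merge_branches` and `build_both` are hypothetical stand-ins, and a plain `std::unordered_map` is assumed in place of the real `_phylo_kmer_db`. It only illustrates keeping the full database and deriving the merged one from it, instead of recomputing the phylo-k-mers.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one database entry (not the real xpas type).
struct scored_branch {
    std::uint32_t branch;  // branch identifier
    float score;           // k-mer score on that branch (higher is better)
};

// Hypothetical stand-in for _phylo_kmer_db: k-mer code -> scored branches.
using kmer_db = std::unordered_map<std::uint64_t, std::vector<scored_branch>>;

// Hypothetical return type carrying both databases out of one build.
struct build_result {
    kmer_db full;    // unmerged database, would be saved as .rps
    kmer_db merged;  // best branch per k-mer, would be saved as .mps
};

// For every k-mer, keep only the entry with the highest score.
kmer_db merge_branches(const kmer_db& full) {
    kmer_db merged;
    merged.reserve(full.size());
    for (const auto& [kmer, entries] : full) {
        if (entries.empty()) continue;  // nothing to keep for this k-mer
        scored_branch best = entries.front();
        for (const auto& entry : entries) {
            if (entry.score > best.score) best = entry;
        }
        merged.emplace(kmer, std::vector<scored_branch>{best});
    }
    return merged;
}

// Derive the merged database first, then move the full one into the result,
// so the only extra memory used is the (much smaller) merged copy.
build_result build_both(kmer_db full) {
    kmer_db merged = merge_branches(full);
    return {std::move(full), std::move(merged)};
}
```

With something of this shape, the "only the .mps" case would simply discard `full` after the merge, and the "both" case would serialize the two members to their respective files.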