Open blinard-BIOINFO opened 2 years ago
The mechanic that will be broken is that xpas::build() returns a single _phylo_kmer_db. Here it will have to return two. Note that for this particular application, temporary duplication of _phylo_kmer_db in memory should not be an issue: the merged version will be much smaller than the full version (on top of that, for the amino acid application k is small, around 5 to 7).
Currently, with "--merge-branches", the only output is the merged database, which keeps just the highest-probability branch per k-mer, and it is written as .rps. Right now, two consecutive runs are necessary to get both the merged and unmerged versions, which involves unnecessary recomputation of phylo-k-mers.
I suggest using the following extensions to differentiate them: .mps ("m"erged) and .rps (current default behaviour, used for placement in "r"appas2).
I need an xpas option to get either i) only the .mps, or ii) both the .mps and the .rps in a single run.
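To make the request concrete, here is a minimal sketch of what I have in mind, not actual xpas code. Everything in it is a hypothetical stand-in: `toy_db` replaces _phylo_kmer_db, and `output_mode`, `build_result`, `merge_branches` and `finish_build` are names I made up for illustration. It assumes C++17 and that a higher score means a more probable branch.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins, NOT the real xpas types.
struct scored_branch {
    std::uint32_t branch;  // branch identifier
    float score;           // assumed: higher means more probable
};

// k-mer key -> all (branch, score) entries kept for that k-mer.
using toy_db = std::unordered_map<std::uint64_t, std::vector<scored_branch>>;

// Which databases the run should produce (hypothetical option values).
enum class output_mode { full_only, merged_only, both };

// Instead of a single database, the build step hands back up to two.
struct build_result {
    std::optional<toy_db> full;    // would be written as .rps
    std::optional<toy_db> merged;  // would be written as .mps
};

// Keep only the highest-scoring branch for every k-mer.
toy_db merge_branches(const toy_db& full) {
    toy_db merged;
    for (const auto& [kmer, entries] : full) {
        if (entries.empty()) continue;
        scored_branch best = entries.front();
        for (const auto& e : entries) {
            if (e.score > best.score) best = e;
        }
        merged.emplace(kmer, std::vector<scored_branch>{best});
    }
    return merged;
}

// Derive the requested outputs from one computed database,
// so the phylo-k-mers are computed only once.
build_result finish_build(toy_db full, output_mode mode) {
    build_result result;
    if (mode != output_mode::full_only) {
        result.merged = merge_branches(full);
    }
    if (mode != output_mode::merged_only) {
        result.full = std::move(full);
    }
    return result;
}
```

With `output_mode::both`, the full and merged databases briefly coexist in memory, which is the temporary duplication mentioned above. With `output_mode::merged_only`, the full database is dropped as soon as the merge is done.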
Looking at the code, it seems that only small changes in step 2 (filtering) are needed: