Closed jjdevega closed 1 year ago
Hello,
Unfortunately, this is not possible yet. However, it is at the top of my to-do list and I have already started the implementation. I will keep you posted as soon as a testable version is available.
Note that not all matrices can be merged: only matrices built with the same minimizer distribution function can. In the recent release v1.3.0, you can use a new parameter, --repart-from, which reuses the distribution function of an existing kmtricks run. So while waiting for the merge feature, I suggest building the matrices with the same function so that they are ready for merging.
Ex:
kmtricks pipeline --file matrix_1.txt --run-dir ./matrix_1
kmtricks pipeline --file matrix_2.txt --run-dir ./matrix_2 --repart-from ./matrix_1
kmtricks pipeline --file matrix_3.txt --run-dir ./matrix_3 --repart-from ./matrix_1
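For more than a few matrices, the same pattern can be scripted. The following is a minimal sketch, not part of kmtricks: the helper names `run` and `build_matrices` and the `DRY_RUN` variable are hypothetical, and it assumes each input file is named `<name>.txt` with its run directory at `./<name>`, as in the example above. It only uses the --file, --run-dir and --repart-from options shown above, and by default it prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of kmtricks): build the first matrix normally,
# then build every other matrix with --repart-from pointing at the first run.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_matrices() {
  first="$1"
  shift
  # Reference run: this one defines the minimizer distribution function.
  run kmtricks pipeline --file "${first}.txt" --run-dir "./${first}"
  # Remaining runs reuse the distribution function of the reference run.
  for name in "$@"; do
    run kmtricks pipeline --file "${name}.txt" --run-dir "./${name}" --repart-from "./${first}"
  done
}

build_matrices matrix_1 matrix_2 matrix_3
```

In dry-run mode this prints the same three commands as the example above, so the output can be checked before anything is computed.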
I hope this helps.
Teo
Hello,
I still have to make some changes, but you can already test the feature on the dev branch. A release and the docker/conda packages should be available next week.
git clone --recursive https://github.com/tlemane/kmtricks.git
cd kmtricks
git checkout dev
./install.sh
kmtricks pipeline --run-dir ./matrices/mat1
kmtricks pipeline --run-dir ./matrices/mat2 --repart-from ./matrices/mat1
kmtricks pipeline --run-dir ./matrices/mat3 --repart-from ./matrices/mat1
kmtricks combine --fof fof.txt --output ./new_matrix
With fof.txt:
./matrices/mat1
./matrices/mat2
./matrices/mat3
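The fof.txt file can also be generated rather than written by hand. This is a minimal sketch under assumptions: the function name `write_fof` is hypothetical, and it assumes every subdirectory of the parent directory is a kmtricks run directory built with the same distribution function (via --repart-from, as above).

```shell
#!/bin/sh
# Hypothetical helper (not part of kmtricks): write one run directory per line,
# matching the fof.txt layout shown above.
write_fof() {
  parent="$1"
  fof="$2"
  : > "$fof"                      # create or truncate the output file
  for dir in "$parent"/*/; do
    [ -d "$dir" ] || continue     # skip the literal pattern if nothing matched
    printf '%s\n' "${dir%/}" >> "$fof"
  done
}

# Usage (assumes ./matrices/mat1, ./matrices/mat2, ... already exist):
#   write_fof ./matrices fof.txt
#   kmtricks combine --fof fof.txt --output ./new_matrix
```

The trailing slash in the glob restricts matches to directories, so stray files next to the run directories are not listed.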
Let me know if you encounter any issues.
Teo
Thanks for kmtricks; we have incorporated it into one of our lab pipelines with significant computing time improvement.
We use kmtricks to generate binary presence/absence matrices from x samples, each consisting of 2-4 fastq files (.fq.gz). These files are large, and one goal is to remove them from storage once the computation is done.
Our usage is fairly simple:
kmtricks pipeline --mode kmer:pa:bin
kmtricks aggregate --pa-matrix kmer --format text
My question: I want to incorporate z additional samples at a later date and recompute the full matrix without bringing back the reads for the previous x samples, i.e. add the new samples from their fastq files to the quantification from the previous run.
Is this possible? I looked through the wiki for ideas, but I could not find anything suggesting that it is possible or where to start.
Thanks for your help.