ColineGardou closed this issue 2 years ago
Hello,
Thanks for trying kmtricks. Just to clarify, kmtricks can be used for two things: 1) building a membership index made of Bloom filters (the supplementary tables relate to this feature), and 2) building a k-mer count matrix. Since you use DEkupl, I assume you need a count matrix, right?
I have not noticed any problems in your commands. The difference could be explained by the k-mer filtering (--count-abundance-min in kmtricks and --lower-count in Jellyfish). DEkupl joincount also uses -r (--recurrence-min in kmtricks) and -a (no direct equivalent, but it can probably be simulated by --merge-abundance-min X --save-if 0; I will check).
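To illustrate why these filters matter for the output size, here is a minimal Python sketch on a toy count matrix. It is not kmtricks or DEkupl code: the function name `filter_matrix`, its parameters, and the exact semantics (a k-mer is kept if its count reaches `abundance_min` in at least `recurrence_min` samples) are assumptions made for the example and may differ from what either tool actually does.

```python
# Toy illustration only: hypothetical helper, not the kmtricks or DEkupl implementation.
def filter_matrix(counts, abundance_min, recurrence_min):
    """Keep k-mers whose count is >= abundance_min in at least recurrence_min samples.

    counts maps each k-mer to a list of per-sample counts (one entry per sample).
    """
    kept = {}
    for kmer, row in counts.items():
        # Number of samples in which this k-mer passes the abundance threshold.
        solid_samples = sum(1 for c in row if c >= abundance_min)
        if solid_samples >= recurrence_min:
            kept[kmer] = row
    return kept


# Three samples, four k-mers (made-up counts).
toy_counts = {
    "AAC": [5, 7, 6],  # abundant everywhere
    "ACG": [1, 0, 0],  # likely a sequencing error: seen once in one sample
    "CCA": [3, 0, 4],  # abundant in two samples
    "GAT": [0, 2, 0],  # present in a single sample
}

unfiltered = filter_matrix(toy_counts, abundance_min=1, recurrence_min=1)
filtered = filter_matrix(toy_counts, abundance_min=2, recurrence_min=2)
print(len(unfiltered), "k-mers without real filtering")
print(len(filtered), "k-mers after filtering:", sorted(filtered))
```

On real RNA-seq data, a large share of distinct k-mers are rare ones that appear in only one sample, so two pipelines run with different thresholds can produce matrices of very different sizes from the same input.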
Also please note that Jellyfish and kmtricks produce equivalent but not identical outputs because of canonical k-mers. For optimization reasons, kmtricks considers A < C < T < G instead of A < C < G < T.
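The effect of the nucleotide order on canonical k-mers can be shown with a small Python sketch. It assumes that the canonical form is the smaller of a k-mer and its reverse complement under the given base order; the names `canonical`, `LEXICOGRAPHIC` and `ACTG_ORDER` are made up for the example, and the real tools work on binary-encoded k-mers rather than strings.

```python
# Illustrative sketch; not the code used by kmtricks or Jellyfish.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer (uppercase A/C/G/T only)."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer, order):
    """Return the smaller of kmer and its reverse complement under `order`.

    `order` lists the four bases from smallest to largest, e.g. "ACGT".
    """
    rank = {base: i for i, base in enumerate(order)}
    return min(kmer, reverse_complement(kmer), key=lambda s: [rank[b] for b in s])


LEXICOGRAPHIC = "ACGT"  # A < C < G < T
ACTG_ORDER = "ACTG"     # A < C < T < G, the order described above for kmtricks

kmer = "GAA"  # its reverse complement is "TTC"
print(canonical(kmer, LEXICOGRAPHIC))  # GAA, because G < T
print(canonical(kmer, ACTG_ORDER))     # TTC, because T < G
```

Both outputs describe the same k-mer/reverse-complement pair, so the counts are equivalent, but a k-mer written by one tool may have to be reverse-complemented before it can be looked up in the output of the other.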
A new version of kmtricks is coming soon, probably next week if I can finish the documentation. It is faster and more efficient, and includes new features, utilities and an API, especially for dealing with kmtricks's files. If you want it before the release, just send me an email.
I hope this helps.
Téo
Dear authors, we have indexed 94 RNA-seq files (total fastq.gz: 201Gb) and obtained a 441Gb kmtricks index. This looks big compared to your supplementary tables. A merged Jellyfish index made with DEkupl's joincount for the same dataset was only 23Gb. We are wondering whether we are doing something wrong. I'm attaching my code below. Thanks! kmtricks.txt whole_matrix.txt fof_mondor.txt