Closed nikostr closed 6 months ago
I posted this question before I understood the merge and aggregate commands. In case someone else has the same issue, I solved it as follows:
kmtricks merge \
--recurrence-min $N_CASES \
--cpr \
--run-dir kmdiff-count \
--threads 16
kmtricks aggregate \
--run-dir kmdiff-count \
--matrix kmer \
--format text \
--cpr-in \
--output count-matrix.out \
--threads 16
The first command creates a matrix of the kmers that occur in at least as many samples as I have cases (N_CASES), and the second command dumps it as a text file. I then grepped count-matrix.out for the kmers I had identified previously.
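For anyone who wants to reproduce the lookup step, here is a rough sketch on toy data. The file names (toy-matrix.txt, toy-kmers.txt, toy-selected.txt) are made up, and it assumes the text dump has one kmer per line followed by whitespace-separated per-sample counts, so check that against your own count-matrix.out first.

```shell
# Toy stand-in for count-matrix.out (assumed layout: kmer, then one count per sample).
printf 'AAAC 3 0 2\nAAAG 1 1 1\nAACT 0 5 4\n' > toy-matrix.txt

# Hypothetical list of kmers of interest, one per line.
printf 'AAAC\nAACT\n' > toy-kmers.txt

# -F: treat patterns as fixed strings, -w: match whole words only,
# -f: read the patterns from a file.
grep -F -w -f toy-kmers.txt toy-matrix.txt > toy-selected.txt

# Alternative that compares only the first column: load the list into
# an awk array, then print the matrix rows whose kmer is in it.
awk 'NR == FNR { want[$1]; next } $1 in want' toy-kmers.txt toy-matrix.txt > toy-selected-awk.txt
```

Both variants should give the same rows here. The awk version may be the safer choice on a real matrix because it cannot match anything outside the kmer column.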
Note: using this count matrix it should be possible to find these kmers without creating the membership matrix.
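I have not tried this on real output, but a filter along these lines might do it. It is only a sketch on toy data: the file names are made up, the layout (kmer, then one count column per sample) is an assumption, and so is the idea that the case samples sit in a contiguous range of columns, given here through the hypothetical awk variables first and last.

```shell
# Toy matrix: kmer, then counts for four samples (assumed layout).
# Assumption for this sketch: samples 1-3 are cases, sample 4 is a control.
printf 'AAAC 3 1 2 0\nAAAG 1 0 1 4\nAACT 2 5 4 0\n' > toy-counts.txt

# Keep only rows where every case column has a non-zero count.
# first/last are the 1-based field numbers of the case columns
# (field 1 is the kmer itself).
awk -v first=2 -v last=4 '{
    for (i = first; i <= last; i++)
        if ($i == 0) next    # kmer absent from one case sample: skip the row
    print
}' toy-counts.txt > toy-present-in-all-cases.txt
```

The surviving rows could then be intersected with the kmdiff kmers in the same way as the grep step above.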
I have run kmdiff and identified overrepresented kmers among two groups. Following this, I created a membership matrix to identify kmers present in all my case samples, and intersected these with the overrepresented kmers identified by kmdiff. Now I am interested in getting the counts of these in each of my case samples. I already have the count matrices produced by kmdiff. Dumping these to text and grepping them is obviously one way of doing it, but clearly not very efficient. What would your recommendation be here? Unfortunately my C++ is terrible.