Closed by Youpu-Chen 2 years ago
Hi,
What version of Jellyfish are you using? I cannot reproduce your results. This is what I get with Jellyfish:
jellyfish count -m 15 -o jf.Ecoli.15mer -c 3 -s 16G -t 16 Ecoli.fasta
jellyfish histo jf.Ecoli.15mer > jf.Ecoli.15mer.histo
head jf.Ecoli.15mer.histo
1 4443345
2 52456
3 11610
4 2700
5 5668
6 223
7 136
8 760
9 486
10 32
jellyfish --version
jellyfish 2.3.0
Installed with conda install kmer-jellyfish
Edit: also a really nicely reported issue, easy to repeat your steps, thanks!
I am also using Jellyfish 2.3.0, installed from jellyfish-2.3.0.tar.gz. The KMC version is 3.2.1.
Hmm, OK, now I see that I used -s 16G for Jellyfish (by the way, it runs much faster with a smaller value, for example -s 100M). For -s 1G I also get your result:
(base) mkokot@KOKI-PC-RYZEN:/mnt/c/nauka/DEVELOPMENT/kmc/issue-190$ jellyfish count -m 15 -o jf.Ecoli.15mer -c 3 -s 1G -t 16 Ecoli.fasta
(base) mkokot@KOKI-PC-RYZEN:/mnt/c/nauka/DEVELOPMENT/kmc/issue-190$ jellyfish histo jf.Ecoli.15mer > jf.Ecoli.15mer.histo
(base) mkokot@KOKI-PC-RYZEN:/mnt/c/nauka/DEVELOPMENT/kmc/issue-190$ head jf.Ecoli.15mer.histo
0 2
1 4443336
2 52451
3 11609
4 2699
5 5669
6 223
7 136
8 761
9 485
It seems to be a Jellyfish bug: with -s 16G the results are consistent with KMC, but with -s 1G they are not. Furthermore, I have a trivial k-mer counter implementation, available in the KMC repo (tests/kmc_CLI/trivial-k-mer-counter). I counted k-mers with it and built the histogram with a simple C++ program:
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <kmer_counts.txt>\n";
        return 1;
    }
    std::ifstream in(argv[1]);
    if (!in) {
        std::cerr << "Error: cannot open input file\n";
        return 1;
    }
    // histo[c] = number of distinct k-mers that occur exactly c times
    std::vector<std::uint64_t> histo;
    std::string kmer;
    std::uint64_t c;
    // Each input record is "<k-mer> <count>"
    while (in >> kmer >> c) {
        if (histo.size() <= c)
            histo.resize(c + 1);
        ++histo[c];
    }
    for (std::size_t i = 0; i < histo.size(); ++i)
        std::cout << i << "\t" << histo[i] << "\n";
}
make
bin/counter -k 15 -ci 0 -b ../../../../Ecoli.fasta x.txt
g++ -O3 histo.cpp -o histo
./histo x.txt > histo.txt
head histo.txt
The result again is
0 0
1 4443345
2 52456
3 11610
4 2700
5 5668
6 223
7 136
8 760
9 486
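To see exactly which rows differ between two histogram files (for example the KMC one and the Jellyfish one), something like the following could help. This is only a minimal sketch: the names Histo, HistoDiff, read_histo and diff_histos are made up for illustration and are not part of KMC or Jellyfish. It assumes each input line has the two-column "count number_of_k-mers" layout shown in the listings above, and it treats a row that is missing from one file as 0.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <vector>

// Histogram: occurrence count -> number of distinct k-mers with that count.
using Histo = std::map<std::uint64_t, std::uint64_t>;

// Parse whitespace-separated "count n_kmers" pairs (assumed input layout).
Histo read_histo(std::istream& in) {
    Histo h;
    std::uint64_t count, n;
    while (in >> count >> n)
        h[count] = n;
    return h;
}

// One row on which the two histograms disagree (hypothetical helper type).
struct HistoDiff {
    std::uint64_t count;  // occurrence count (first column)
    std::uint64_t a;      // value in the first histogram (0 if the row is absent)
    std::uint64_t b;      // value in the second histogram (0 if the row is absent)
};

// Walk both sorted maps in parallel and collect the rows whose values differ.
std::vector<HistoDiff> diff_histos(const Histo& a, const Histo& b) {
    std::vector<HistoDiff> out;
    auto ia = a.begin();
    auto ib = b.begin();
    while (ia != a.end() || ib != b.end()) {
        if (ib == b.end() || (ia != a.end() && ia->first < ib->first)) {
            // Row present only in a
            if (ia->second != 0) out.push_back({ia->first, ia->second, 0});
            ++ia;
        } else if (ia == a.end() || ib->first < ia->first) {
            // Row present only in b
            if (ib->second != 0) out.push_back({ib->first, 0, ib->second});
            ++ib;
        } else {
            // Row present in both
            if (ia->second != ib->second)
                out.push_back({ia->first, ia->second, ib->second});
            ++ia;
            ++ib;
        }
    }
    return out;
}
```

To use it, open both histogram files with std::ifstream, pass each stream to read_histo, and print the entries returned by diff_histos. A "0 0" row in one file and no row 0 in the other is not reported as a difference, because a missing row counts as 0.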
Thanks, that solved my problem!
Hi there. I am running some tests using the genome of E. coli. The sequence used in the analysis was downloaded from NCBI using the command:
The KMC command I used is below:
The Jellyfish command I used is below as well:
It seems to me that the two commands above should be equivalent, but I got different results. For KMC, the histogram is:
1 4443345
2 52456
3 11610
4 2700
5 5668
6 223
7 136
8 760
9 486
10 32
But for Jellyfish, the histogram is:
1 4443336
2 52451
3 11609
4 2699
5 5669
6 223
7 136
8 761
9 485
10 32
The more I test, the more confused I get. I would really appreciate it if you could give me a hint. Thanks.