Closed gorliver closed 3 years ago
Hi, this is not good :(
Which version of KMC are you using? Is it one of the releases, or did you compile the recent source code? I tried with the recent source code, and the result is:
bin/kmc -k35 -t20 -m32 -ci1 -b -fa test.fa t_k35 .
**
Stage 1: 100%
Stage 2: 100%
1st stage: 1.34564s
2nd stage: 0.161782s
Total : 1.50742s
Tmp size : 0MB
Stats:
No. of k-mers below min. threshold : 0
No. of k-mers above max. threshold : 0
No. of unique k-mers : 2
No. of unique counted k-mers : 2
Total no. of k-mers : 2
Total no. of reads : 1
Total no. of super-k-mers : 1
AACACATATGAATCATCAAATTAACAACCAATATT 1
GAACACATATGAATCATCAAATTAACAACCAATAT 1
so it seems to be correct.
One more question: what operating system are you using? And another: is there an end-of-line character ('\n') after the sequence? There was a bug related to files without this character at the end of the file, which was, as it seems, only partially fixed in the last release: the missing EOL is detected, but the results are wrong. The current source code seems to work fine. Are you able to just compile KMC yourself? I think I will create a new release soon because of the latest extensions to the code, but I am not sure when.
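For anyone who wants to check the end-of-line question on their own input, here is a minimal sketch in Python. The helper names `ends_with_newline` and `ensure_trailing_newline` are made up for illustration and are not part of KMC; the sketch only inspects and, if needed, appends to the last byte of the file.

```python
import os


def ends_with_newline(path):
    # Return True if the last byte of the file is a newline.
    # An empty file is reported as False.
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        if fh.tell() == 0:
            return False
        fh.seek(-1, os.SEEK_END)
        return fh.read(1) == b"\n"


def ensure_trailing_newline(path):
    # Append a newline only when it is missing.
    # Returns True if the file was modified, False otherwise.
    if ends_with_newline(path):
        return False
    with open(path, "ab") as fh:
        fh.write(b"\n")
    return True
```

Calling `ensure_trailing_newline("test.fa")` would modify the file in place, so it may be safer to run it on a copy first.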
I use the precompiled release, the version is:
K-Mer Counter (KMC) ver. 3.1.1 (2019-05-19)
I run KMC on an HPCC, so the precompiled binary is the most convenient option for me. The system is CentOS.
There is no '\n' after the sequence. I added the '\n' and the result is the same. When I add another sequence, KMC generates the correct k-mers for the first sequence, but the second sequence is skipped:
**
Stage 1: 100%
Stage 2: 100%
1st stage: 0.304549s
2nd stage: 0.081882s
Total : 0.386431s
Tmp size : 0MB
Stats:
No. of k-mers below min. threshold : 0
No. of k-mers above max. threshold : 0
No. of unique k-mers : 2
No. of unique counted k-mers : 2
Total no. of k-mers : 2
Total no. of reads : 1
Total no. of super-k-mers : 1
The FASTA file is:
>t
GAACACATATGAATCATCAAATTAACAACCAATATT
>t2
TTCCTCCATTATTTTATGGAACATGGGTAACCTCTA
The k-mers I got (from the dump command) are:
AACACATATGAATCATCAAATTAACAACCAATATT 1
GAACACATATGAATCATCAAATTAACAACCAATAT 1
I also tried a FASTA file containing three 36 bp sequences. KMC generated the correct k-mers for the first two reads but skipped the third read.
I will try to compile the latest KMC on the HPCC, but a precompiled release would be highly appreciated.
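As a cross-check of what the dump should contain for the two-sequence file above, here is a minimal Python sketch that lists every 35-mer in each read. It counts k-mers exactly as they appear in the sequence, without canonical form, which is what I understand the `-b` option in the command above to request. The function names `read_fasta` and `count_kmers` are hypothetical helpers, not KMC code, and the sketch does not try to reproduce KMC's handling of non-ACGT symbols.

```python
from collections import Counter


def read_fasta(lines):
    # Yield (name, sequence) pairs from an iterable of FASTA lines.
    # Multi-line sequences are joined; blank lines are ignored.
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_kmers(records, k):
    # Count every length-k substring of every read (no canonical form).
    counts = Counter()
    for _name, seq in records:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


fasta = """>t
GAACACATATGAATCATCAAATTAACAACCAATATT
>t2
TTCCTCCATTATTTTATGGAACATGGGTAACCTCTA
"""

counts = count_kmers(read_fasta(fasta.splitlines()), 35)
for kmer in sorted(counts):
    print(kmer, counts[kmer])
```

Each 36 bp read contributes 36 - 35 + 1 = 2 k-mers, so the two reads together should give four distinct 35-mers, not the two reported above.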
I managed to compile the latest version and all the issues are gone and it works great now. Many thanks for your help!
Hi, I have a FASTA file containing just one read:
I run
and here is the stdout:
The output of dump is:
I got many k-mers that do not exist in the read.
How can I get rid of these unrelated k-mers?
Thank you, Gorliver