Open enricorox opened 1 year ago
To keep things simple, here is an easy example.
CCCTGACAAAAAGGGCCCCAAGCTTCCAATA
3
.TATTGGAAGCTTGGGGCCCTTTTTGTCAGGG
in the unitigs file: it's on unitig 0
2
I think it's because you don't reverse the unitig counts vector when you reverse-complement the unitig.
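To make the suspected cause concrete, here is a minimal C++ sketch of flipping a unitig together with its counts. It is not UST's actual code: `Unitig`, `flip_unitig` and `reverse_complement` are hypothetical names, and the sketch assumes that `counts[i]` belongs to the kmer starting at position `i` of the sequence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Complement of one nucleotide (upper-case ACGT; anything else becomes 'N').
char complement(char c) {
    switch (c) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default:  return 'N';
    }
}

// Reverse the sequence, then complement every base.
std::string reverse_complement(const std::string& seq) {
    std::string rc(seq.rbegin(), seq.rend());
    std::transform(rc.begin(), rc.end(), rc.begin(), complement);
    return rc;
}

// Hypothetical in-memory unitig: counts[i] is the abundance of the kmer
// that starts at position i of seq (an assumption of this sketch).
struct Unitig {
    std::string seq;
    std::vector<uint32_t> counts;
};

// Flip a unitig: the i-th kmer of the flipped sequence is the reverse
// complement of the (n-1-i)-th kmer of the original, so the counts
// vector has to be reversed along with the sequence.
Unitig flip_unitig(const Unitig& u, std::size_t k) {
    assert(u.seq.size() >= k && u.counts.size() == u.seq.size() - k + 1);
    Unitig flipped;
    flipped.seq = reverse_complement(u.seq);
    flipped.counts.assign(u.counts.rbegin(), u.counts.rend());
    return flipped;
}
```

With k = 3, the unitig `AACG` with counts `{5, 7}` (AAC = 5, ACG = 7) flips to `CGTT` with counts `{7, 5}` (CGT = 7, GTT = 5). If the counts were left as `{5, 7}`, CGT would wrongly inherit the count of AAC.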
Hi! I'm a computer engineering student and I'm doing my master's thesis on, basically, improving UST (see here if interested).
I wrote a simple C++ program that extracts the canonical kmers from the simplitigs and pairs each one, in order, with its count, using the UST output files. Then I sorted the resulting kmer list and compared it to the one computed by Jellyfish-2.
There are differences between the counts, though the kmers are the same. Can you confirm this?
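For readers without the attachment, the extraction step described above could look roughly like the sketch below. This is not the attached kmers-extractor: `extract_kmers`, `canonical` and `revcomp` are hypothetical names, file parsing is left out, and the sketch assumes one count per kmer, listed in the order the kmers appear in the simplitig. The canonical form is taken as the lexicographically smaller of a kmer and its reverse complement, which is what Jellyfish's `-C` option reports.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Reverse complement of an upper-case ACGT string (other characters become 'N').
std::string revcomp(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default:  c = 'N'; break;
        }
    }
    return rc;
}

// Canonical kmer: the lexicographically smaller of the kmer and its reverse complement.
std::string canonical(const std::string& kmer) {
    const std::string rc = revcomp(kmer);
    return std::min(kmer, rc);
}

using KmerCount = std::pair<std::string, uint32_t>;

// Walk the simplitig left to right, pair the i-th kmer with counts[i],
// and return the (canonical kmer, count) pairs sorted by kmer.
std::vector<KmerCount> extract_kmers(const std::string& simplitig,
                                     const std::vector<uint32_t>& counts,
                                     std::size_t k) {
    assert(simplitig.size() >= k && counts.size() == simplitig.size() - k + 1);
    std::vector<KmerCount> out;
    for (std::size_t i = 0; i + k <= simplitig.size(); ++i) {
        out.emplace_back(canonical(simplitig.substr(i, k)), counts[i]);
    }
    std::sort(out.begin(), out.end());
    return out;
}
```

Because the counts are consumed strictly in sequence order, a simplitig that was reverse-complemented without reversing its counts would produce the right kmers with the wrong abundances, which matches the symptom described here.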
How to reproduce
Extract kmers and counts from the UST output files:
g++ kmers-extractor.cpp -o kmers-extractor
./kmers-extractor <kmer-size> <ust-fasta> <ust-counts>
sort ust-kmers.txt -o ust-kmers-sorted.txt
Extract kmers and counts from the starting sequence (not the BCALM one):
jellyfish-linux count -m <kmer-size> -C -s 100M -L 2 <starting-fasta>
jellyfish-linux dump -c mer_counts.jf > kmers.txt
sort kmers.txt -o kmers-sorted.txt
Compare the two files:
cmp kmers-sorted.txt ust-kmers-sorted.txt
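`cmp` only reports the first byte at which the two files differ. To list every kmer whose count disagrees, something like the following sketch could be used. `read_counts`, `diff_counts` and `Mismatch` are hypothetical names, and the sketch assumes both files hold one whitespace-separated `kmer count` pair per line, as `jellyfish dump -c` prints them. Whether ust-kmers.txt uses the same layout is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One disagreement between the reference counts and the candidate counts.
struct Mismatch {
    std::string kmer;
    uint64_t expected;  // count in the reference (e.g. Jellyfish)
    uint64_t observed;  // count in the candidate (e.g. extracted from UST output)
};

// Parse "kmer count" pairs from a stream into a map keyed by kmer.
std::map<std::string, uint64_t> read_counts(std::istream& in) {
    std::map<std::string, uint64_t> counts;
    std::string kmer;
    uint64_t count = 0;
    while (in >> kmer >> count) {
        counts[kmer] = count;
    }
    return counts;
}

// Return every kmer present in both inputs whose counts differ, sorted by kmer.
std::vector<Mismatch> diff_counts(std::istream& reference, std::istream& candidate) {
    const std::map<std::string, uint64_t> ref = read_counts(reference);
    const std::map<std::string, uint64_t> cand = read_counts(candidate);
    std::vector<Mismatch> out;
    for (const auto& entry : ref) {
        const auto it = cand.find(entry.first);
        if (it != cand.end() && it->second != entry.second) {
            out.push_back({entry.first, entry.second, it->second});
        }
    }
    return out;
}
```

In practice the two streams would be `std::ifstream` objects opened on kmers-sorted.txt and ust-kmers-sorted.txt. Kmers missing from one of the files are deliberately skipped here, since the report says the kmer sets already agree.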
kmers-extractor is attached.
Note that kmers with abundance 1 are ignored.