medvedevgroup / UST

GNU General Public License v3.0

Counts seem incorrect #2

Open · enricorox opened this issue 1 year ago

enricorox commented 1 year ago

Hi! I'm a computer engineering student working on my master's thesis, which is basically about improving UST (see here if interested).

I wrote a simple C++ program that extracts the canonical k-mers from the simplitigs and appends to each one its count, read sequentially from the UST output files. I then sorted the k-mer list and compared it with the one computed by Jellyfish-2.
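A minimal sketch of the extraction step described above. It assumes the counts have already been parsed into one integer per k-mer, listed in the order the k-mers occur in each simplitig (left to right); that ordering and the parsing of UST's count file are assumptions here, and the function names are illustrative rather than taken from kmers-extractor. The canonical form is taken as the lexicographically smaller of a k-mer and its reverse complement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Reverse complement of a DNA string over {A, C, G, T}.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break;  // leave non-ACGT characters unchanged
        }
    }
    return rc;
}

// Canonical form: the lexicographically smaller of a k-mer and its reverse complement.
std::string canonical(const std::string& kmer) {
    std::string rc = reverse_complement(kmer);
    return std::min(kmer, rc);
}

// Pairs the i-th k-mer of a simplitig with counts[i].
// Assumption: one count per k-mer, in left-to-right order along the simplitig.
std::vector<std::pair<std::string, int>> kmers_with_counts(
        const std::string& simplitig, std::size_t k, const std::vector<int>& counts) {
    std::vector<std::pair<std::string, int>> out;
    if (simplitig.size() < k) return out;  // too short to contain a k-mer
    std::size_t n = simplitig.size() - k + 1;  // number of k-mers in the simplitig
    assert(counts.size() == n && "expected exactly one count per k-mer");
    for (std::size_t i = 0; i < n; ++i)
        out.emplace_back(canonical(simplitig.substr(i, k)), counts[i]);
    return out;
}
```

Note that this pairing is only correct if the counts really follow the orientation in which the simplitig is written; if the sequence was flipped but its counts were not, each count lands on the wrong k-mer.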

The k-mers are the same, but some of the counts differ. Can you confirm this?

How to reproduce

Extract k-mers and counts from the UST output files:

Extract k-mers and counts from the original input sequence (not the BCALM one):

Compare the two files:

kmers-extractor is attached.

Note that k-mers with abundance 1 are ignored.
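The comparison can be sketched as a merge over two sorted k-mer tables, mirroring the "sort, then compare" approach above. This is a hypothetical illustration: `CountTable` and `count_mismatches` are made-up names, loading the two files into the maps is not shown, and `min_count = 2` reflects the note that abundance-1 k-mers are ignored. A k-mer missing from one table is treated as having count 0.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using CountTable = std::map<std::string, int>;  // canonical k-mer -> count, kept sorted by key

// Returns the k-mers whose counts differ between the two tables, in sorted order.
// K-mers whose count is below min_count in both tables are skipped.
std::vector<std::string> count_mismatches(const CountTable& a, const CountTable& b,
                                          int min_count = 2) {
    std::vector<std::string> diffs;
    auto ia = a.begin();
    auto ib = b.begin();
    while (ia != a.end() || ib != b.end()) {
        std::string kmer;
        int ca = 0, cb = 0;
        if (ib == b.end() || (ia != a.end() && ia->first < ib->first)) {
            kmer = ia->first; ca = ia->second; ++ia;  // only in a
        } else if (ia == a.end() || ib->first < ia->first) {
            kmer = ib->first; cb = ib->second; ++ib;  // only in b
        } else {
            kmer = ia->first; ca = ia->second; cb = ib->second; ++ia; ++ib;  // in both
        }
        if (ca < min_count && cb < min_count) continue;  // below threshold on both sides
        if (ca != cb) diffs.push_back(kmer);
    }
    return diffs;
}
```

Because both maps iterate in key order, a single linear pass finds every disagreement, including k-mers present on only one side.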

enricorox commented 1 year ago

To keep things simple, here is a minimal example.

I think it's because you don't reverse the unitig counts vector when you reverse-complement the unitig.
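The reasoning behind this hypothesis: for a unitig with n = L - k + 1 k-mers, the i-th k-mer of its reverse complement is the reverse complement of the (n - 1 - i)-th k-mer of the original, so the per-k-mer counts must be reversed along with the sequence. A small sketch of that fix, using a hypothetical `CountedUnitig` struct and `flip` function (not UST's actual code):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical unitig with one abundance per k-mer:
// counts[i] belongs to the k-mer starting at seq[i].
struct CountedUnitig {
    std::string seq;
    std::vector<int> counts;
};

// Reverse complement of a DNA string over {A, C, G, T}.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break;  // leave non-ACGT characters unchanged
        }
    }
    return rc;
}

// Reverse-complements the sequence AND reverses the count vector, so that each
// count stays attached to the same (now reverse-complemented) k-mer.
CountedUnitig flip(const CountedUnitig& u) {
    assert(u.counts.size() <= u.seq.size() && "at most one count per position");
    CountedUnitig rc{reverse_complement(u.seq), u.counts};
    std::reverse(rc.counts.begin(), rc.counts.end());
    return rc;
}
```

Without the `std::reverse` call, counts would be paired with k-mers in mirrored order. The set of k-mers would be unchanged while individual counts would be wrong, which matches the symptom reported above.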