Closed leoisl closed 1 year ago
Looks to me like you found a bug!
It seems pretty minor though, no changes to the KMeans clustering in my tests! Maybe KMeans clustering runs a little bit faster since the array you give to it might be smaller, but that is about it.
Can make a pull request, if you wish.
Happy for a pull request now, or later (looks like you've got a few tests over on your fork, and you might find a few more bugs before you are done :) )
Hello again!
Very very sorry for the huge delay on answering you! Yeah, I am making a few additional tests so I can understand better the code, and also to make some bug hunting... But I think I might have found one or two minor bugs for now (i.e. does not seem to change the output). It is a nice script BTW, I've learned from it, thanks!
I shall try to fully understand the code in the next few weeks and make a PR later with what I've found, and if you also agree on the possible fixes. Not sure if I'll be able to do so though (last month of PhD, I hope you understand :p).
Hopefully you will be still around when I arrive, so that we can chat a little bit about the code!
Sounds good!
Closing, outdated
Hello there!
I am reading this code and trying to understand a little bit of it! I have a doubt on the kmer ID attribution, this code. I'd propose to check for the presence of the kmer instead of the full seq at the if.
In this test, these are the values of
self.kmer_dict
andseq_kmer_counts
before:and after a small fix:
There is not really difference regarding
self.kmer_dict
- each kmer still has an unique ID, but theseq_kmer_counts
, which is fed into theKMeans
algorithm might contains a lot of not meaningful 0s (whenever a kmer is repeated in the considered subalignment, we have an additional 0).I am wondering if this is really a (minor) bug or I am just misunderstanding...
Thanks!