gui11aume / starcode

All pairs search and sequence clustering
GNU General Public License v3.0

--print-clusters bug with --sphere option #18

Closed: nlubock closed this issue 6 years ago

nlubock commented 6 years ago

Hello,

There appears to be a bug when using the --print-clusters flag with the --sphere clustering option: in the third column, each cluster lists its centroid repeated once per member instead of the member sequences themselves. Here's a reproducible example from the latest master (e125759):

> cat test.tsv
GGGGGGGGGGGGGGGGGGGG    50
AGGGGGGGGGGGGGGGGGGG    10
AAGGGGGGGGGGGGGGGGGG    5
AAAGGGGGGGGGGGGGGGGG    20
TGGGGGGGGGGGGGGGGGGG    20
TTTTTTTTTTTTTTTTTTTT    100

> starcode -d1 --sphere --print-clusters -i test.tsv
running starcode with 1 thread
reading input files
raw format detected
sorting
progress: 100.00%
spheres clustering
TTTTTTTTTTTTTTTTTTTT    100 TTTTTTTTTTTTTTTTTTTT
GGGGGGGGGGGGGGGGGGGG    80  GGGGGGGGGGGGGGGGGGGG,GGGGGGGGGGGGGGGGGGGG,GGGGGGGGGGGGGGGGGGGG
AAAGGGGGGGGGGGGGGGGG    25  AAAGGGGGGGGGGGGGGGGG,AAAGGGGGGGGGGGGGGGGG
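
For comparison, here is a minimal Python sketch of what the third column would be expected to contain. It is a toy model, not starcode's implementation: it assumes sphere clustering visits sequences by decreasing count and lets each unclaimed sequence claim every unclaimed sequence within the distance threshold, using Levenshtein distance. The function names `levenshtein` and `sphere_cluster` and the tie-breaking rule are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two strings (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def sphere_cluster(counts, max_dist):
    """Toy sphere clustering (assumed model, not starcode's code).

    Sequences are visited by decreasing count (ties broken alphabetically,
    an arbitrary choice). Each unclaimed sequence becomes a centroid and
    claims every unclaimed sequence within max_dist of it.
    """
    order = sorted(counts, key=lambda s: (-counts[s], s))
    claimed = set()
    clusters = []
    for centroid in order:
        if centroid in claimed:
            continue
        members = [s for s in order
                   if s not in claimed and levenshtein(centroid, s) <= max_dist]
        claimed.update(members)
        clusters.append((centroid, sum(counts[s] for s in members), members))
    return clusters


# Same data as test.tsv above.
counts = {
    "G" * 20: 50,
    "A" + "G" * 19: 10,
    "AA" + "G" * 18: 5,
    "AAA" + "G" * 17: 20,
    "T" + "G" * 19: 20,
    "T" * 20: 100,
}

for centroid, total, members in sphere_cluster(counts, max_dist=1):
    print(centroid, total, ",".join(members), sep="\t")
```

Under these assumptions the counts agree with the output above (100, 80, 25), but the member lists contain distinct sequences rather than copies of the centroid. The order of members within a cluster is an artifact of this sketch and may differ from what starcode prints.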

Hope this helps! Nate

ezorita commented 6 years ago

Indeed, there seems to be a bug in sequence transfer/canonical annotation. Thanks for reporting.

ezorita commented 6 years ago

The last commit to master seems to solve the issue. Please pull it.

Thanks!
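
To check whether a given build still shows the problem, a small script along these lines could scan the --print-clusters output. This is only a sketch: it assumes the three-column layout shown in the report (centroid, count, comma-separated members), and the helper name `find_degenerate_clusters` is made up for illustration.

```python
def find_degenerate_clusters(lines):
    """Return centroids whose member list holds only copies of the centroid.

    Assumes whitespace-separated columns: centroid, count, comma-separated
    members. Lines that do not fit that shape are skipped.
    """
    bad = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or not fields[1].isdigit():
            continue  # not a cluster line (e.g. a progress message)
        centroid, _count, member_field = fields
        members = member_field.split(",")
        # A single-member cluster legitimately lists just its centroid.
        if len(members) > 1 and all(m == centroid for m in members):
            bad.append(centroid)
    return bad


# The three cluster lines from the report, rebuilt programmatically.
t, g, a3 = "T" * 20, "G" * 20, "AAA" + "G" * 17
reported = [
    f"{t}\t100\t{t}",
    f"{g}\t80\t{g},{g},{g}",
    f"{a3}\t25\t{a3},{a3}",
]

for centroid in find_degenerate_clusters(reported):
    print("members are all copies of the centroid:", centroid)
```

On the reported output this flags the GGGG... and AAAG... clusters. A cluster whose members are all distinct sequences would not be flagged.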