sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

add 'cluster' and 'cocluster' to sourmash #700

Open ctb opened 5 years ago

ctb commented 5 years ago

the cocluster script may be useful for people comparing the output of binning.

see also #459

ctb commented 5 years ago

% python cocluster.py --first podar-ref/63.fa.sig --second podar-ref/63.fa.sig podar-ref/2.fa.sig -k 31 --cut-point=1.0
first list contains 1 files; second list contains 2 files.
... loading file 0 of 1 for first list
... loading file 1 of 2 for second list
ksize: 31 / moltype: DNA
downsampling to scaled value of 1000
first list contains 1 signatures; second list contains 2 signatures.
...comparing 3 signatures, all by all

0-NC_011663.1 She...    [1. 1. 0.]
1-NC_011663.1 She...    [1. 1. 0.]
2-CP001071.1 Akke...    [0. 0. 1.]
min similarity in matrix: 0.000
** wrote coclust dendrogram to sourmash.coclust.dendro.pdf
cluster 2 is 1 in size
         CP001071.1 Akkermansia muciniphila ATCC BAA-835, complete genome
cluster 1 is 2 in size
         NC_011663.1 Shewanella baltica OS223, complete genome
         NC_011663.1 Shewanella baltica OS223, complete genome
** wrote coclust assignments spreadsheet to sourmash.coclust.csv
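
For readers unfamiliar with the grouping step, here is a minimal sketch of how the similarity matrix in the output above could be turned into clusters. It is not the code from cocluster.py: the function name `cluster_by_similarity` is hypothetical, the single-linkage rule is a stand-in for whatever the script actually does, and treating the cut point as a minimum similarity is an assumption (the script may interpret `--cut-point` as a dendrogram distance instead).

```python
from collections import defaultdict


def cluster_by_similarity(names, matrix, min_similarity):
    """Group items with single linkage: i and j join when matrix[i][j] >= min_similarity.

    Hypothetical helper for illustration only; not part of sourmash.
    """
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if matrix[i][j] >= min_similarity:
                parent[find(j)] = find(i)

    clusters = defaultdict(list)
    for i, name in enumerate(names):
        clusters[find(i)].append(name)
    # Largest clusters first.
    return sorted(clusters.values(), key=len, reverse=True)


# Names and matrix copied from the run above (63.fa.sig appears in both lists).
names = [
    "NC_011663.1 Shewanella baltica OS223, complete genome",
    "NC_011663.1 Shewanella baltica OS223, complete genome",
    "CP001071.1 Akkermansia muciniphila ATCC BAA-835, complete genome",
]
matrix = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

clusters = cluster_by_similarity(names, matrix, min_similarity=1.0)
for members in clusters:
    print(f"cluster of size {len(members)}")
    for name in members:
        print(f"    {name}")
```

Under these assumptions the two identical Shewanella signatures fall into one cluster and Akkermansia stays on its own, which matches the grouping reported in the output above.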

ctb commented 3 years ago

see also #1265, the uniqify script, which I think is nice and simple.

ctb commented 3 years ago

may be good as a plugin test #1353