sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

add more hash manipulation utilities to sourmash CLI #1266

Open ctb opened 3 years ago

ctb commented 3 years ago

yesterday, I spent some time digging into a sourmash use case with @shannonekj, and a need for a few reasonably generic utility scripts emerged.

the code for this is in a private repository so I'll try to describe things here - we were looking for differential presence of hashes in a genome between two samples (specifically, looking for hashes that correlated with male vs female genomes).
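
as a rough illustration of the kind of comparison described above (not the notebook code, which isn't public), here is a minimal sketch in plain Python. it assumes the hash values have already been extracted from each sample's signature into a set of integers; the function name `differential_hashes` and the `min_fraction` parameter are hypothetical and not part of sourmash.

```python
from collections import Counter


def differential_hashes(group_a, group_b, min_fraction=1.0):
    """Return hashes found in at least `min_fraction` of the samples in
    group_a and in none of the samples in group_b.

    group_a, group_b: lists of sets of integer hash values, one set per
    sample (hypothetical inputs; extracting them from signatures is left out).
    """
    # Count how many group_a samples contain each hash.
    counts_a = Counter()
    for hashes in group_a:
        counts_a.update(set(hashes))

    # Collect every hash seen in any group_b sample.
    seen_in_b = set()
    for hashes in group_b:
        seen_in_b.update(hashes)

    threshold = min_fraction * len(group_a)
    return {
        h for h, n in counts_a.items()
        if n >= threshold and h not in seen_in_b
    }


# Toy data: small integers stand in for real hash values.
male = [{1, 2, 3, 10}, {1, 2, 4, 10}]
female = [{1, 3, 5}, {1, 4, 5}]

male_specific = differential_hashes(male, female)
female_specific = differential_hashes(female, male)
```

lowering `min_fraction` below 1.0 would tolerate hashes missing from some samples in the first group, which may matter with real sequencing data where coverage is uneven.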

to do this, we needed the following new functionality -

I implemented all of this in a Jupyter notebook fairly easily, but it'd be nice to have this in the sourmash CLI.

since code exists for all of this and I can make it available upon request, I'll label this as a good first issue...

ctb commented 3 years ago

I put several utilities in https://github.com/ctb/2020-emqc-scripts, which also includes the abundhist code from https://github.com/dib-lab/sourmash/pull/933.

These are pretty good candidates for a plugin :) #1353

ctb commented 2 years ago

some of these may now be possible with sig inflate added in https://github.com/sourmash-bio/sourmash/pull/1889