sourmash-bio / sourmash_plugin_branchwater

fast, multithreaded sourmash operations: search, compare, and gather.
GNU Affero General Public License v3.0

How to do `manysketch` on `dayhoff`, `hp` moltypes? #337

Open olgabot opened 1 month ago

olgabot commented 1 month ago

Hello! Hope you are doing well. I would like to try out manysketch for different protein moltypes. I see that the new gather, search, index and check commands all have the option to provide a moltype, but manysketch does not have that option.

For example, if I put hp into the --param-string argument, I get the error below:

(sourmash-branchwater) Sun 19 May - 12:28 ~/botryllus-data/data/uniprot/2024-05-19 @olgabot
$ sourmash scripts manysketch --debug --singleton --param-string hp,scaled=1,k=24,abund -c 8 --output 2024-05-19__uniprot_sprot.hp.k24.sig.zip manysketch.csv

== This is sourmash version 4.8.8. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

=> sourmash_plugin_branchwater 0.9.3; cite Irber et al., doi: 10.1101/2022.11.02.514947

params: ['hp,scaled=1,k=24,abund']
sketching all files in 'manysketch.csv' using 8 threads
Loaded 6 rows in total (0 genome and 6 protein files)
Error parsing params string: unknown component 'hp' in params string
Error: Failed to parse params string

Is there a way to make dayhoff and hp sketches with manysketch? Thank you so much!
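For readers unfamiliar with the two moltypes: dayhoff and hp sketches are built from protein sequences that have first been recoded into a smaller alphabet, and the k-mers are then taken over the recoded string. The snippet below is a minimal, standard-library-only sketch of that recoding step. The group tables are an assumption based on the commonly documented Dayhoff and hydrophobic/polar groupings, the helper names (`build_table`, `encode`, `kmers`) are hypothetical, and none of this is the plugin's actual code.

```python
# Illustrative only: reduced amino-acid alphabets as commonly described for
# the dayhoff and hp moltypes. The groupings are an assumption, not copied
# from the sourmash or branchwater source.

DAYHOFF_GROUPS = {
    "a": "C",
    "b": "AGPST",
    "c": "DENQ",
    "d": "HKR",
    "e": "ILMV",
    "f": "FWY",
}

HP_GROUPS = {
    "h": "AFGILMPVWY",  # hydrophobic
    "p": "CDEHKNQRST",  # polar
}


def build_table(groups):
    """Invert {code: members} into a per-residue lookup {residue: code}."""
    return {aa: code for code, members in groups.items() for aa in members}


DAYHOFF = build_table(DAYHOFF_GROUPS)
HP = build_table(HP_GROUPS)


def encode(seq, table, unknown="X"):
    """Recode a protein sequence; residues not in the table become `unknown`."""
    return "".join(table.get(aa, unknown) for aa in seq.upper())


def kmers(seq, k):
    """Return all overlapping k-mers of a (recoded) sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    protein = "MKVLA"
    print(encode(protein, HP))
    print(encode(protein, DAYHOFF))
    print(kmers(encode(protein, HP), 3))
```

Because the hp alphabet has only two letters, short k-mers collide heavily, which is presumably why a larger k such as the k=24 in the command above is used for this moltype.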

ctb commented 1 month ago

The underlying ability must be there in the Rust code, so it's hopefully just a matter of connecting the dots. But that probably won't be me, at least not this week or next.
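To make "connecting the dots" concrete, here is a rough sketch of what accepting a moltype token in the params string could look like. It is written in Python for illustration only; the plugin's real parser is in Rust and may be structured quite differently. The `SketchParams` class, the `parse_params` function, and the set of accepted tokens are all hypothetical. Only the wording of the error message is taken from the output in the report above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical set of moltype tokens; the point of the sketch is that
# 'dayhoff' and 'hp' are recognised alongside 'dna' and 'protein'.
MOLTYPES = {"dna", "protein", "dayhoff", "hp"}


@dataclass
class SketchParams:
    moltype: Optional[str] = None      # None means "not specified"
    ksizes: Tuple[int, ...] = ()
    scaled: Optional[int] = None
    track_abundance: bool = False


def parse_params(param_string):
    """Parse a comma-separated params string such as 'hp,scaled=1,k=24,abund'."""
    params = SketchParams()
    ksizes = []
    for item in param_string.split(","):
        item = item.strip()
        if item in MOLTYPES:
            params.moltype = item
        elif item.startswith("k="):
            ksizes.append(int(item[len("k="):]))
        elif item.startswith("scaled="):
            params.scaled = int(item[len("scaled="):])
        elif item == "abund":
            params.track_abundance = True
        elif item == "noabund":
            params.track_abundance = False
        else:
            # Same wording as the error shown in the report.
            raise ValueError(f"unknown component '{item}' in params string")
    params.ksizes = tuple(ksizes)
    return params


if __name__ == "__main__":
    print(parse_params("hp,scaled=1,k=24,abund"))
```

With a parser along these lines, the params string from the report would yield moltype hp, k=24, scaled=1, with abundance tracking on, and the remaining work would be passing that moltype through to the sketching code.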

bluegenes commented 1 month ago

I can get to this in June when I'm back from my trip :)

ctb commented 3 weeks ago

note: https://github.com/sourmash-bio/sourmash_plugin_directsketch/pull/55