sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

Ideas for JGI "tech talk" on sourmash (May 2020) #961

Open ctb opened 4 years ago

ctb commented 4 years ago

Random gemisch / brainstorming about an opportunity I have to give a more in-depth tech talk on sourmash at JGI in early May 2020. What should I cover?

An incomplete list of potential tech-y topics --

  1. Our software development process for sourmash (see blog post), including code review, testing, and Rust optimization.
  2. scaled, gather, and metagenome analysis; benchmarking results/CAMI; comparison with other MinHash-based techniques
  3. taxonomy stuff in LCA module (classify, summarize, and free/arbitrary taxonomy); GTDB
  4. scaled and unknown hashes as "features", per @taylorreiter's IBD stuff
  5. contamination analysis in MAGs, maybe?
  6. Limitations! Small genomes in particular. Taxonomic generalization (but see spacegraphcats).
  7. Databases and indexing (SBT, LCA, and various optimizations thereto)
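For readers new to topic 2, here is a rough pure-Python sketch of the two ideas: a "scaled" sketch keeps every k-mer hash that falls below a fixed fraction of the hash space, and gather greedily assigns the query's hashes to the best-matching reference, one reference at a time. This is an illustration only, not sourmash's implementation: the function names, the use of `blake2b` as the hash, and the plain sets of integers standing in for signatures are all assumptions made for the example.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def hash_kmer(kmer):
    """64-bit hash of a k-mer (blake2b is a stand-in chosen for this sketch)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def scaled_sketch(seq, k=21, scaled=1000):
    """Keep only hashes below MAX_HASH / scaled, i.e. roughly 1/scaled of them."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < threshold}


def gather(query, database):
    """Greedy decomposition of a query sketch against reference sketches.

    Repeatedly pick the reference sharing the most hashes with what is left
    of the query, record the overlap, and remove those hashes.
    Returns (list of (name, overlap_size), unassigned hashes).
    """
    remaining = set(query)
    results = []
    while remaining and database:
        name, best = max(database.items(), key=lambda kv: len(remaining & kv[1]))
        overlap = remaining & best
        if not overlap:
            break  # nothing in the database explains the rest of the query
        results.append((name, len(overlap)))
        remaining -= overlap
    return results, remaining
```

Because the threshold depends only on `scaled`, a sketch built at a small `scaled` value can be downsampled to a larger one by filtering its hashes, and whatever gather leaves unassigned is the "unknown" portion of the sample.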

Might be good to discuss or point at some of the use cases (#208).

In previous not-as-tech-y talks I've done this kind of thing as a series of vignettes focused on a biological question (e.g. see "Building methods to explore the unknown in metagenomes"). I think that centers the biology in a way that I'm not sure I want to do here?

ctb commented 4 years ago

other random points to make - sourmash is "open source mature"

ctb commented 4 years ago

Philosophy of sourmash

Discuss GitHub and basics of OSS

ctb commented 4 years ago

combine "move fast with new features" with "don't break old things", and "polish the new features as they mature"

ctb commented 4 years ago

maybe "pay attention to what advanced users (Phil, Tessa, Harriet, Taylor, etc.) are doing, help them do it, and then design that back in" - clustering, revindex, LCA, etc.

ctb commented 4 years ago

databases - combining databases in search, gather, and summarize is straightforward and additive.
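A toy illustration of what "additive" could mean for search, assuming sketches are plain sets of hash values and a database is just a dict of them: searching several databases gives the same hits as searching one merged database. The function names and the example database contents are made up for this sketch and are not the sourmash API.

```python
from itertools import chain


def jaccard(a, b):
    """Jaccard similarity of two hash sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query, database, threshold=0.1):
    """Return (name, similarity) hits at or above threshold, best first."""
    hits = ((name, jaccard(query, sketch)) for name, sketch in database.items())
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


def search_many(query, databases, threshold=0.1):
    """Search each database independently, then merge and re-rank the hits."""
    merged = chain.from_iterable(search(query, db, threshold) for db in databases)
    return sorted(merged, key=lambda h: -h[1])


# Hypothetical databases: a public collection plus a private one.
public_db = {"g1": {1, 2, 3, 4}}
private_db = {"p1": {1, 2}, "p2": {8, 9}}
hits = search_many({1, 2, 3}, [public_db, private_db])
```

Each hit depends only on the query and one reference sketch, so adding a database can only add hits; it never changes the scores of existing ones.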

ctb commented 4 years ago

side note to self - this talk is also supposed to cover spacegraphcats... :)

ctb commented 4 years ago

outline of possible topics updated here - working copy for now. https://docs.google.com/document/d/1zMj1KjqxDPhvAmajcNjEU0DgGw88c-XGEuXUU9Om2Rg/edit?usp=sharing