Thank you @bluegenes for this detailed submission! I am adding initial checks here:
- `import package-name` works.
- `README.md` file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
- `CONTRIBUTING.md` file that details how to install and contribute to the package.
- Code of Conduct file.
- YAML header of the issue (located at the top of the issue template).

@bluegenes @ctb @luizirber when you have a chance, can all three of you please fill out the pre-review survey linked above?
In the meantime I will proceed with finding an editor 🙂
Sorry for the delay! I have signed on a guest editor and will proceed with finding reviewers. Looking forward to checking out your package.
I have contacted some reviewers! Hope to have reviewers onboard soon.
Awesome, thank you for the updates @snacktavish, we really appreciate you acting as editor for this one. Ok I'll let you take over now :zipper_mouth_face:
Waiting on responses from reviewers - got some No's, but also some recommendations of possible reviewers. Asking more folks and hoping for some yes's!
Emailing some more potential reviewers today! @bluegenes @ctb @luizirber do you have any reviewer suggestions?
@snacktavish would it be helpful for us to post about this specific review on social? We also have a connection with the single-cell community, if that might be helpful. Finally, we have a reviewer spreadsheet where people sign up to review for us. If you join us in our pyOpenSci Slack, we can chat and provide some additional resources that may help in this search for reviewers. You can email me at leah at pyopensci.org ✨
Great! We have @LilyAnderssonLee signed on as a reviewer, and she will get started next week. I also have a few more emails out to other folks, hopefully will have a second on-boarded shortly as well!
:wave: Hi @LilyAnderssonLee and @elais! Thank you for volunteering to review for pyOpenSci!
Before beginning your review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.
The following resources will help you complete your review:
Please get in touch with any questions or concerns! Your review is due: Friday, December 8
Hi @LilyAnderssonLee and @elais - we would appreciate getting this review at the end of next week! Thank you,
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
`pyproject.toml` file or elsewhere.

Readme file requirements: The package meets the readme requirements below:
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is tall. (Note that a badge for pyOpenSci peer review will be provided upon acceptance.)
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a `paper.md` matching JOSS's requirements with:
Estimated hours spent reviewing:
@bluegenes Outstanding work with `sourmash`! Your commitment to creating a package that's both easily maintainable and well-documented truly shines through. The code is impressively organized, accompanied by clear comments explaining each section, making it easy to comprehend the purpose of each file and function. Furthermore, the documentation for high-level files and directories is exceptionally clear; I intend to adopt this clarity in my own work. I've noted a few minor comments for your consideration.
1. `mamba` was not callable after installing `miniforge3` on my Mac OS.
2. Is it possible to include arguments to allow users to adjust the legends of both matrix and dendrogram plots in the function mentioned in the example: `sourmash plot --pdf --lables ecoli_cmp`?
3. Regarding `benchmarks/benchmarks.py`:
   1. Unused libraries: `os` & `Path`
   2. Magic numbers: There are some magic numbers in the code (e.g., 500, 21, 3000, 10000, 1000, 20). Is it possible to define these numbers as constants with meaningful names or add comments to make the code more self-explanatory?
4. The download link doesn't work in the section [Building your own LCA database](https://sourmash.readthedocs.io/en/latest/tutorials-lca.html). The correct link should be `curl -L https://osf.io/bw8d7/download -o delmont-subsample-sigs.tar.gz`
4. Regarding `test/test_sourmash.py`: Refactoring duplicate code into helper functions to avoid redundancy and improve maintainability. For instance:
```python
def load_signatures(testsigs, ksize=21, moltype='dna'):
    """Load signatures from files."""
    sigs = []
    for fn in testsigs:
        sigs.append(sourmash.load_one_signature(fn, ksize=ksize,
                                                select_moltype=moltype))
    return sigs


def test_compare_serial(runtmp):
    """Test compare command serially."""
    c = runtmp

    testsigs = utils.get_test_data('genome-s1*.sig')
    testsigs = glob.glob(testsigs)

    c.run_sourmash('compare', '-o', 'cmp', '-k', '21', '--dna', *testsigs)

    cmp_outfile = c.output('cmp')
    assert os.path.exists(cmp_outfile)
    cmp_out = numpy.load(cmp_outfile)

    sigs = load_signatures(testsigs)

    cmp_calc = numpy.zeros([len(sigs), len(sigs)])
    for i, si in enumerate(sigs):
        for j, sj in enumerate(sigs):
            cmp_calc[i][j] = si.similarity(sj)

    assert (cmp_out == cmp_calc).all()
```
5. I may have overlooked it in your documentation. It would be helpful if you could provide some recommendations for the selection of `k-mers` and `scales`.
6. What are the consequences of using different scales? Do you recommend using the same scale to generate the signatures for genomes or databases?

   ```
   sourmash gather ecoli-genome.sig inter/genbank-k31.lca.json.gz
   WARNING: final scaled was 10000, vs query scaled of 1000
   ```

7. Is it possible to add explanations and default values to `--threshold` in the help message of `sourmash lca classify -h`?
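Regarding surfacing defaults in help output (point 7): Python's stdlib `argparse` can append each option's default to its help text automatically via `ArgumentDefaultsHelpFormatter`. The sketch below is a generic illustration only, not sourmash's actual CLI code; the program name, option, and default value are assumed for the demo.

```python
import argparse

# Illustrative sketch (not sourmash's actual CLI code): the
# ArgumentDefaultsHelpFormatter appends "(default: ...)" to every
# option's help string when --help is rendered.
parser = argparse.ArgumentParser(
    prog="lca-classify-demo",
    formatter_class=argparse.ArgumentDefaultsHelpFormatter,
)
parser.add_argument(
    "--threshold",
    type=int,
    default=5,  # hypothetical default, for illustration only
    help="minimum number of hashes needed for a taxonomic classification",
)

help_text = parser.format_help()
print(help_text)
```

With this formatter, the `--threshold` line of the help output ends with "(default: 5)" without the default being repeated by hand in the help string.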
thanks for this review! We'll get back to you shortly ;)
Thanks @LilyAnderssonLee!
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
`pyproject.toml` file or elsewhere.

Readme file requirements: The package meets the readme requirements below:
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is tall. (Note that a badge for pyOpenSci peer review will be provided upon acceptance.)
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a `paper.md` matching JOSS's requirements with:
Estimated hours spent reviewing:
@bluegenes This is one of the best maintained packages I've seen. The development environment is easy to set up and work in, each commit has a meaningful message attached, and the documentation is extensive and up to date. I plan on showing this repository to my own team as an example of how projects should look when maintained well. I ran into a few hiccups along the way, however:
- The nix flake and envrc need to be updated to make sure the entire environment is available with a simple call to `direnv allow`.
- There are unused imports in `benchmarks/benchmarks.py`.
- Not all tests ran out of the box in the dev environment via the nix flake, but they were easily made to work and seem to work in the GitHub Actions.
- Flake8 revealed a lot of errors; it might not be the worst idea to run the code through a linter and fix everything that can be automatically fixed, which may help with the unused imports as well. These are non-critical, but it helps.
Thank you reviewers! I’ll look over these when I’m back at work next week.
OK! I looked over the reviews, which were both very positive. Congratulations @bluegenes on your very nice package! Post here when the reviews have been addressed and the reviewers will look over the changes. Thanks @elais and @LilyAnderssonLee for your thoughtful reviews!
a tracking update - not yet a response ;) - because life does not proceed without github issues!
@elais review issues:
- The nix flake and envrc need to be updated to make sure the entire environment is available with a simple call to `direnv allow`.
- There are unused imports in `benchmarks/benchmarks.py`.
- Not all tests ran out of the box in the dev environment via the nix flake, but they were easily made to work and seem to work in the GitHub Actions.
- Flake8 revealed a lot of errors; it might not be the worst idea to run the code through a linter and fix everything that can be automatically fixed, which may help with the unused imports as well. These are non-critical, but it helps.
- [ ] Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)
- [ ] A repostatus.org badge,
- [ ] Python versions supported,
If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
* [ ] **References:** With DOIs for all those that have one (e.g. papers, datasets, software).
tracking links for @LilyAnderssonLee review -
* [ ] Badges for:
  * [ ] A [repostatus.org](https://www.repostatus.org/) badge,
  * [ ] Python versions supported,
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is tall. (Note that a badge for pyOpenSci peer review will be provided upon acceptance.)
* [ ] If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
* [ ] **Packaging guidelines**: The package conforms to the pyOpenSci [packaging guidelines](https://www.pyopensci.org/python-package-guide).
* [ ] Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)
For packages also submitting to JOSS
* [ ] **References:** With DOIs for all those that have one (e.g. papers, datasets, software).
* 1. `mamba` was not callable after installing `miniforge3` on my Mac OS.
* 2. Is it possible to include arguments to allow users to adjust the legends of both matrix and dendrogram plots in the function mentioned in the example: `sourmash plot --pdf --lables ecoli_cmp` ?
* 3. Regarding `benchmarks/benchmarks.py`:
  * 1. Unused libraries: `os` & `Path`
  * 2. Magic numbers: There are some magic numbers in the code (e.g., 500, 21, 3000, 10000, 1000, 20). Is it possible to define these numbers as constants with meaningful names or add comments to make the code more self-explanatory?
* 4. The download link doesn't work in the section [Building your own LCA database](https://sourmash.readthedocs.io/en/latest/tutorials-lca.html). The correct link should be `curl -L https://osf.io/bw8d7/download -o delmont-subsample-sigs.tar.gz`
* 4. Regarding `test/test_sourmash.py` : Refactoring duplicate code into helper functions to avoid redundancy and improve maintainability.
* 5. I may have overlooked it in your documentation. It would be helpful if you could provide some recommendations for the selection of `k-mers` and `scales`. * 6. What are the consequences of using different scales? Do you recommend using the same scale to generate the signatures for genomes or databases?
  ```
  sourmash gather ecoli-genome.sig inter/genbank-k31.lca.json.gz
  WARNING: final scaled was 10000, vs query scaled of 1000
  ```
https://github.com/sourmash-bio/sourmash/issues/2918
* 7. Is it possible to add explanations and default values to `--threshold` in the help message of `sourmash lca classify -h`
- The nix flake and envrc need to be updated to make sure the entire environment is available with a simple call to `direnv allow`.
re https://github.com/sourmash-bio/sourmash/issues/2905: @elais, are you per chance running this on ARM macOS? This works on Linux and I tried on x86_64 macOS before, but I don't have access to an ARM macOS to try it out...
Yes I am running it on an M1 Mac. I could try running it in a virtual machine to see if I get better results.
> Yes I am running it on an M1 Mac. I could try running it in a virtual machine to see if I get better results.
I think I found the underlying problem on an x86_64 macOS, will work on fixing it. Thanks for raising the issue!
Hi @LilyAnderssonLee , @elais and @snacktavish,
Thank you so much for your reviews and your enthusiasm for sourmash! We have put a lot of effort into maintenance over the past few years, and it's fantastic to get some external feedback for improvement.
@ctb, @luizirber, and I (the sourmash maintainers) have addressed each of your comments via the issues posted above. As those address each point in-line, I will summarize here and provide text and explanations where needed.
We have cut a new release with these changes (v4.8.6), and plan to release version v4.9.0 when accepted and we have a new DOI.
- Added badges: `repostatus`, `pyver` (python versions)

Our goal with sourmash is to maintain a swiss-army knife of k-mer based sequence analysis tools for lightweight sequence analysis. There are a myriad of software tools that conduct one or a few sourmash functions, such as pairwise sequence comparisons (e.g. Mash) or taxonomic classification (Kraken, sylph, etc), albeit with different approaches. To avoid a laundry list of comparisons, we have instead added the following text to the README, which we hope helps distinguish sourmash:

> sourmash is a k-mer analysis multitool, and we aim to provide stable, robust programmatic and command-line APIs for a variety of sequence comparisons. Some of our special sauce includes:
>
> - FracMinHash sketching, which enables accurate comparisons (including ANI) between data sets of different sizes
> - `sourmash gather`, a combinatorial k-mer approach for more accurate metagenomic profiling
>
> Please see the sourmash publications for details.
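To make the FracMinHash `scaled` idea above concrete, here is a toy pure-Python sketch. This is an illustration only, not sourmash's implementation (which is in Rust and uses a different hash function); the md5-based hash, the random test sequence, and the parameter values are all assumptions for the demo. The key property: keeping only hashes below `2**64 / scaled` retains roughly `1/scaled` of all k-mers, and a sketch at a larger `scaled` is a subset of one at a smaller `scaled`.

```python
import hashlib
import random

def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (md5 here, for portability)."""
    return int.from_bytes(hashlib.md5(kmer.encode()).digest()[:8], "little")

def fracminhash(seq: str, ksize: int = 21, scaled: int = 10) -> set:
    """Keep only k-mer hashes below 2**64/scaled, i.e. ~1/scaled of them."""
    max_hash = 2**64 // scaled
    return {h for i in range(len(seq) - ksize + 1)
            if (h := kmer_hash(seq[i:i + ksize])) < max_hash}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two hash sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

random.seed(0)
seq = "".join(random.choice("ACGT") for _ in range(500))

s10 = fracminhash(seq, scaled=10)
s20 = fracminhash(seq, scaled=20)

# s20 is a subset of s10: comparing sketches built at different scaled
# values requires downsampling to the larger (coarser) scaled first.
print(s20 <= s10, jaccard(s10, s10))
```

This subset property is also why comparisons across different `scaled` values (as in the `gather` warning quoted in the reviews) work by downsampling to the larger scaled before computing similarity.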
- Added recommendations for k-mer size and `scaled` selection to the FAQ. These can be seen here: https://sourmash.readthedocs.io/en/latest/faq.html#what-k-mer-size-s-should-i-use-with-sourmash
- `miniforge`: users should install the `miniforge` build for their operating system (installing the linux version on mac will not be usable). Removed the `bioconda` requirement, as `sourmash-minimal` installs from `conda-forge`.
- nix flake and envrc: updated for arm-based macs (https://github.com/sourmash-bio/sourmash/pull/2975)
- Added `--labels-to` and `--labels-from` to allow user customization of plot labels with `sourmash plot`
- `--threshold` argument in LCA classify: updated the help message: `help="minimum number of hashes needed for a taxonomic classification (default: 5)"`
- `benchmarks/benchmarks.py`: added constants in `benchmarks.py` to label relevant values used for testing (e.g. MinHash k-mer size and the number of times to repeat each function for benchmarking)
- We use `ruff` and set up a linting pre-commit hook to lint+fix all PRs (https://github.com/sourmash-bio/sourmash/pull/2427)
- `test_sourmash.py`:
While we absolutely agree that helper functions improve maintainability, the duplicated code in our tests is a result of a specific testing philosophy: tests should be simple, "flat" (not call a lot of test-specific functions), have low code complexity (minimize `if` and `for` statements), and otherwise not be viewed as primary code to be maintained and refactored regularly.

This has indeed led to a lot of repeated code in the tests (which currently number 2753!), but often these tests must load or modify test data in a unique manner in order to trigger edge cases and check bugfixes we've completed. So we want to avoid editing the tests and potentially reducing branch coverage and their coverage of edge cases.

For the future, we would appreciate any suggestions on large-scale ways to discover and/or fix instances of widely-shared code blocks and update some of the legacy items (e.g. decorators being used rather than our updated test fixtures).
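On discovering widely-shared code blocks at scale: dedicated tools exist for this (e.g. pylint's `duplicate-code` checker), but the core idea can be sketched in a few lines by scanning sliding windows of whitespace-normalized lines and reporting windows that appear in more than one location. The file names and code snippet below are hypothetical, purely for illustration.

```python
from collections import defaultdict

def find_repeated_blocks(sources: dict, window: int = 4) -> dict:
    """Map each `window`-line block (whitespace-normalized) that occurs
    more than once to its (filename, starting line number) locations."""
    seen = defaultdict(list)
    for name, text in sources.items():
        lines = [ln.strip() for ln in text.splitlines()]
        for i in range(len(lines) - window + 1):
            block = tuple(lines[i:i + window])
            if all(block):  # skip windows that contain blank lines
                seen[block].append((name, i + 1))
    return {blk: locs for blk, locs in seen.items() if len(locs) > 1}

# Hypothetical test files sharing a 4-line setup block:
shared = ("sig = load(fn)\nmh = sig.minhash\n"
          "assert mh.ksize == 21\nassert mh.scaled == 1000")
sources = {
    "test_a.py": "def test_a():\n" + shared,
    "test_b.py": "def test_b():\n" + shared,
}
dups = find_repeated_blocks(sources, window=4)
for block, locations in dups.items():
    print(len(block), "shared lines at", locations)
```

Raising `window` trades recall for precision: larger windows only flag longer, more refactor-worthy duplicates.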
@bluegenes @ctb, @luizirber Thank you for your effort in addressing those comments. I am very pleased with your response. Well done!
hi folks, please let me know if there's anything more we need to do! thanks!
Hi all, Sorry for the delay! @bluegenes @ctb, @luizirber great work!!
@elais - were your comments sufficiently addressed? Any further comments to add on the updated version?
Thanks everyone!
hi friends! 👋 thank you everyone for all of the work you've done here. it looks like we could use some input from @elais on whether comments were addressed. @elais can you kindly respond here and let us know so we can keep things moving ✨ many many thanks! 👐
yay!! I heard back, and you are officially good to go! Congratulations and I apologize for the slow review process. I will post the wrap up checklist now!!
🎉Sourmash has been approved by pyOpenSci! Thank you @bluegenes @luizirber and @ctb for submitting sourmash and many thanks to @elais and @LilyAnderssonLee for reviewing this package! 😸
There are a few things left to do to wrap up this submission:
[![pyOpenSci](https://tinyurl.com/y22nb8up)](https://github.com/pyOpenSci/software-review/issues/issue-number)

It looks like you would like to submit this package to JOSS. Here are the next steps:
Please complete the final steps to wrap up this review. Editor, please do the following:
- Add the `6/pyOS-approved` label 🚀🚀🚀.
- Once the package is accepted by JOSS (has the `6/pyOS-approved` label), add the label `9/joss-approved` to the issue. Skip this check if the package is not submitted to JOSS.

If you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the peer-review-guide.
thanks all! really happy! :tada:
question: if we want to go for JOSS, should we do a new release now, for pyopensci, and then another, assuming JOSS accepts? or can we wait and just do a release if/when JOSS accepts? we don't have a real reason for a new release just yet (we've been doing them regularly anyway!) so could go either way. Just curious.
(we have citation instructions in the software that we would change for the JOSS release.)
I checked in with @lwasser and we say - Do a release now for pyos then another bump for the JOSS paper which than can be linked to the DOI that joss provides!
Thank you!
Thanks @snacktavish @lwasser!!
for tracking:
There are a few things left to do to wrap up this submission:
It looks like you would like to submit this package to JOSS. Here are the next steps:
hey there everyone! i'm just checking in on editor wrap up tasks here!! a few things
@bluegenes @LilyAnderssonLee @elais - if you have time, can you kindly complete our post-review survey? We really appreciate your input as it helps us improve our review process, and it also helps us demonstrate impact when we write writeups for funders. It looks like titus did complete it
- So the only reason why it wouldn't be accepted is that this is a second review so you have to demonstrate significant updates to the package (i suspect there are because of that @ctb has mentioned in slack). but this is JOSS' call not ours!
I misread this the first time to be "did you make substantial changes b/t pyopensci and JOSS reviews", and was like, nope!
And then I reread it and was like, oh, yes, in the ~5 years and 3 major versions of sourmash, we have indeed made substantial updates to the package 😆
> It looks like titus did complete it
a statement rarely found in the wild
Ok a few more items
@bluegenes @ctb
We'd LOVE for you and your team to write a blog post on SourMash for us to promote your work! if you are interested - here are a few examples of other blog posts:
you can direct any questions to @kierisi on the blog post if that is of interest!
here is a markdown example that you could use as a guide when creating your post.
it can even be a tutorial like post that highlights what your package does. Then we can share it with people to get the word out about your package.
If you are too busy for this no worries. But if you have time - we'd love to spread the word about your package!
Finally - EVERYONE involved in this review (authors, editors, reviewers, etc) is invited to join our pyOpenSci Slack. If you are interested, please reply and @kierisi can ensure that you get an invite! Thank you everyone. You may hear from me again as I just cross additional t's and dot i's in these last steps of the review! 🚀
I really appreciate ALL of the effort that went into this review from EVERYONE here in this issue. Thank you all for being a part of the wonderful pyOpenSci community. You all make it the supportive community that it is 💟
> It looks like titus did complete it
>
> a statement rarely found in the wild
@ctb 🤣 🤣 but we have it now IN WRITING! (and under version control at that!)
> @bluegenes @LilyAnderssonLee @elais - if you have time, can you kindly complete our post-review survey? We really appreciate your input as it helps us improve our review process, and it also helps us demonstrate impact when we write writeups for funders. It looks like titus did complete it
I filled it out - do you not have mine? I can redo if you need
> We'd LOVE for you and your team to write a blog post on SourMash for us to promote your work!
we'd love to -- I'll reach out again when we have something.
> Finally - EVERYONE involved in this review (authors, editors, reviewers, etc) is invited to join our pyOpenSci Slack. If you are interested, please reply and @kierisi can ensure that you get an invite! Thank you everyone. You may hear from me again as I just cross additional t's and dot i's in these last steps of the review! 🚀
yes - I'd be happy to join the slack!
> @bluegenes @LilyAnderssonLee @elais - if you have time, can you kindly complete our post-review survey? We really appreciate your input as it helps us improve our review process, and it also helps us demonstrate impact when we write writeups for funders. It looks like titus did complete it
> I filled it out - do you not have mine? I can redo if you need
i believe you - i wonder what happened. i see titus but not anyone else from this review. i just tested out the form and it seems to work... if you have the bandwidth to take 5-10 minutes to fill it out i'd greatly appreciate it! thank you so much. maybe something odd happened in the google-form-verse?
@bluegenes sending the invite now!
thank you @kierisi !! 🚀 also @bluegenes if you have any questions about the blog side of things - jesse can help!!
Closing this as it was accepted by JOSS in June!! thank you everyone!
Yay!!! Congrats to all!!
hey there @snacktavish 👋🏻 just saying hello!! 💖 thank you for the work that you did on this review!!
Submitting Author: Tessa Pierce-Ward (@bluegenes)
All current maintainers: @ctb, @luizirber, @bluegenes
Package Name: sourmash
One-Line Description of Package: `sourmash` is a command line tool and Python library for sketching collections of DNA, RNA, and amino acid k-mers for biological sequence search, comparison, and analysis.
Repository Link: https://github.com/sourmash-bio/sourmash
Version submitted: 4.8
Editor: @snacktavish (Emily Jane McTavish)
EiC: @NickleDave
Reviewer 1: @LilyAnderssonLee (Lili Andersson-Li)
Reviewer 2: @elais (Elais Player)
Archive:
Version accepted: v4.8.8
JOSS:
Date accepted (month/day/year): 4/4/2024

Code of Conduct & Commitment to Maintain Package
Description
Scope
Please indicate which category or categories. Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
Domain Specific & Community Partnerships
Community Partnerships
If your package is associated with an existing community please check below:
For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or `@tag` the editor you contacted:

Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication Options
JOSS Checks
- [x] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [x] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [x] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [x] The package is deposited in a long-term repository with the DOI: 10.21105/joss.00027

*Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Confirm each of the following by checking the box.
Please fill out our survey
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
The editor template can be found here.
The review template can be found here.