Closed Midnighter closed 1 year ago
hey there 👋 @Midnighter welcome to pyopensci and thank you for this submission! i just wanted to say hello so you know that we see this submission and will be getting back to you shortly! i also am looking into forms for the submission template as well - many thanks for that suggestion! More soon!
Welcome @Midnighter @sofstam @jfy133! I'm adding initial checks for this package.
Glad to see this as someone who develops a similar tool (for a different domain!).
Looks like most everything is there.
I do want to request one thing before we proceed with a review though: can you please add at least one full tutorial?
Something like the "Examples" section in the Pynteny docs here (which also has a CLI interface): https://robaina.github.io/Pynteny/examples/example_cli/
(These docs were in place before we began the Pynteny review, which is why I'm providing them as an example of what we need to see before we start.)
I would suggest a sort of walkthrough of the main use case for taxpasta. This might be a simplified version of something you're using it for in research already.
I have provided some other feedback below but adding an initial tutorial is the only thing that's necessary at this time.
Hi there! Thank you for submitting your package for pyOpenSci review. Below are the basic checks that your package needs to pass to begin our review. If some of these are missing, we will ask you to work on them before the review process begins.
Please check our Python packaging guide for more information on the elements below.
`import package-name`
- [x] **Documentation** The package has sufficient online documentation to allow us to evaluate package function and scope without installing the package. This includes:
  - The output of `taxpasta -h` could be shown, and additionally a brief snippet. If the main usage is the CLI, then perhaps demos of usage could be shown with a tool like asciinema.
  - `taxpasta -h` was truncated for me (see below). Not sure it would be clear to a new user what the `consensus` and `merge` commands do because of the ellipsis. From the docs I guess this is because your CLI package imports text from somewhere else? Is there a way to not truncate? Maybe just give a briefer summary?
```
$ taxpasta -h
Usage: taxpasta [OPTIONS] COMMAND [ARGS]...

  TAXonomic Profile Aggregation and STAndardisation

Options:
  --version                       Print only the current tool version and exit.
  -l, --log-level [DEBUG|INFO|WARNING|ERROR|CRITICAL]
                                  Set the desired log level.  [default: INFO]
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.
  -h, --help                      Show this message and exit.

Commands:
  consensus    Form a consensus for the same sample but from different...
  merge        Standardise and merge two or more taxonomic profiles into...
  standardise  Standardise a taxonomic profile (alias: 'standardize').
```
- [x] Short tutorials that help a user understand how to use the package and what it can do for them.
- There are some snippet-like examples in documentation for commands of the command-line interface but I do not find more extensive tutorials
- [x] API documentation (documentation for your code's functions, classes, methods and attributes): this includes clearly written docstrings with variables defined using a standard docstring format. *We suggest using the [Numpy](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard) docstring format*.
- [x] Core GitHub repository Files
- [x] **README** The package has a `README.md` file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
- [x] **Contributing File** The package has a `CONTRIBUTING.md` file that details how to install and contribute to the package.
- @Midnighter is there a link to the CONTRIBUTING.rst in the README? I'm not finding it. If not, I'd suggest adding it.
- [x] **Code of Conduct** The package has a `Code of Conduct` file.
- inside the .github directory
- [x] **License** The package has an [OSI approved license](https://opensource.org/licenses).
NOTE: We prefer that you have development instructions in your documentation too.
- [x] **Issue Submission Documentation** All of the information is filled out in the `YAML` header of the issue (located at the top of the issue template).
- [x] **Automated tests** Package has a testing suite and is tested via GitHub actions or another Continuous Integration service.
- [x] **Repository** The repository link resolves correctly.
- [x] **Package overlap** The package doesn't entirely overlap with the functionality of other packages that have already been submitted to pyOpenSci.
- [ ] **Archive** (JOSS only, may be post-review): The repository DOI resolves correctly.
- [ ] **Version** (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?
---
- [ ] [Initial onboarding survey was filled out](https://forms.gle/F9mou7S3jhe8DMJ16)
We appreciate each maintainer of the package filling out this survey individually. :raised_hands:
Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. :raised_hands:
---
*******
## Editor comments
This is not required by the checks, but I would suggest adding example data to taxpasta.
Since you are working with tabular data that I'd guess can be fairly lightweight, it should be relatively painless to add, and it will really help people understand package usage. (Unless I'm wrong and the taxonomic profiles are like 100 MB each; in that case, see the next paragraph.)
Again, see the Pynteny example where they work with example data (not suggesting you need to add a download command, just showing why built-in data is useful): https://robaina.github.io/Pynteny/examples/example_cli/#download-pgap-profile-hmm-database
For my own library I just add the files directly to the package and then access with `importlib.resources`, see this issue: https://github.com/vocalpy/crowsetta/issues/90
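For anyone curious, the pattern looks roughly like this. This is a self-contained sketch: `examplepkg` and `profile.tsv` are made-up names, and the throwaway package is built in a temp directory only so the snippet runs on its own; in a real project the data file would live in the package's source tree.

```python
import importlib.resources
import pathlib
import sys
import tempfile

# Build a throwaway package with a bundled data file, purely to make this
# snippet self-contained. "examplepkg" and "profile.tsv" are made-up names.
tmp = tempfile.mkdtemp()
pkg_dir = pathlib.Path(tmp) / "examplepkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "profile.tsv").write_text("taxonomy_id\tcount\n9606\t152\n")

# Make the throwaway package importable, then access its bundled data the
# same way a user would access data shipped inside an installed package.
sys.path.insert(0, tmp)
data = importlib.resources.files("examplepkg").joinpath("profile.tsv").read_text()
print(data.splitlines()[0])  # header line of the bundled file
```

`importlib.resources.files()` (Python 3.9+) works whether the package is installed as a directory or inside a zip, which is the main advantage over building paths with `__file__`.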
Some libraries have adopted the [`pooch` library](https://github.com/fatiando/pooch) for data, e.g. scikit-image: https://scikit-image.org/docs/stable/api/skimage.data.html. Might be worth snooping their issues to help you figure out how to implement if you decide to adopt it.
Also @sofstam @jfy133 could you both please fill out the pre-review survey?
I have a reply for @Midnighter but not you I think.
https://forms.gle/F9mou7S3jhe8DMJ16
It's just a brief (5-10m) survey to help us understand how we're doing as an org. Thank you! :pray:
Hi @NickleDave,
Thank you for your review and comments. Adding a tutorial and improving the docs mostly makes sense to me. I will work on adding those.
The tables are indeed small; still, I'm a little hesitant to distribute them with the package. However, pandas should be able to just load a table from a URL, so that's maybe a way to go.
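To sketch that idea: `pd.read_csv` accepts a URL directly, so tutorial data could stay in the repository rather than in the package. The URL below is a placeholder, and an in-memory buffer stands in for it so the snippet runs offline.

```python
import io

import pandas as pd

# Placeholder URL, not a real taxpasta data file. In a tutorial one would run:
#     profile = pd.read_csv(url, sep="\t")
url = "https://example.com/data/sample_profile.tsv"

# Offline-runnable stand-in: an in-memory file with the same shape of content.
buffer = io.StringIO("taxonomy_id\tcount\n9606\t152\n")
profile = pd.read_csv(buffer, sep="\t")
print(profile.shape)  # (1, 2)
```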
> Also @sofstam @jfy133 could you both please fill out the pre-review survey? I have a reply for @Midnighter but not you I think. https://forms.gle/F9mou7S3jhe8DMJ16 It's just a brief (5-10m) survey to help us understand how we're doing as an org. Thank you! :pray:
oops, sorry - from the second page most of the questions seemed to be about the package and I assume Moritz had already provided all that info.
I've filled it out, but left pretty much all the optional questions empty as either you have that info from Moritz, and/or submitting via pyOpenSci is Moritz' initiative, so I'm not that familiar/involved with python etc.
One comment though: I noticed that the options in your 'What background best describes your cultural identity?' question are extremely N. American focused. Unfortunately I don't have a good solution for you, but you may not be getting a particularly good overview of this - e.g. Asian spans half the world with many sub-divisions equally as large as the Pacific Islanders, and grouping 'Black' and African-American is also arguably unfair, as they can have a large difference in backgrounds/problems etc. For example, see what official surveys from the UK use: https://www.ethnicity-facts-figures.service.gov.uk/style-guide/ethnic-groups
@NickleDave regarding your comments on `taxpasta -h`: the truncation is automatically created by typer. It looks better if you also install rich. I'll look into shortening the summaries such that they are not truncated by typer in the default view.
> Thank you for your review and comments. Adding a tutorial and improving the docs mostly makes sense to me. I will work on adding those.
Great, thank you @Midnighter. Your other comments re: tables, bioconda, Zenodo, and the truncation all sound good. Would it be worth making rich a dependency so you don't have to special case typer behavior?
> oops, sorry - from the second page most of the questions seemed to be about the package and I assume Moritz had already provided all that info.
> I've filled it out, but left pretty much all the optional questions empty as either you have that info from Moritz, and/or submitting via pyOpenSci is Moritz' initiative, so I'm not that familiar/involved with python etc.
> One comment though: I noticed that the options in your 'What background best describes your cultural identity?' question are extremely N. American focused.
Thank you @jfy133 I hear you and I will relay this to our executive director @lwasser. Your feedback is helpful; this is definitely a version 0.1 of the survey and I'm sure we can improve it to not be US-centric, and to encompass people who are core contributors to a project even if they are not primarily Python developers.
> Would it be worth making rich a dependency so you don't have to special case typer behavior?
At the moment, it's an extra dependency that can be installed with `taxpasta[rich]`. (Come to think of it, we should document all of those.) The reason that it's not the default is that where taxpasta originated, as a CLI tool for a Nextflow pipeline, fancy terminal output is really not needed. We should definitely recommend installing it for interactive work, though.
> where taxpasta originated, as a CLI tool for a Nextflow pipeline, fancy terminal output is really not needed.
Understood!
Btw @Midnighter could I ask you to link this review issue from any issues you raise to make changes before review? I.e., by adding a link to this issue on the taxpasta repo, so that GitHub cross-references them.
Just to help us track. Thank you :pray:
Dear @NickleDave,
We think that we've implemented everything that you've noticed. Please take another look and let us know your thoughts.
Hi @Midnighter I did check out these additions -- looks great!
Thank you for addressing all of those comments.
I will move ahead with finding an editor, will reply back here introducing them ASAP.
Hi again @Midnighter very happy to say that @ctb has very kindly agreed to take on the editor role for this review. Thanks for your patience while we found someone who knows your domain and the software community around it.
I will let @ctb take it from here and start tagging in reviewers.
I guess we'll have to add support for sourmash then 😆 Good to hear and I look forward to your review @ctb. Thank you for your time.
> I guess we'll have to add support for sourmash then 😆 Good to hear and I look forward to your review @ctb. Thank you for your time.
welcome! ironically we have invested time in generating the same report formats as you, for other consumers of taxonomy, so I think we are well positioned to engage productively ;).
> Also @sofstam @jfy133 could you both please fill out the pre-review survey? I have a reply for @Midnighter but not you I think. https://forms.gle/F9mou7S3jhe8DMJ16 It's just a brief (5-10m) survey to help us understand how we're doing as an org. Thank you! :pray:
> oops, sorry - from the second page most of the questions seemed to be about the package and I assume Moritz had already provided all that info.
> I've filled it out, but left pretty much all the optional questions empty as either you have that info from Moritz, and/or submitting via pyOpenSci is Moritz' initiative, so I'm not that familiar/involved with python etc.
> One comment though: I noticed that the options in your 'What background best describes your cultural identity?' question are extremely N. American focused. Unfortunately I don't have a good solution for you, but you may not be getting a particularly good overview of this - e.g. Asian spans half the world with many sub-divisions equally as large as the Pacific Islanders, and grouping 'Black' and African-American is also arguably unfair, as they can have a large difference in backgrounds/problems etc. For example, see what official surveys from the UK use: https://www.ethnicity-facts-figures.service.gov.uk/style-guide/ethnic-groups
hey y'all just wanted to note that i'm working on making the survey questions a LOT more inclusive. many thanks for your patience there and THANK YOU for calling this to our attention! it's very much American-centric
(finally got around to asking second reviewer; response soon :))
:wave: Hi @bluegenes and @snacktavish! Thank you for volunteering to review for pyOpenSci!
The following resources will help you complete your review:
Reviewers: @bluegenes @snacktavish Due date: April 21, 2023.
ok, trying tagging in @bluegenes and @snacktavish again. Or do we need to invite them to this repo also?
Thank you @ctb, I think we're good to go from here. Just the editor needs repo access to be able to edit metadata in the authors' original post; sorry again for not adding you before.
no worries, mostly concerned about whether they're being notified!
(they are! or at least tessa is)
I have a question about the version submitted. We've already made new releases since then and I'm planning another round of smaller fixes. It'd make sense to me that the latest version will be reviewed rather than what we started with. Should I just edit the original post or is that against your procedure?
Hi @Midnighter thank you for checking, I know it's not clear.
We want to track what the version was when you submitted.
So please don't edit that.
We of course also want to make sure reviewers are reviewing the latest version, e.g. if you have made changes related to editor checks.
@bluegenes @snacktavish please review the version @Midnighter replies to us with and asks you to review.
@Midnighter I know you probably need to take development time wherever you can find it, but if there's any way you can either finish those small fixes in the next couple days, or hold off for now, that would be appreciated so we're not reviewing something that's changing while we review it.
Thank you for the answer. I'll try to make the changes either today or tomorrow and then give you a go-ahead.
Thank you! I know that @snacktavish is still off-line during spring break this week so if you want to reply with the version to review Monday that would be fine.
I've just released version 0.3.0 of taxpasta. Please use that for review. I look forward to it.
Hi all, Sorry for the delay - spring break followed by illness has created a series of setbacks, but I should be able to look at this later this week.
no worries @snacktavish - and thanks for reviewing!
hi folks, how are the reviews coming along?
hi @ctb - good, just need to find a bit more time to finish up. Will have it for y'all by friday!
Work in progress! But review may be a little late :grimacing:
OK - I am revealing my slow start, but as far as I can tell, the tutorial is not rendering correctly https://taxpasta.readthedocs.io/en/latest/tutorials/getting-started.md, and it's not really follow-able at https://github.com/taxprofiler/taxpasta/blob/dev/docs/tutorials/getting-started.md Is there a better link I should be using?
Thanks, Emily Jane
Hi @snacktavish !
No worries :).
I think we've made a link in the website with the wrong extension:
https://taxpasta.readthedocs.io/en/latest/tutorials/getting-started/
Should work. I'll try find all instances of that mistake and fix them.
However, I'm not sure why it doesn't render at all on the dev branch; I'll look into that now. But I don't think there is much difference in that file compared to the master branch.
Cheers,
Hi folks, here's my review.
Great tool - having tried to provide some standardized formats for our profiler, I definitely appreciate the utility provided here!
The package includes all the following forms of documentation:
`pyproject.toml` file or elsewhere.

**Readme file requirements** The package meets the readme requirements below:
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)
- There are examples for `merge` and `standardise`, but fully functional examples with file download links would be much better.
- In the `Usage` section, `use-cases` just links to the `about` section higher in the README.
- In the `Merge` section, `test-data` links to a single file example. Two files are needed for the `merge` command.

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a `paper.md` matching JOSS's requirements with:
Estimated hours spent reviewing: 4-5
The `taxpasta` software tackles an important issue in metagenomic taxonomic profiling by ingesting the output from a number of popular profilers and producing standardized outputs that can be used in downstream workflows.
I find the back-and-forth between CLI and R/Python distracting in the tutorial; it would be clearer to see the entire command line + execution for `taxpasta`, with brief explanations around each command together, since the tool is CLI-based and the other sections are just demonstrating the justification for the tool.
In a "Getting Started" tutorial, including code for the failures (including partial solutions) of standard reading methods might confuse folks who are newer to coding/bioinfo. People may end up confusing which lines they should run on their own data and which they shouldn't.
I think the R/python parts are more appropriate for a follow-up section or tutorial, entitled "Why taxpasta" or similar to indicate that you're digging into the rationale and issues that prompted building taxpasta. That way, beginners who just want to run the cli are comfortable, and intermediate/etc can dig in afterwards. Note, I only tested the python code parts (not R), but those worked great.
If kept as is, the tutorial at least needs a bolded line calling attention to the switch between command line and R/Python sections, e.g. **The following section uses R/Python.** and similar when you switch back.
I think you're already aware, but several links are broken from the `Getting Started` tutorial:
The `adding taxa names` how-to tutorial could usefully be added directly to the main `Getting Started` CLI tutorial if you switch up the bash/python structure (I do see it linked at the bottom, though!).
- In the `README`, you include a full command for `standardise` and state that an example file can be found in test data. Can you instead include small test files with the package or add a `curl` command to download the exact file, so that command is actually executable? Same for the next command in the README (add `curl` to download a second file to allow `merge`). Same comment for the documentation pages for these.
- There were several screens of terminal output upon command failure (specifically, running the `standardise` README command without downloading the file). Might be confusing for new folks (even though it's very pretty! :) and the only line that matters is the very last one, `FileNotFoundError: [Errno 2] No such file or directory: 'MOCK_002_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt'`. Might be out of scope, just wanted to mention.
- It would be helpful to add a general note in the README that users can see additional tool help via `taxpasta <command> -h`, e.g. `taxpasta merge -h`.
- It would be nice to note somewhere that setting the output format explicitly overrides use of the file extension for determining output format (I noticed I could write an `arrow` format with a `.tsv` extension).
- Definitely out of scope, but wanted to suggest an extension: have you considered adding a `filter` functionality, e.g. by % of total dataset reported in the file? There's some evidence that filtering e.g. Kraken2 output might help reduce false positives (e.g. here).

Dear @bluegenes,
Thank you very much for your kind review. I'll try to respond to all your points now and note where we will take action.
> A repostatus.org badge,
We're adding a repo status badge
> If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
Agreed that this would be a good addition. I think we will write a JOSS paper after seeing the second review and copy this kind of information from there into the documentation.
> I don't see any citation info
Technically, the Zenodo DOI already makes this citeable but we'll add more info if taxpasta gets accepted here and at JOSS.
> JOSS's requirements
As mentioned above, we plan to write the paper after the initial review.
> Tutorial comments
Thank you for the detailed observations on the tutorial. We agree with your points and @jfy133 will work on splitting this up and improving the getting started tutorial. He already fixed the URLs that were wrong.
> In the README, you include a full command for standardise and state that an example file can be found in test data. Can you instead include small test files with the package or add a curl command to download the exact file, so that command is actually executable? Same for the next command in the README (add curl to download a second file to allow merge). Same comment for documentation pages for these.
Good point, we will do so.
> There were several screens of terminal output upon command failure (specifically, running the standardise README command without downloading the file). Might be confusing for new folks (even though it's very pretty! :) and the only line that matters is the very last one, `FileNotFoundError: [Errno 2] No such file or directory: 'MOCK_002_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt'`. Might be out of scope, just wanted to mention.
We will consider this when looking at the tutorial and the readme for your other points above.
> It would be helpful to add a general note in the README that users can see additional tool help via `taxpasta <command> -h`, e.g. `taxpasta merge -h`.
We thought that we had covered this in the Usage section of the readme. Do you think that it needs to be stated there that this also works for sub commands?
> It would be nice to note somewhere that setting the output format explicitly overrides use of the file extension for determining output format (I noticed I could write an `arrow` format with a `.tsv` extension)
> Definitely out of scope, but wanted to suggest an extension: have you considered adding a `filter` functionality, e.g. by % of total dataset reported in the file? There's some evidence that filtering e.g. Kraken2 output might help reduce false positives (e.g. here)
We agree that, although this is useful in general, it is out of scope for taxpasta and should happen downstream. We would rather follow the UNIX philosophy here:
> Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features".
Thus, we can see a case for two new packages, `taxpasta-stats` and `taxpasta-viz` (or something like that), in the future that take advantage of these standardised tables, but we haven't done any scoping work or thought about all the desired capabilities yet.
A bit of a tangent, but since you're all experts @ctb, @bluegenes, and @snacktavish, I wanted to ask your opinions on the following issue: Currently, taxpasta converts all relative abundances to integers. This design decision was based on my experience with some statistical tools requiring this; I think it was `vegan::adonis`, but I can't even remember properly. The other decent option is to convert everything to fractions. A third option is to provide a flag that lets users choose between either output. Do you have any opinions on this?
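For concreteness, the two output choices could be sketched as follows. All numbers here are made up, and the scaling step is just one possible way to produce integers, not necessarily what taxpasta actually does internally.

```python
# Made-up relative abundances for two taxa; purely illustrative.
abundances = {"9606": 0.75, "152": 0.25}

# Option A: integer pseudo-counts, scaled by an arbitrary depth and rounded.
# This satisfies tools that require integer inputs but drops decimal detail.
DEPTH = 1_000_000
as_integers = {taxon: round(value * DEPTH) for taxon, value in abundances.items()}

# Option B: keep the fractions as-is (no information loss).
as_fractions = dict(abundances)

print(as_integers)  # {'9606': 750000, '152': 250000}
```

A flag selecting between the two would just pick which of these dictionaries gets written out.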
Hi @bluegenes, your comments on the tutorial have (hopefully) been addressed here: https://github.com/taxprofiler/taxpasta/pull/95 once merged!
@ctb I am delayed (and have a 2-year-old sick and out of day care for the next few days...) but am still hoping to finish my review this week. I plan to use the revised tutorial, unless it is important that I stick to the original release one.
thx @snacktavish - take the time you need, and your efforts are much appreciated! I think you should use the revised tutorial.
The revised tutorials have been merged!
Hi all, I have run through the tutorial and added some comments. I think it is a neat package that does handy data organization, but it could use some further documentation and input checks. I didn't carefully go through the checkboxes in the first section, as I was already quite late on my review.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
`pyproject.toml` file or elsewhere.

**Readme file requirements** The package meets the readme requirements below:
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a `paper.md` matching JOSS's requirements with:
In the tutorial, a little more biological context on the meaning of the results would be useful.
E.g. in the step `head 2612_pe-ERR5766176-db1_kraken2.tsv`, these results look a bit concerning to me! What are some of the things in taxon 0? Are all of the results not id'd to a taxon? (This may be a limitation of my own knowledge of taxonomic profiling, but including some biological context with the technical info can be helpful for users.)
I tried going off-script in the tutorial to see if I could merge the Kraken file with the motus files (this may not make a lot of sense to do, biologically, but I was curious) using:
```
taxpasta merge -p motus -o dbMOTUs_motus.tsv 2612_pe-ERR5766176-db_mOTU.out 2612_se-ERR5766180-db_mOTU.out 2612_pe-ERR5766176-db1.kraken2.report.txt
```
It appeared to work... but the results were odd. At the end it showed a taxonomy id of 152 with 152 samples in 2612_pe-ERR5766176-db1.kraken2.report, whereas in the original table it looks like it should be taxon id 9606 with 152 samples. Looking at the docs I think the software is only intended to merge profiles generated by the same tool, but the way that the description is set up, I thought merging profiles across tools was a key advantage of the software. (This is user error - but user error is one thing you can always rely on!!) This looks like a bug? or maybe the software should have thrown an error instead of running? But users will definitely try to run things that are not quite right, and the software should do some checks on input file format rather than chugging along and outputting seemingly sensible but incorrect outputs. (This may connect to the "Danger message" https://taxpasta.readthedocs.io/en/latest/commands/merge/#why here, but it does not even appear to be merging on identifier...)
I wanted to add taxon names to try to make the outputs make more sense to me, but I wasn't able to read the documentation on that at https://taxpasta.readthedocs.io/how-tos/how-to-add-names
The error messages in other places could also use some work. I ran:

```
taxpasta merge -p motus -o dbMOTUs_motus.tsv 2612_pe-ERR5766176-db_mOTU.out 2612_se-ERR5766180-db_mOTU.out 2612_pe-ERR5766176-db1_kraken2.tsv
```

which I think is just not a reasonable file format to try to merge, but the error did not really convey that (although it did point to the problematic file).
I was not able to find the paper.md file to review for JOSS, but I see that you are planning on that later.
Thanks, Emily Jane
Dear @snacktavish,
Thank you for your review. As a quick comment before answering in more detail, where did you find the URL to the how-to that is not found? The proper link is https://taxpasta.readthedocs.io/en/latest/how-tos/how-to-add-names/
Hi @Midnighter - It was the link from here: https://taxpasta.readthedocs.io/en/latest/tutorials/getting-started/#additional-functionality to here https://taxpasta.readthedocs.io/how-tos/how-to-add-names
The how to is there I think - just formatting issues.
Hi folks, looks like you've addressed my concerns, so I'm happy to sign off.
> A bit of a tangent but since you're all experts @ctb, @bluegenes, and @snacktavish I wanted to ask your opinions on the following issue: Currently, taxpasta converts all relative abundances to integers. This design decision was based on my experience with some statistical tools requiring this; I think it was `vegan::adonis`, but I can't even remember properly. The other decent option is to convert everything to fractions. A third option is to provide a flag that lets users choose between either output. Do you have any opinions on this?
No strong opinions -- my personal preference is to not drop information (decimals) when possible, but I think it's fine to stick with integers if there are tools that require this.
Dear @snacktavish,
Thank you again for your review. I will respond below to your points (quoted).
> In the tutorial, a little more biological context on the meaning of the results would be useful.
> E.g. in the step `head 2612_pe-ERR5766176-db1_kraken2.tsv`, these results look a bit concerning to me! What are some of the things in taxon 0? Are all of the results not id'd to a taxon? (This may be a limitation of my own knowledge of taxonomic profiling, but including some biological context with the technical info can be helpful for users.)
From my point of view, providing interpretation or meaning is down to the individual profilers. Taxpasta is a pure utility and I would assume that its only users are those who have already chosen to run a particular profiler on their data. (For the specific example, taxon 0 is indeed unclassified. That's not surprising since it comes from a very small sequencing file that we run for testing purposes only.)
> I tried going off-script in the tutorial to see if I could merge the Kraken file with the motus files (this may not make a lot of sense to do, biologically, but I was curious) using: `taxpasta merge -p motus -o dbMOTUs_motus.tsv 2612_pe-ERR5766176-db_mOTU.out 2612_se-ERR5766180-db_mOTU.out 2612_pe-ERR5766176-db1.kraken2.report.txt`
> It appeared to work... but the results were odd. At the end it showed a taxonomy id of 152 with 152 samples in 2612_pe-ERR5766176-db1.kraken2.report, whereas in the original table it looks like it should be taxon id 9606 with 152 samples. Looking at the docs I think the software is only intended to merge profiles generated by the same tool, but the way that the description is set up, I thought merging profiles across tools was a key advantage of the software. (This is user error - but user error is one thing you can always rely on!!) This looks like a bug? or maybe the software should have thrown an error instead of running? But users will definitely try to run things that are not quite right, and the software should do some checks on input file format rather than chugging along and outputting seemingly sensible but incorrect outputs. (This may connect to the "Danger message" https://taxpasta.readthedocs.io/en/latest/commands/merge/#why here, but it does not even appear to be merging on identifier...)
Thank you for doing that. Your experience here prompted us to do some fairly major refactoring, which made the whole profile validation part much more strict. We have thus ensured that providing input from a profiler other than the chosen one will always result in an error (except for centrifuge and kraken2, which use the same six-column layout). We have also added tests to verify that errors occur for all profilers. Due to that work, our response is also a bit later than hoped.
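To illustrate the kind of strict validation being discussed, here is a minimal, purely hypothetical sketch in pandas. The column layouts and the `validate_profile` helper are invented for illustration; taxpasta's real schemas are more detailed (and defined separately per profiler), but the fail-fast principle is the same: reject a file whose header does not match the chosen profiler instead of silently merging it.

```python
import io
import pandas as pd

# Hypothetical expected column layouts per profiler (illustrative only;
# taxpasta's actual per-profiler schemas are richer than this).
EXPECTED_COLUMNS = {
    "motus": ["taxonomy", "taxonomy_id", "read_count"],
    "kraken2": ["percent", "clade_count", "direct_count", "rank", "taxonomy_id", "name"],
}

def validate_profile(text: str, profiler: str) -> pd.DataFrame:
    """Fail fast if the file's columns do not match the chosen profiler."""
    table = pd.read_csv(io.StringIO(text), sep="\t")
    expected = EXPECTED_COLUMNS[profiler]
    if list(table.columns) != expected:
        raise ValueError(
            f"Input does not look like a {profiler} profile: "
            f"expected columns {expected}, found {list(table.columns)}."
        )
    return table

motus_text = "taxonomy\ttaxonomy_id\tread_count\nLeptospira\t171\t12\n"
validate_profile(motus_text, "motus")        # passes validation
try:
    validate_profile(motus_text, "kraken2")  # wrong profiler -> error
except ValueError as err:
    print(err)
```

Note that this header-based check alone cannot distinguish two profilers that share the exact same layout, which is why the centrifuge/kraken2 case above remains an exception.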
I wanted to add taxon names to try to make the outputs make more sense to me, but I wasn't able to read the documentation on that at https://taxpasta.readthedocs.io/how-tos/how-to-add-names
Thank you for noticing this. We have fixed links in the documentation.
The error messages in other places could also use some work. I ran:

`taxpasta merge -p motus -o dbMOTUs_motus.tsv 2612_pe-ERR5766176-db_mOTU.out 2612_se-ERR5766180-db_mOTU.out 2612_pe-ERR5766176-db1_kraken2.tsv`

which I think is just not a reasonable combination of files to try to merge, but the error did not really convey that (although it did point to the problematic file).
This is the same point as above, as far as I can tell.
I was not able to find the paper.md file to review for JOSS, but I see that you are planning on that later.
We hope that we have addressed your major points and will begin work on the paper. Please note that if you want to test the changes, you will have to install the `dev` branch, since we have not released these changes yet.
Just a reminder that we believe we have addressed all the comments and are waiting for final sign-off so we can complete the JOSS manuscript :)
Hi @jfy133 sounds good to me - I think the JOSS manuscript is indeed the last thing remaining! I think I can review that all on my own without @snacktavish or @bluegenes so lmk when I should take a look :).
hi friends! just chiming in here. @ctb you are more than welcome to look at the JOSS manuscript BUT normally that part of this review happens on the JOSS side of things, as they need to actually implement the "Accept to JOSS" part. Instructions for wrapping things up (IF this review is at this stage based upon Titus' evaluation of reviews/responses etc) can be found here! There is a template that you paste into the issue if all of the reviewer elements have been addressed in this review to your satisfaction - then this can move on to JOSS. I hope that helps ✨ i know this partnership can be confusing and sometimes even the JOSS editors need a bit of guidance (i.e. that they don't have to review the code, only the paper, through our partnership!).
I think we meant it's not present a la the checklist in the OP, but indeed the more the merrier in terms of :eyes:!
ahhh! if i follow you correctly, you can absolutely work on the JOSS paper now even if the review is not quite wrapped up. AND i see that note about not submitting to JOSS separately. I need to fix that as the process has changed. JOSS now DOES want a submission but you do NOT need to go through another review. You need to link to and mention this review and tell them it was accepted by us. They will and should ONLY review the paper once this review is complete. does that help? NOTE to myself to update the reviewer template and remove that note
Submitting Author: Moritz E. Beber (@Midnighter)
All current maintainers: (@Midnighter, @sofstam, @jfy133)
Package Name: taxpasta
One-Line Description of Package: TAXonomic Profile Aggregation and STAndardisation
Repository Link: https://github.com/taxprofiler/taxpasta
Version submitted: 0.2.1
Editor: @ctb
Reviewer 1: @snacktavish
Reviewer 2: @bluegenes
Archive: https://github.com/taxprofiler/taxpasta/releases/tag/0.4.0
JOSS DOI:
Version accepted: 0.4.0
Date accepted (month/day/year): 07/05/2023
Code of Conduct & Commitment to Maintain Package
Description
The main purpose of taxpasta is to standardise taxonomic profiles created by a range of bioinformatics tools. We call those tools taxonomic profilers. They each come with their own particular, tabular output format. Across the profilers, relative abundances can be reported in read counts, fractions, or percentages, as well as any number of additional columns with extra information. We therefore decided to take the lessons learnt to heart and provide our own solution to deal with this pasticcio. With taxpasta you can ingest all of those formats and, at a minimum, output taxonomy identifiers and their integer counts.
Taxpasta can not only standardise profiles but also merge profiles from the same profiler across samples into a single table. In the future, we also intend to offer methods for forming a consensus when the same sample is analyzed by different profilers.
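The merge step described above can be sketched with a small, purely illustrative pandas example (this is not taxpasta's actual implementation, and the sample names and counts are made up): once each profile is standardised to two columns, taxonomy identifier and integer count, combining samples amounts to an outer join on the identifier, filling in zeros for taxa absent from a sample.

```python
import pandas as pd

# Two standardised single-sample profiles: taxonomy_id plus integer count.
sample_a = pd.DataFrame({"taxonomy_id": [9606, 171], "count": [152, 12]})
sample_b = pd.DataFrame({"taxonomy_id": [9606, 562], "count": [80, 5]})

# Outer-join on the taxonomy identifier so every taxon seen in any
# sample gets a row; missing counts become zero.
merged = (
    sample_a.rename(columns={"count": "sample_a"})
    .merge(
        sample_b.rename(columns={"count": "sample_b"}),
        on="taxonomy_id",
        how="outer",
    )
    .fillna(0)
    .astype(int)
)
print(merged)
```

The outer join is what makes the wide table lossless: no taxon is dropped just because it was only observed in one sample.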
Scope
Please indicate which category or categories. Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
Domain Specific & Community Partnerships
Community Partnerships
If your package is associated with an existing community please check below:
For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
Taxpasta is a tool for anyone working with taxonomic profiles from metagenomic sequencing experiments. Mostly that means ecologists, bioinformaticians, and statisticians. Taxpasta's main application is to standardise profiles from a range of different tools; having a single format facilitates downstream analyses. Taxpasta is used, for example, in the upcoming taxprofiler pipeline implemented in Nextflow, where it also serves to combine the profiles of many samples into a single file.
The BIOM format was created with the intention of standardizing a storage format for microbiome analyses. However, producing files in this format was entirely left to the user. Taxpasta conveniently knows how to read profiles from a range of tools and can also produce BIOM output.
Some of the taxonomic profilers also come with scripts to convert their output into another format but none of them support such a wide range of tools as taxpasta does.
@tag the editor you contacted

Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
CONTRIBUTING.rst).

Publication Options
JOSS Checks
- [x] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [x] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor 'utility' packages, including 'thin' API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Confirm each of the following by checking the box.
Please fill out our survey
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
The [editor template can be found here][Editor Template].
The [review template can be found here][Review Template].