editorialbot opened 1 month ago
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.90 T=0.27 s (183.3 files/s, 477312.9 lines/s)
| Language | files | blank | comment | code |
|---|---:|---:|---:|---:|
| SVG | 10 | 1 | 128 | 108888 |
| Python | 10 | 366 | 1162 | 2309 |
| Markdown | 7 | 185 | 0 | 747 |
| YAML | 9 | 25 | 30 | 224 |
| TeX | 1 | 16 | 0 | 197 |
| Jupyter Notebook | 5 | 0 | 12948 | 193 |
| TOML | 1 | 10 | 1 | 49 |
| DOS Batch | 1 | 8 | 1 | 26 |
| reStructuredText | 4 | 25 | 32 | 22 |
| make | 1 | 4 | 7 | 9 |
| **SUM** | **49** | **640** | **14309** | **112664** |
Commit count by author:
343 j.niesser
35 Michael Osthege
32 Jochen Nießer
32 Osthege, Michael
5 dependabot[bot]
Paper file info:
Wordcount for paper.md is 1955
The paper includes a Statement of need section
License info:
License found: GNU Affero General Public License v3.0 (Check here for OSI approval)
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
👋🏼 @MicroPhen, @Adafede, @lazear - this is the review thread for the submission. All of our communications will happen here from now on.
As a reviewer, the first step is to create a checklist for your review by entering
@editorialbot generate my checklist
at the top of a new comment in this thread. These checklists contain the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.
The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues directly in the software repository. If you do so, please mention this thread so that a link is created (and I can keep an eye on what is happening). Please also feel free to comment and ask questions in this thread. It is often easier to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.
We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.
Please feel free to ping me (@csoneson) if you have any questions or concerns. Thanks!
First of all: thank you very much for agreeing to review our manuscript and program.
I just wanted to inform you that a moment ago I created a branch to improve and streamline our GitHub repo's landing page, as it is a little bit outdated and confusing. More importantly, though, here is the link to our documentation, which will be placed more prominently on the new landing page. It contains a lot of helpful information which should simplify your review.
@Y0dler a couple of comments/suggestions/etc. (will update as I go - likely to be in multiple chunks):
You guys should be commended for the quality of your code and documentation. It's very good!
I think the bibliography/references are messed up; only one reference appears at the end of the paper proof.
I am not super familiar with Bayesian peak picking, but I know there are other papers in this area - perhaps consider citing them and very briefly discuss differences/tradeoffs?
Is it possible to benchmark your approach vs standard normal models on benchmark data (e.g. synthetic data with known parameters + noise)?
Is this algorithm only suitable for analysis of a single transition in targeted LC/MS data? Could it handle multiple transitions/product ions from the same precursor?
What are the performance characteristics of your approach? Is this scalable to integrating 1000s or 100,000s of peaks in a feasible time scale?
@csoneson clarifications requested:
Thanks for your comments and kind words, @lazear!
Regarding the references, I think they should be fine as far as I can see. But if you use the "View article proof on GitHub" link from editorialbot, it hides the last page with most of the references at first (until you hit the "More Pages" button at the very bottom). Maybe that was the problem? Otherwise, you can also download the current draft from GitHub Actions.
I am not super familiar with Bayesian peak picking, but I know there are other papers in this area - perhaps consider citing them and very briefly discuss differences/tradeoffs?
Definitely a very good point. I think we need to add that, and I'll see what I can do given that we're already way past the theoretical limit of 1000 words^^ I'll try to add it as soon as possible, but it might take a couple of days.
Is it possible to benchmark your approach vs standard normal models on benchmark data (e.g. synthetic data with known parameters + noise)?
We have tested our approach with noisy synthetic data sets of normal-shaped and skew normal-shaped peaks and compared the results with the ground truth. Originally, this was also part of the paper, but we moved it to the documentation in order to decrease the word count. You can find it here. I hope this is what you meant. There is also a comparison with a commercially available vendor software using an experimental data set. To your later point regarding data sharing: this data set is currently not in the repository.
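For illustration only, here is a minimal sketch of how such a synthetic validation data set could be generated, assuming a skew normal peak shape via SciPy; all parameter values, names, and the noise level are made up for the example and are not the ones used in our documentation:

```python
import numpy as np
from scipy import stats

# Ground-truth parameters of a hypothetical skew normal peak (illustrative values).
true = {"baseline": 50.0, "area": 1000.0, "mean": 1.5, "std": 0.1, "alpha": 3.0}

rng = np.random.default_rng(42)
time = np.linspace(0.0, 3.0, 180)

# Noise-free signal: baseline plus a skew normal peak scaled to the desired area.
signal = true["baseline"] + true["area"] * stats.skewnorm.pdf(
    time, a=true["alpha"], loc=true["mean"], scale=true["std"]
)

# Add Gaussian noise; the noisy trace can then be fitted and the posterior
# estimates compared against the known ground truth in `true`.
noisy = signal + rng.normal(0.0, 20.0, size=time.size)
```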
Is this algorithm only suitable for analysis of a single transition in targeted LC/MS data? Could it handle multiple transitions/product ions from the same precursor?
Currently, one would have to analyze multiple transitions sequentially. In the case of multiple product ions, no matter whether they are from the same precursor ion or not, the time series (or extracted ion chromatogram) would have to be supplied separately for each mass trace, and the mass traces would have to be entered into the template Excel file so that they are analyzed in one batch run (but one after the other). I think we discussed a possibility to fit multiple peaks in a hierarchical model of sorts, but for a more exact answer I would refer this one to @michaelosthege.
What are the performance characteristics of your approach? Is this scalable to integrating 1000s or 100,000s of peaks in a feasible time scale?
The performance depends on a number of parameters, e.g. which peak model is used for fitting, which sampler is used (ideally nutpie), whether a signal needs to be sampled again with more tuning samples or whether it is rejected right away etc.
The model selection takes a few minutes per mass trace but you only have to do that once for a batch run.
The actual peak analysis may take 20-30 s for a single peak and maybe 60-90 s for a double peak, but in the notebooks it appears that most of that time is taken up by model instantiation; the sampling itself only lasts a few seconds. So, in order to scale this up to 1000s of peaks or more, one would need to parallelize the procedure on a computation cluster and/or implement a way to fit multiple peaks without re-instantiating the model every time. The latter is something that I think can be added to the software; the former is probably more on the user side. I suppose there are also ways to parallelize in Python directly, but at our institute we chose the route with a computation cluster, so this is not something we worked on.
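As a concrete illustration of the single-machine option, here is a minimal sketch using Python's concurrent.futures; `analyze_mass_trace` is a hypothetical placeholder for whatever wraps model instantiation and sampling for one time series, and the file names are made up:

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def analyze_mass_trace(path: str) -> float:
    """Hypothetical placeholder for one PeakPerformance analysis of a single mass trace."""
    intensity = np.load(path)
    # ... instantiate the peak model and run the sampler here ...
    return float(intensity.max())


if __name__ == "__main__":
    files = ["trace_01.npy", "trace_02.npy", "trace_03.npy"]  # assumed file names
    with ProcessPoolExecutor(max_workers=4) as pool:
        # Each worker process analyzes one mass trace independently.
        results = list(pool.map(analyze_mass_trace, files))
```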
Let me know if the answers were satisfactory and of course if you have more comments or questions.
Hi @lazear - thanks for your questions
According to the JOSS data sharing policy, "If any original data analysis results are provided within the submitted work, such results have to be entirely reproducible by reviewers, including accessing the data". To me, this does seem to apply here - the figures displayed in the paper (and those comparing to the commercially available software in the documentation) were generated for the purpose of this paper.
We typically leave it to the authors to decide who is eligible for authorship, in line with our authorship policy. Specifically, "Purely financial (such as being named on an award) and organizational (such as general supervision of a research group) contributions are not considered sufficient for co-authorship of JOSS submissions, but active project direction and other forms of non-code contributions are". Perhaps @Y0dler can comment on this for clarification.
Hi @csoneson,
regarding data sharing: the repo was updated so that the notebooks to reproduce the data processing etc. have been added to peak-performance/docs/source/notebooks and the raw data to peak-performance/docs/source/notebooks/paper raw data and its sub-directories. In case the repo changes down the road, the raw data can be found on Zenodo under version 0.7.1. The data availability statement was changed accordingly.
Regarding the authorship, @MicroPhen did indeed contribute via active project direction. For example, it was his idea to perform the validation via synthetic data, he helped identify a bug with the posterior predictive sampling etc. Also, I have since moved on to a different job, so further developments of the software will be realized by a successor under the guidance of @MicroPhen.
Also, we merged the paper branch into the main branch and created a version 0.7.1 release. This does not contain any changes to the program code; it just adds the paper, raw data and so on to the repo. We also changed the instructions for the installation (@lazear, @Adafede) since there were version conflicts with NumPy, PyMC, and numba.
Is this algorithm only suitable for analysis of a single transition in targeted LC/MS data? Could it handle multiple transitions/product ions from the same precursor?
I think we discussed a possibility to fit multiple peaks in a hierarchical model of sorts but for a more exact answer I would refer this one to @michaelosthege.
Yes, using the functions from peak_performance.models it is possible to define a hierarchical PyMC model that could connect multiple peaks by, for example, sharing the retention time parameter.
This would enable a more holistic quantification of uncertainties and is something we would like to explore in the future, but it was out of scope for the current release of the library.
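To make this concrete, here is a minimal sketch of what such a hierarchical model could look like, written with plain PyMC distributions rather than the actual helpers in peak_performance.models; the data, priors, and variable names are purely illustrative:

```python
import numpy as np
import pymc as pm

# Two hypothetical extracted ion chromatograms (e.g. two transitions of the same precursor).
time = np.linspace(0.0, 3.0, 150)
traces = [np.random.normal(0.0, 1.0, time.size), np.random.normal(0.0, 1.0, time.size)]

with pm.Model() as model:
    # Retention time shared across both transitions.
    mean = pm.Normal("mean", mu=1.5, sigma=0.5)
    for i, intensity in enumerate(traces):
        baseline = pm.Normal(f"baseline_{i}", mu=0.0, sigma=1.0)
        height = pm.HalfNormal(f"height_{i}", sigma=10.0)
        std = pm.HalfNormal(f"std_{i}", sigma=0.5)
        noise = pm.HalfNormal(f"noise_{i}", sigma=1.0)
        # Gaussian peak shape; written as (mean - time) so the tensor variable
        # stays on the left-hand side of the subtraction.
        mu = baseline + height * pm.math.exp(-0.5 * ((mean - time) / std) ** 2)
        pm.Normal(f"L_{i}", mu=mu, sigma=noise, observed=intensity)

    # idata = pm.sample()  # e.g. with the default NUTS sampler or nutpie
```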
@editorialbot set main as branch
Done! branch is now main
@editorialbot generate pdf
@editorialbot set v0.7.1 as version
I'm sorry @michaelosthege, I'm afraid I can't do that. That's something only editors are allowed to do.
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@editorialbot set 10.5281/zenodo.13925914 as archive
I'm sorry @michaelosthege, I'm afraid I can't do that. That's something only editors are allowed to do.
@michaelosthege - I will update the version and set the archive at the end of the review process, for now we can leave them as they are.
I've tested the installation and run the examples from the documentation. Besides a minor versioning issue, everything went smoothly. I think all of the checklist items for me are complete, and again, the authors should be commended on the excellent state of the documentation, code quality, and ease of installation and verification. I recommend publication.
My only remaining comment is that it would be nice to briefly mention the performance/runtime of the algorithm in the paper, as above.
Thanks for the review! I'll add your suggested additions regarding the performance of the software and the comparison to other papers applying Bayesian statistics to peak data evaluation to the pull request you opened.
@lazear - thanks a lot for your review!
Some additional notes:
@lazear - I just added a sentence addressing the performance of the software, as well as a very short comparison to previous literature, to your PR. If you are fine with these changes and sign the CLA, we can go ahead and merge it.
@Adafede - I also introduced changes to our PR with regard to the preparation of *.npy files for the program. Instead of the markdown document, it is now a notebook with somewhat more detailed explanations.
@Y0dler Thank you for all these!
I admit https://github.com/JuBiotech/peak-performance/issues/32 is probably quite hard, so maybe @lazear or @csoneson can comment on whether this seems reasonable?
Regarding https://github.com/JuBiotech/peak-performance/issues/32 - I think providing an explanation and/or some examples of how to generate the required input files from 'standard' files available to the user would indeed help adoption by people who don't have access to the custom data preparation pipeline used by the authors. Perhaps in this case the mzML format can be considered a suitable standardized starting point. Even without an automatic conversion function in place, I would suggest adding some examples (which can then be adapted by the user if needed) that walk through the process of parsing these files and saving the data in a suitable format for further analysis.
I agree that it might be nice to have an example highlighting mzML usage, but IMO, given that this package is a relatively niche library and not necessarily a standalone tool, it seems somewhat out of scope to require the authors to develop a full mzML pipeline, given that this is largely for processing SRM/single-ion data. They take NumPy arrays as input, which can easily be obtained from mzML files (pyteomics or pymzML Python packages) - presumably anyone using this library will be able to manage loading data into NumPy arrays.
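To make that concrete, a minimal sketch of the pyteomics route is shown below, building an extracted ion chromatogram for one assumed m/z window and saving it as NumPy arrays; the target m/z, tolerance, and file names are made up, and this is not the notebook the authors added:

```python
import numpy as np
from pyteomics import mzml

target_mz, tol = 152.07, 0.05  # assumed target ion and m/z tolerance

times, intensities = [], []
with mzml.read("example.mzML") as reader:
    for spectrum in reader:
        scan = spectrum["scanList"]["scan"][0]
        mask = np.abs(spectrum["m/z array"] - target_mz) <= tol
        times.append(float(scan["scan start time"]))
        intensities.append(float(spectrum["intensity array"][mask].sum()))

# Save as the time/intensity NumPy arrays expected downstream (file naming is an assumption).
np.save("A1t1_time.npy", np.asarray(times))
np.save("A1t1_intensity.npy", np.asarray(intensities))
```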
@Adafede, @lazear, @csoneson - I added such an example (see PR 36). It starts from a *.wiff file (Sciex) which I converted to *.mzML via ProteoWizard. Subsequently, I expanded the Preparing raw data... example to include a section where the file is opened and an extracted ion chromatogram is parsed via pyteomics and prepared for use with PeakPerformance.
However, I can't upload the mzML file to GitHub since it is huge (158 MB). We'll have to find a different solution for that.
Okay, a solution was found and the *.mzML file was attached to release 0.7.1. The instructions in the example notebook were updated accordingly, as well.
I think the final open point is PR 28, which can only be merged once @lazear has signed the CLA. Sorry about that. If you would prefer not to sign it, I can create a new PR instead and credit you in it :)
Very strange CLA but I think I have signed it correctly. If not, please feel free to create a new PR/whatever. I don't need any credit :)
Yeah, it's weird :D Even after re-running the CLA job, it didn't work, so I went with a new PR instead and closed the old one.
@csoneson I think you can trigger the generation of a new PDF here so we can check if everything compiles correctly (in GitHub Actions it did so I assume there won't be a problem).
@editorialbot generate pdf
@Y0dler You can also trigger the pdf generation on your side (with @editorialbot generate pdf).
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@csoneson Is it expected that editors get their ORCID linked but not the reviewers?
@csoneson Ah, I thought only you could do that since it failed earlier when Michael tried it^^ sorry about that.
The PDF looks ready to me; how about you guys, @michaelosthege @MicroPhen?
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Michael found a broken link, which was fixed in this version. Aside from that, he approved of it.
@csoneson Is it expected that editors get their ORCID linked but not the reviewers?
Yes, this is the case currently. It has been discussed in the editorial team, and linking reviewer ORCID would likely be technically possible (if the reviewers have added it in their JOSS reviewer profile), but the development resources are limited and other aspects have had priority.
👍🏼
The pdf also looks ready to me. Thank you very much for your great support!
@csoneson Does that mean we can create a new release in order to conclude the review?
Sorry for the delay. I see that both reviewers have checked all boxes in the checklists, and @lazear recommended publication in a previous comment. @Adafede - are there any outstanding comments on your side?
Once we have the explicit approval from both reviewers, I will also take a quick look through the submission, and then we can move on.
@csoneson Sorry, I did not want to delay things. No remaining outstanding comments; I explicitly approve it!
Submitting author: @MicroPhen (Stephan Noack)
Repository: https://github.com/JuBiotech/peak-performance/
Branch with paper.md (empty if default branch): main
Version: v0.7.0
Editor: @csoneson
Reviewers: @Adafede, @lazear
Archive: Pending
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@Adafede & @lazear, your review will be checklist-based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist: @editorialbot generate my checklist
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @csoneson know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
Checklist for @Adafede
Checklist for @lazear