openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: PeakPerformance - A tool for Bayesian inference-based fitting of LC-MS/MS peaks #7313

Open editorialbot opened 1 month ago

editorialbot commented 1 month ago

Submitting author: @MicroPhen (Stephan Noack)
Repository: https://github.com/JuBiotech/peak-performance/
Branch with paper.md (empty if default branch): main
Version: v0.7.0
Editor: @csoneson
Reviewers: @Adafede, @lazear
Archive: Pending

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/e7df0125519d8dc31d303d73f4f5e590"><img src="https://joss.theoj.org/papers/e7df0125519d8dc31d303d73f4f5e590/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/e7df0125519d8dc31d303d73f4f5e590/status.svg)](https://joss.theoj.org/papers/e7df0125519d8dc31d303d73f4f5e590)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@Adafede & @lazear, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @csoneson know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

πŸ“ Checklist for @Adafede

πŸ“ Checklist for @lazear

editorialbot commented 1 month ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 month ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.27 s (183.3 files/s, 477312.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
SVG                             10              1            128         108888
Python                          10            366           1162           2309
Markdown                         7            185              0            747
YAML                             9             25             30            224
TeX                              1             16              0            197
Jupyter Notebook                 5              0          12948            193
TOML                             1             10              1             49
DOS Batch                        1              8              1             26
reStructuredText                 4             25             32             22
make                             1              4              7              9
-------------------------------------------------------------------------------
SUM:                            49            640          14309         112664
-------------------------------------------------------------------------------

Commit count by author:

   343  j.niesser
    35  Michael Osthege
    32  Jochen Nießer
    32  Osthege, Michael
     5  dependabot[bot]
editorialbot commented 1 month ago

Paper file info:

📄 Wordcount for paper.md is 1955

✅ The paper includes a Statement of need section

editorialbot commented 1 month ago

License info:

🟑 License found: GNU Affero General Public License v3.0 (Check here for OSI approval)

editorialbot commented 1 month ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

csoneson commented 1 month ago

πŸ‘‹πŸΌ @MicroPhen, @Adafede, @lazear - this is the review thread for the submission. All of our communications will happen here from now on.

As a reviewer, the first step is to create a checklist for your review by entering

@editorialbot generate my checklist

at the top of a new comment in this thread. These checklists contain the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues directly in the software repository. If you do so, please mention this thread so that a link is created (and I can keep an eye on what is happening). Please also feel free to comment and ask questions in this thread. It is often easier to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@csoneson) if you have any questions or concerns. Thanks!

Adafede commented 1 month ago

Review checklist for @Adafede

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Y0dler commented 1 month ago

First of all: thank you very much for agreeing to review our manuscript and program.

I just wanted to inform you that a moment ago I created a branch to improve and streamline our GitHub repo's landing page, as it is a little outdated and confusing. More importantly, though, here is the link to our documentation, which will be placed more prominently on the new landing page. It contains a lot of helpful information that should simplify your review.

lazear commented 1 month ago

Review checklist for @lazear

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

lazear commented 1 month ago

@Y0dler a couple of comments/suggestions/etc. (will update as I go - likely to be in multiple chunks):

lazear commented 1 month ago

@csoneson clarifications requested:

Y0dler commented 1 month ago

Thanks for your comments and kind words, @lazear!

Regarding the references, I think they should be fine as far as I can see. But if you use the "View article proof on GitHub" link from editorialbot, it hides the last page with most of the references at first (until you hit the "More Pages" button at the very bottom). Maybe that was the problem? Otherwise, you can also download the current draft from GitHub Actions.

> I am not super familiar with bayesian peak picking, but I know there are other papers in this area - perhaps consider citing them and very briefly discuss differences/tradeoffs?

Definitely a very good point - I think we need to add that, and I'll see what I can do given that we're already way past the theoretical limit of 1000 words^^ I'll try to add it as soon as possible, but it might take a couple of days.

> Is it possible to benchmark your approach vs standard normal models on benchmark data (e.g. synthetic data with known parameters + noise?)

We have tested our approach with noisy synthetic data sets of normal-shaped and skew normal-shaped peaks and compared the results with the ground truth. Originally, this was also part of the paper, but we moved it to the documentation in order to decrease the word count. You can find it here. I hope this is what you meant. There is also a comparison with a commercially available vendor software using an experimental data set. To your later point regarding data sharing: this data set is currently not in the repository.
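
For anyone who wants to try a quick sanity check of this kind themselves, here is a minimal sketch of how such a noisy synthetic peak with known ground truth could be generated. The parameter values, file name and the two-row output layout below are illustrative assumptions, not the exact settings from our validation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)

# Illustrative ground-truth parameters of a skew normal-shaped peak
baseline_intercept = 25.0            # baseline offset
baseline_slope = 0.3                 # linear baseline drift per minute
area = 120.0                         # peak area
loc, scale, alpha = 12.5, 0.25, 3.0  # location, width, skewness

time = np.linspace(11.0, 14.0, 300)
clean_signal = (
    baseline_intercept
    + baseline_slope * time
    + area * stats.skewnorm.pdf(time, a=alpha, loc=loc, scale=scale)
)

# Add homoscedastic Gaussian noise to mimic detector noise
noisy_intensity = clean_signal + rng.normal(0.0, 4.0, size=time.size)

# Stack time and intensity so the trace can be fed into the fitting routine;
# the recovered posterior estimates can then be compared to the values above.
np.save("synthetic_skew_normal_peak.npy", np.array([time, noisy_intensity]))
```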

> Is this algorithm only suitable for analysis of a single transition in targeted LC/MS data? Could it handle multiple transitions/product ions from the same precursor?

Currently, one would have to analyze multiple transitions sequentially: no matter whether the product ions stem from the same precursor ion or not, the time series (or extracted ion chromatogram) of each mass trace would have to be supplied separately, and the mass traces would have to be entered into the template Excel file so that they are analyzed in one batch run (but one after the other). I think we discussed a possibility to fit multiple peaks in a hierarchical model of sorts, but for a more exact answer I would refer this one to @michaelosthege.

> What are the performance characteristics of your approach? Is this scalable to integrating 1000s or 100,000s of peaks in a feasible time scale?

The performance depends on a number of parameters, e.g. which peak model is used for fitting, which sampler is used (ideally nutpie), and whether a signal needs to be sampled again with more tuning samples or is rejected right away. The model selection takes a few minutes per mass trace, but you only have to do that once for a batch run. The actual peak analysis may take 20 - 30 s for a single peak and maybe 60 - 90 s for a double peak, but in the notebooks it appears that most of that time is taken up by model instantiation; the sampling itself only lasts a few seconds.

So, in order to scale this up to 1000s of peaks or more, one would need to parallelize the procedure on a computation cluster and/or implement a way to fit multiple peaks without re-instantiating the model every time. The latter is something that I think can be added to the software; the former is probably more on the user side. I suppose there are also ways to parallelize in Python directly, but at our institute we chose the route with a computation cluster, so this is not something we worked on.
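
Just to illustrate that in-Python route (which, as said, we have not pursued ourselves), here is a minimal sketch of distributing independent mass traces over worker processes. `fit_mass_trace` is a hypothetical stand-in for whatever per-trace pipeline function is used, not an existing PeakPerformance API:

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def fit_mass_trace(npy_file: Path) -> dict:
    """Hypothetical per-trace worker: load the time/intensity array,
    build and sample the model, and return a result summary."""
    ...  # placeholder for the actual PeakPerformance calls


if __name__ == "__main__":
    trace_files = sorted(Path("raw_data").glob("*.npy"))
    # Mass traces are analyzed independently, so they can be fitted in
    # parallel. Keep max_workers modest, since each sampler may already
    # run several chains/threads of its own.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(fit_mass_trace, trace_files))
```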

Let me know if the answers were satisfactory and of course if you have more comments or questions.

csoneson commented 1 month ago

Hi @lazear - thanks for your questions

According to the JOSS data sharing policy, "If any original data analysis results are provided within the submitted work, such results have to be entirely reproducible by reviewers, including accessing the data". To me, this does seem to apply here - the figures displayed in the paper (and those comparing to the commercially available software in the documentation) were generated for the purpose of this paper.

We typically leave it to the authors to decide who is eligible for authorship, in line with our authorship policy. Specifically, "Purely financial (such as being named on an award) and organizational (such as general supervision of a research group) contributions are not considered sufficient for co-authorship of JOSS submissions, but active project direction and other forms of non-code contributions are". Perhaps @Y0dler can comment on this for clarification.

Y0dler commented 1 month ago

Hi @csoneson,

Regarding data sharing: the repo was updated so that the notebooks to reproduce the data processing etc. have been added to peak-performance/docs/source/notebooks and the raw data to peak-performance/docs/source/notebooks/paper raw data and its sub-directories. In case the repo changes down the road, the raw data can be found on Zenodo under version 0.7.1. The data availability statement was changed accordingly.

Regarding the authorship, @MicroPhen did indeed contribute via active project direction. For example, it was his idea to perform the validation via synthetic data, he helped identify a bug with the posterior predictive sampling etc. Also, I have since moved on to a different job, so further developments of the software will be realized by a successor under the guidance of @MicroPhen.

We also merged the paper branch into the main branch and created a version 0.7.1 release. This does not contain any changes to the program code; it just adds the paper, raw data and so on to the repo. Finally, we changed the installation instructions (@lazear, @Adafede) since there were version conflicts with NumPy, PyMC, and numba.

michaelosthege commented 1 month ago

> Is this algorithm only suitable for analysis of a single transition in targeted LC/MS data? Could it handle multiple transitions/product ions from the same precursor?

> I think we discussed a possibility to fit multiple peaks in a hierarchical model of sorts but for a more exact answer I would refer this one to @michaelosthege.

Yes, using the functions from peak_performance.models it is possible to define a hierarchical PyMC model that could connect multiple peaks by, for example, sharing the retention time parameter. This would enable a more holistic quantification of uncertainties and is something we would like to explore in the future, but it was out of scope for the current release of the library.
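
To make that a bit more concrete, here is a rough sketch in plain PyMC of what sharing one retention time across two traces could look like. The priors, peak shape and variable names are made up for the illustration; this is not a model that peak_performance.models ships as-is:

```python
import numpy as np
import pymc as pm

# Placeholder data: two transitions of the same precursor
# (in practice these would be the extracted ion chromatograms).
rng = np.random.default_rng(0)
time = np.linspace(11.0, 14.0, 200)
intensities = [rng.normal(30, 3, time.size), rng.normal(30, 3, time.size)]

with pm.Model() as hierarchical_model:
    # One retention time shared by all peaks ties the traces together.
    mean_rt = pm.Normal("mean_rt", mu=12.5, sigma=0.5)

    for i, y in enumerate(intensities):
        baseline = pm.Normal(f"baseline_{i}", mu=float(y.min()), sigma=5)
        height = pm.HalfNormal(f"height_{i}", sigma=100)
        std = pm.HalfNormal(f"std_{i}", sigma=0.5)
        noise = pm.HalfNormal(f"noise_{i}", sigma=10)

        # Normal-shaped peak on a constant baseline, centered at the shared RT
        mu = baseline + height * pm.math.exp(-0.5 * ((mean_rt - time) / std) ** 2)
        pm.Normal(f"obs_{i}", mu=mu, sigma=noise, observed=y)

    idata = pm.sample()
```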

michaelosthege commented 1 month ago

@editorialbot set main as branch

editorialbot commented 1 month ago

Done! branch is now main

michaelosthege commented 1 month ago

@editorialbot generate pdf

michaelosthege commented 1 month ago

@editorialbot set v0.7.1 as version

editorialbot commented 1 month ago

I'm sorry @michaelosthege, I'm afraid I can't do that. That's something only editors are allowed to do.

editorialbot commented 1 month ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

michaelosthege commented 1 month ago

@editorialbot set 10.5281/zenodo.13925914 as archive

editorialbot commented 1 month ago

I'm sorry @michaelosthege, I'm afraid I can't do that. That's something only editors are allowed to do.

csoneson commented 1 month ago

@michaelosthege - I will update the version and set the archive at the end of the review process, for now we can leave them as they are.

lazear commented 1 month ago

I've tested installation and run the examples from the documentation. Besides a minor versioning issue, everything went smoothly. I think all of the checklist items for me are complete, and again, the authors should be commended on the excellent state of the documentation, code quality, and ease of installation and verification. I recommend publication.

My only remaining comment is that it would be nice to briefly mention the performance/runtime of the algorithm in the paper, as discussed above.

Y0dler commented 1 month ago

Thanks for the review! I'll add your suggested additions regarding the performance of the software and the comparison to other papers applying Bayesian statistics to peak data evaluation to the pull request you opened 👍

csoneson commented 1 month ago

@lazear - thanks a lot for your review!

Adafede commented 1 month ago

Some additional notes:

Y0dler commented 4 weeks ago

@lazear - I just added a sentence addressing the performance of the software as well as a very short comparison to previous literature to your PR. If you are fine with these changes and sign the CLA, we can go ahead and merge it.

@Adafede - I also introduced changes to our PR regarding the preparation of *.npy files for the program. Instead of the Markdown document, it is now a notebook with somewhat more detailed explanations.

Adafede commented 4 weeks ago

@Y0dler Thank you for all these!

I admit https://github.com/JuBiotech/peak-performance/issues/32 is probably quite hard, so maybe @lazear or @csoneson can comment on whether this seems reasonable?

csoneson commented 3 weeks ago

Regarding https://github.com/JuBiotech/peak-performance/issues/32 - I think providing an explanation and/or some examples of how to generate the required input files from 'standard' files available to the user would indeed help adoption by people who don't have access to the custom data preparation pipeline used by the authors. Perhaps in this case the mzML format can be considered a suitable standardized starting point. Even without an automatic conversion function in place, I would suggest adding some examples (which can then be adapted by the user if needed) that walk through the process of parsing these files and saving the data in a suitable format for further analysis.

lazear commented 2 weeks ago

I agree that it might be nice to have an example highlighting mzML usage, but IMO, given that this package is a relatively niche library and not necessarily a standalone tool, it seems somewhat out of scope to require the authors to develop a full mzML pipeline, especially since this is largely for processing SRM/single-ion data. They take NumPy arrays as input, which can easily be obtained from mzML files (via the pyteomics or pymzML Python packages) - presumably anyone using this library will be able to manage loading data into NumPy arrays.
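
Something along these lines would probably be enough. The m/z value, tolerance, file names and the stacked output layout are placeholders, and this assumes the mzML contains full-scan spectra rather than pre-extracted chromatograms:

```python
import numpy as np
from pyteomics import mzml

target_mz = 175.119   # placeholder product/precursor m/z
mz_tol = 0.01         # placeholder extraction window in Da

times, xic = [], []
with mzml.read("experiment.mzML") as reader:   # placeholder file name
    for spectrum in reader:
        mz = spectrum["m/z array"]
        inten = spectrum["intensity array"]
        # retention time of the scan (check the unit; often minutes)
        rt = spectrum["scanList"]["scan"][0]["scan start time"]
        window = (mz >= target_mz - mz_tol) & (mz <= target_mz + mz_tol)
        times.append(float(rt))
        xic.append(float(inten[window].sum()))

# Stack (time, intensity) and save; adapt the layout and file naming
# to whatever the PeakPerformance batch pipeline expects.
np.save("my_mass_trace.npy", np.array([times, xic]))
```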

Y0dler commented 2 weeks ago

@Adafede, @lazear, @csoneson - I added such an example (see PR 36). It starts from a *.wiff file (Sciex) which I converted to *.mzML via ProteoWizard. Subsequently, I expanded the Preparing raw data... example to include a section where the file is opened and an extracted ion chromatogram is parsed via pyteomics and prepared for use with PeakPerformance.

However, I can't upload the mzML file to GitHub since it is huge (158 MB). We'll have to find a different solution for that.

Y0dler commented 2 weeks ago

Okay, a solution was found and the *.mzML file was attached to release 0.7.1. The instructions in the example notebook were updated accordingly, as well.

I think the final open point is PR 28, which can only be merged once @lazear has signed the CLA. Sorry about that. If you would prefer not to sign it, I can create a new PR instead and credit you in it :)

lazear commented 2 weeks ago

Very strange CLA but I think I have signed it correctly. If not, please feel free to create a new PR/whatever. I don't need any credit :)

Y0dler commented 1 week ago

Yeah, it's weird :D Even after re-running the CLA job, it didn't work, so I went with a new PR instead and closed the old one.

@csoneson I think you can trigger the generation of a new PDF here so we can check whether everything compiles correctly (in GitHub Actions it did, so I assume there won't be a problem).

csoneson commented 1 week ago

@editorialbot generate pdf

csoneson commented 1 week ago

@Y0dler You can also trigger the pdf generation on your side (with @editorialbot generate pdf)

editorialbot commented 1 week ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Adafede commented 1 week ago

@csoneson Is it expected that editors get their ORCID linked but not the reviewers?

Y0dler commented 1 week ago

@csoneson Ah, I thought only you could do that since it failed earlier when Michael tried it^^ sorry about that.

The PDF looks ready to me - how about you, @michaelosthege and @MicroPhen?

Y0dler commented 1 week ago

@editorialbot generate pdf

editorialbot commented 1 week ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Y0dler commented 1 week ago

Michael found a broken link, which was fixed in this version 👆 Aside from that, he approved of it.

csoneson commented 1 week ago

> @csoneson Is it expected that editors get their ORCID linked but not the reviewers?

Yes, this is the case currently. It has been discussed in the editorial team, and linking reviewer ORCIDs would likely be technically possible (if the reviewers have added them in their JOSS reviewer profiles), but development resources are limited and other aspects have had priority.

Adafede commented 1 week ago

πŸ‘πŸΌ

MicroPhen commented 1 week ago

The PDF also looks ready to me. Thank you very much for your great support!

Y0dler commented 1 week ago

@csoneson Does that mean we can create a new release in order to conclude the review?

csoneson commented 5 days ago

Sorry for the delay. I see that both reviewers have checked all boxes in the checklists, and @lazear recommended publication in a previous comment. @Adafede - are there any outstanding comments on your side?

Once we have the explicit approval from both reviewers, I will also take a quick look through the submission, and then we can move on.

Adafede commented 5 days ago

@csoneson Sorry, I did not want to delay things. No outstanding comments on my side; I explicitly approve it! 😄