openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: BART-Survival: A Bayesian machine learning approach to survival analyses in Python #7213

Open editorialbot opened 1 month ago

editorialbot commented 1 month ago

Submitting author: @twj8CDC (Jacob Tiegs)
Repository: https://github.com/CDCgov/BART-Survival
Branch with paper.md (empty if default branch):
Version: v0.1.1
Editor: @mahfuz05062
Reviewers: @turgeonmaxime, @WeakCha
Archive: Pending

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/f7d5279cd5936af580134d8657a1e627"><img src="https://joss.theoj.org/papers/f7d5279cd5936af580134d8657a1e627/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/f7d5279cd5936af580134d8657a1e627/status.svg)](https://joss.theoj.org/papers/f7d5279cd5936af580134d8657a1e627)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@turgeonmaxime & @WeakCha, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @mahfuz05062 know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @WeakCha

📝 Checklist for @turgeonmaxime

editorialbot commented 1 month ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 month ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.12 s (603.6 files/s, 482772.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HTML                            11           3755             28          23030
SVG                              3              0              0           2689
CSS                              9            425             87           1761
JavaScript                      13            150            252            945
Python                           9            206            429            884
XML                              1              0              0            718
Markdown                        12            238              0            696
Jupyter Notebook                 5              0          22547            607
TeX                              1             22              0            344
YAML                             2              1              4             27
DOS Batch                        1              8              1             26
TOML                             1              4              0             25
reStructuredText                 5             16             15             14
JSON                             1              0              0              9
make                             1              4              7              9
-------------------------------------------------------------------------------
SUM:                            75           4829          23370          31784
-------------------------------------------------------------------------------

Commit count by author:

   108  twj8CDC
     8  dependabot[bot]
     1  Boris Ning
editorialbot commented 1 month ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1002/bimj.202200178 is OK
- 10.48550/arXiv.2206.03619 is OK
- 10.1214/19-AOS1889 is OK
- 10.1177/0962280217746191 is OK
- 10.1136/bmjopen-2023-077137 is OK
- 10.1038/s41598-020-77220-w is OK
- 10.1177/0962280218822140 is OK
- 10.18637/jss.v097.i01 is OK
- 10.1002/sim.6893 is OK
- 10.3390/stats5030038 is OK
- 10.18637/jss.v097.i01 is OK
- 10.48550/arXiv.1910.02160 is OK
- 10.7717/peerj-cs.1516 is OK
- 10.1136/bmj.317.7156.468 is OK
- 10.1038/sj.bjc.6601119 is OK
- 10.1214/09-AOAS285 is OK
- 10.1111/j.2517-6161.1972.tb00899.x is OK
- 10.1182/blood.V122.21.1728.1728 is OK
- 10.1007/978-3-319-19425-7 is OK
- 10.1214/08-AOAS169 is OK
- 10.1002/sim.6893 is OK
- 10.18637/jss.v097.i01 is OK

🟡 SKIP DOIs

- None

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None
editorialbot commented 1 month ago

Paper file info:

📄 Wordcount for paper.md is 1637

✅ The paper includes a Statement of need section

editorialbot commented 1 month ago

License info:

✅ License found: Apache License 2.0 (Valid open source OSI approved license)

editorialbot commented 1 month ago

Download article proof | View article proof on GitHub

mahfuz05062 commented 1 month ago

@turgeonmaxime and @WeakCha - Thank you for agreeing to review this submission.

This is the review thread for the paper. All of our communications will happen here from now on.

As mentioned above, you can use the command @editorialbot generate my checklist to create your review checklist. As you go over the submission, please check any items that you feel have been satisfied.

Here is also the link to the JOSS reviewer guidelines: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/7213 so that a link is created to this thread for visibility. Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if you require additional time. We can also use editorialbot (our bot) to set automatic reminders if you know you'll be away for a known period.

Please feel free to ping me (@mahfuz05062) if you have any questions/concerns.

WeakCha commented 1 month ago

Review checklist for @WeakCha

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

WeakCha commented 1 month ago

Thanks for inviting me to review this paper. I like the idea of extending BART from R to Python. I have taken a quick look at the paper and the software and have the following comments.

For the paper:

  1. In line 64, please remove the parentheses around j = 1, ..., k.
  2. In lines 63-68, it would be great to have a small example explaining how to calculate P_{t_j}, as you did later in the paper.
  3. In lines 71-72, I understand that x_i denotes observation i, but please state this (and other similar notation) explicitly.
  4. In lines 83-85, it would be great to give another example with event status = 0, as readers may not easily work out the sequence for this case. Also, the example in the paper is wrong because the event time 14 is missing.
  5. In line 93, please explain the meaning of the two parameters of BART. Also, state the meaning of \Phi explicitly; people in statistics will know that it is the CDF of the standard normal distribution, but others may not.
  6. The goal of the inference section is unclear to me. Again, it would be great to have examples showing how to construct an APD dataset and how to compute the marginal difference, marginal risk ratio, etc. with a real dataset. Some sentences are also confusing. For example, what is the meaning of "Here j_{T_max} is the maximum time across all event times."?
  7. Following 6, please indicate the meaning of E_i and E_{ij}. Over which variable are these expectations taken, and how are they calculated empirically?
  8. You mention credible intervals, which is great. Could you explain how they are calculated in your software and which function is used? This could be complemented by a data example, as I also comment below.
  9. A code example running on a simple dataset is missing. I found one in your GitHub repository, but I think it should also be in the paper, explaining the usage and the results your software produces.
  10. Have you tried comparing BART-Survival with the R version? I did not find evidence of this in the paper; I just want to ask whether we could verify your implementation.
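Items 2 and 4 above concern how one observation is expanded into the long, person-time format used for training. A minimal sketch, assuming the usual discrete-time formulation in which a subject contributes one row per grid time up to and including its own observed time, with outcome 1 only in its final row and only if the event was observed. The helper `expand_observation` and the example grid are hypothetical, not BART-Survival's API, and the sketch assumes the grid contains the subject's own time.

```python
def expand_observation(time, status, event_times):
    """Expand one subject (time, status) into person-time rows.

    Assumes `event_times` is the sorted discrete time grid and that it
    contains `time`. Returns a list of (t_j, y_ij) pairs.
    """
    rows = []
    for t_j in event_times:
        if t_j > time:
            break  # the subject is no longer at risk after its own time
        # y is 1 only at the subject's own time and only if the event was
        # observed (status == 1); a censored subject gets all zeros.
        y = 1 if (t_j == time and status == 1) else 0
        rows.append((t_j, y))
    return rows


grid = [3, 7, 10, 14, 20]  # hypothetical grid of distinct times
print(expand_observation(14, 1, grid))  # → [(3, 0), (7, 0), (10, 0), (14, 1)]
print(expand_observation(14, 0, grid))  # → [(3, 0), (7, 0), (10, 0), (14, 0)]
```

The only difference between the event and censored cases is the outcome in the last row, which is why an explicit status = 0 example is helpful.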

For the software:

  1. The links to User Guide/Example Notebooks are not working.
  2. You use the abbreviation "SV" many times in example1.ipynb. Does it mean "survival"?

I may give more comments when I have a chance to test this software on my PC. In general, the idea is interesting, although some refinement may be needed.

Please feel free to let me know if you have any questions, or correct me if I am wrong, thanks!

twj8CDC commented 1 month ago

@WeakCha Thank you for the initial round of edits. I will work on updating based on these comments.

As for point 10: we did an extensive validation analysis of our method with simulations and real data, in which we compared our algorithm to the R-based algorithm. However, I was under the impression that JOSS does not necessarily want this type of analysis in JOSS papers. We were planning to present the validation study in a separate paper.

I could provide an example comparing our algorithm and the R algorithm on a simulated dataset if you think that would be sufficient. Or do you have other thoughts?

WeakCha commented 1 month ago

@twj8CDC Thank you for the reply; you make a good point. Regardless of JOSS's preferences, I would simply add a sentence or two saying something like "based on our evaluation tests, our software successfully reproduces the functionality of the R version", but I am not sure what JOSS prefers here. @mahfuz05062 Do you have thoughts on this?

If this does not need to be shown in the JOSS paper, you do not need to modify anything in the paper. But it would be great if you could provide an example comparison and put the comparison code in your GitHub examples folder (it could be a .py file with the corresponding R code commented out).

Definitely open to other thoughts!

turgeonmaxime commented 1 month ago

Review checklist for @turgeonmaxime

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

mahfuz05062 commented 1 month ago

@WeakCha and @twj8CDC, I don't think an extensive analysis is necessary here, but I like the idea mentioned by @twj8CDC of "comparing our algorithm and the R algorithm using a simulated dataset". If you can provide something like this, it should be sufficient.

twj8CDC commented 1 month ago

@mahfuz05062 sounds great, I will add an example of this sort to the paper. @WeakCha additionally, I have addressed most of the other revisions suggested. I will have a new revised paper uploaded in the next day or two. Thanks!

WeakCha commented 1 month ago

Hi @twj8CDC ! Out of curiosity, have you generated the PDF for us to further review?

twj8CDC commented 1 month ago

Hi @WeakCha. Sorry for the delay, but yes I did just update the pdf! Thanks

twj8CDC commented 1 month ago

@editorialbot generate pdf

editorialbot commented 1 month ago

Download article proof | View article proof on GitHub

WeakCha commented 3 weeks ago

@mahfuz05062 @twj8CDC Thank you so much for revising the paper; it looks much better! Here is a list of my new comments:

  1. In line 72, typically [] means inclusive while () means exclusive.
  2. In line 139, 8 * 6 = 48 not 36.
  3. In line 294, the link for comparing your algorithm to the R version does not work.
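Item 2 concerns the size of the prediction dataset. Assuming (this is an assumption, not stated in the thread) that 8 and 6 are the number of subjects and the number of grid time points, each subject is paired with every time point, so the dataset has n × K rows. A minimal sketch of that cross join; `build_prediction_rows` and the toy inputs are hypothetical, not part of the package:

```python
from itertools import product


def build_prediction_rows(covariates, time_grid):
    """Pair every subject's covariates with every time on the grid.

    `covariates` is a list of per-subject covariate tuples. The result has
    len(covariates) * len(time_grid) rows of the form (t, *x).
    """
    return [(t, *x) for x, t in product(covariates, time_grid)]


subjects = [(i,) for i in range(8)]  # 8 hypothetical subjects, 1 covariate
times = [1, 2, 3, 4, 5, 6]           # 6 hypothetical time points
rows = build_prediction_rows(subjects, times)
print(len(rows))  # → 48, i.e. 8 * 6
```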

I will take a look at the GitHub repo soon and will get back to you if I have more comments. Thanks a lot for your revisions; I think the paper is now much richer and more detailed!

WeakCha commented 3 weeks ago

@twj8CDC Hi, I'm back with comments on your GitHub repo:

  1. The theoretical background from your software paper does not need to be in your README, as people interested in using your software typically spend little time on the theory.
  2. In the README, in the section User Guide/Example Notebooks, the two links are not working.
  3. Your example notebooks are detailed and easy to follow, and I like them! But please explain the meaning of the abbreviation "SV" so that readers are not confused.
  4. There are many warning messages in your notebooks, and I would recommend removing most of them for better readability. However, this is optional, because I know some warning messages are not easy to remove.

mahfuz05062 commented 2 weeks ago

Hi @twj8CDC, can you make necessary change according to the suggestions from @WeakCha ?

mahfuz05062 commented 2 weeks ago

@turgeonmaxime I haven't seen much activity from your side. Is there anything that I can help with or do you need more time?

twj8CDC commented 2 weeks ago

@mahfuz05062 Yes I will complete these changes over the next couple days! Thanks

turgeonmaxime commented 2 weeks ago

Sorry @mahfuz05062, I've been busy with work, but I'm actually reading the newest version right now! I should have some comments to share later today.

turgeonmaxime commented 2 weeks ago

@twj8CDC Thanks for your updates to the paper. I think one crucial point that is still lacking from the manuscript is what role BART-Survival fulfills that PyMC-BART doesn't. Based on my reading of the manuscript but also pre-review discussions here on Github, generating the datasets required for the analysis is a non-trivial task. You should write this explicitly in the text. It is the main message.

Similarly, I think you did a good job explaining the need for BART-Survival when a similar R package already exists. And that's also a crucial requirement, so thanks for that.

Beyond that, I have a few more comments, some general and some more targeted:

I'll have a look at the repo itself and report back.

turgeonmaxime commented 2 weeks ago

@twj8CDC One more thing: could you add a short description of the rossi dataset that you used for the demonstration. Just a sentence or two would be helpful (especially since the lifelines documentation doesn't even mention what the "survival" event is).

twj8CDC commented 2 weeks ago

@turgeonmaxime Great! Thank you for the feedback. I will review and update over the next day. Thanks!

turgeonmaxime commented 1 week ago

@twj8CDC I had a look at the repo. I was able to install the package and run the example code without any issue. So from my perspective, this part meets the requirements, and I've updated my checklist above to reflect that.

Again from my perspective, the only thing missing is to address the last comments about the manuscript.

twj8CDC commented 1 week ago

Great! Thank you. I have been delayed in completing the final updates to the manuscript, but I will have them done soon.

WeakCha commented 1 week ago

I agree with @turgeonmaxime, after the manuscript is updated, I will take a look, and if no problems I will recommend acceptance of this paper.

twj8CDC commented 1 week ago

Hello, I have updated the paper and repo based on the feedback. Below is the list of revisions completed.

@WeakCha

[x] In line 64, please remove the parentheses around j = 1, ..., k.

[x] In line 63-68, it would be great to have a small example explaining how to calculate P_{t_j}, just like you did in the later part of this paper.

[x] In line 71-72, I understand that x_i should be the observation i, but please indicate that (and also other similar math notations) explicitly and clearly.

[x] In line 83-85, it would be great to give another example with event status = 0, as people may not easily obtain the sequence for this case. Also, your example in the paper is wrong because the event time 14 is missing.

[x] In line 93, it is good to explain the meaning of the two parameters of BART. Also, the meaning of \Phi should be pointed out explicitly, although people in stats should know that this is the CDF of the standard normal distribution.

[x] The goal of the inference section looks unclear to me. Again, it would be great to have examples show how to construct an APD dataset, and how to compute marginal difference, marginal risk ratio, etc. with a real dataset. Also, some sentences look confusing to me. For example, what is the meaning of "Here j_{T_max} is the maximum time across all event times."?

[x] Following 6, please indicate the meaning of E_i and E_{ij}. They are expectations over which variable, and how are they calculated empirically?

[x] You mentioned credible interval in Bayesian statistics, which is great. But could you explain how to calculate that in your software? Use which function? This could be complemented by a data example, which I also commented below.

[x] A code example for running a simple dataset is missing. I found it in your Github but I think it should also be available in your paper explaining the usages and the results that you get from your software. Have you tried comparing your BART-survival with the version in R? I did not find evidence in your paper; I just want to ask whether we could verify your implementation.

[x] The links to User Guide/Example Notebooks are not working. You used the word "SV" a lot of times in your example1.ipynb. Does that mean "survival"?

[x] I may give more comments when I have a chance to test this software on my PC, but in general, the idea is interesting, although some refinement may be needed.

[x] Please feel free to let me know if you have any questions, or correct me if I am wrong, thanks!

@WeakCha

[x] In line 72, typically [] means inclusive while () means exclusive.

[x] In line 139, 8 * 6 = 48 not 36.

[x] In line 294, the link for comparing your algorithm to the R version does not work.

[x] The theoretical background for your software paper does not need to be put in your README, as people interested in your software usage typically will spend little time understanding theories.

[x] In the README, in the section User Guide/Example Notebooks, the two links are not working.

[x] Your example notebooks look detailed, and easy to follow. And I like it! But please explain the meaning of the abbreviation "SV", so that readers will not feel confused.

[x] There are many warning messages in your notebooks, and I would recommend removing most of them for better readability. However, it is optional because I know some warning messages are not easy to be removed...

@turgeonmaxime

[x] Thanks for your updates to the paper. I think one crucial point that is still lacking from the manuscript is what role BART-Survival fulfills that PyMC-BART doesn't. Based on my reading of the manuscript but also pre-review discussions here on Github, generating the datasets required for the analysis is a non-trivial task. You should write this explicitly in the text. It is the main message.

[x] In general, it is recommended to use punctuation with equations, as they are generally part of a sentence. For example, on line 81, the equation should end with a comma, and the equation on line 82 should end with a period. Moreover, if you add extra context after an equation (e.g. line 185), the equation above should end with a comma and the sentence that follows should start with a small letter (i.e. "where" instead of "Where").

[x] Related to the comment above, it would be awkward to try and do the same with the tables that are part of your sentences (e.g. lines 69 and 71), so I think it's fine to drop the punctuation in those cases. I would therefore recommend rephrasing line 72 so that you get a complete sentence. For example, "In the table above, the intervals are..."

[x] I'm not sure why "Survival" appears multiple times with a capital S. Please review and fix as needed.

[x] What you describe as the "simple setting" in the "Background" subsection is essentially the Kaplan-Meier non-parametric estimator. I would suggest you add that clarification and maybe also a reference to your favourite textbook on survival analysis.

[x] L124: More generally, the dataset TAD can be used for binary regression. Probit regression is just one type of binary regression.

[x] L147: PAD length should be 48 (which was also mentioned by @WeakCha).

[x] L161: Typo, "the of the probability"

[x] L201: Missing a period at the end of the sentence.

[x] L235: The sentence should be reviewed, I think it should be "These expectations can be further used to make comparisons can be made between".

[x] L265: what are the weights and coords parameters?

[x] L268: I think it should be a small s in "The p,S arrays..."

[x] L294: There is a typo in the URL, hence why the link doesn't work.

[x] could you add a short description of the rossi dataset that you used for the demonstration. Just a sentence or two would be helpful (especially since the lifelines documentation doesn't even mention what the "survival" event is).
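The comment above that the "simple setting" is the Kaplan-Meier estimator can be made concrete. With no covariates, the discrete-time probability of an event at t_j, given survival to t_j, is estimated by d_j / n_j (events over number at risk), and survival is the running product of (1 - d_j / n_j). A minimal sketch on made-up toy data (not the rossi dataset); `kaplan_meier` is a hypothetical helper, not part of BART-Survival:

```python
def kaplan_meier(times, events):
    """Return [(t_j, p_j, S(t_j))] over the distinct event times.

    p_j = d_j / n_j is the conditional event probability at t_j, where d_j is
    the number of events at t_j and n_j the number at risk (time >= t_j).
    S(t_j) is the product of (1 - p_k) over all event times t_k <= t_j.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0
    out = []
    for t_j in event_times:
        n_j = sum(1 for t in times if t >= t_j)
        d_j = sum(1 for t, e in zip(times, events) if t == t_j and e == 1)
        p_j = d_j / n_j
        surv *= 1.0 - p_j
        out.append((t_j, p_j, surv))
    return out


# Toy data: observed times and event indicators (1 = event, 0 = censored).
times = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 1]
for t_j, p_j, s_j in kaplan_meier(times, events):
    print(t_j, round(p_j, 3), round(s_j, 3))
```

In the BART-based model, the per-interval probabilities come from the fitted probit model given covariates instead of d_j / n_j, but survival is built from them by the same running product.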

twj8CDC commented 1 week ago

@editorialbot generate pdf

editorialbot commented 1 week ago

Download article proof | View article proof on GitHub

twj8CDC commented 1 week ago

The new PDF is generated. Please let me know if you have any additional feedback! Otherwise, thank you both for reviewing our work! It is very much appreciated!

turgeonmaxime commented 1 week ago

@twj8CDC Thanks, this looks good to me! Although there is still a typo on line 157, it should say "(8 * 6) = 48".

@mahfuz05062 I've completed the review and updated the checklist. I recommend the manuscript be accepted by JOSS (pending the correction to the typo above).

twj8CDC commented 1 week ago

@turgeonmaxime ah thanks haha I just fixed it!

twj8CDC commented 1 week ago

@editorialbot generate pdf

editorialbot commented 1 week ago

Download article proof | View article proof on GitHub

WeakCha commented 5 days ago

@twj8CDC Hi! I have done another review and have one minor comment: in lines 236-266, I am wondering what your notations x_{[I]}, x_{[I]_1}, and x_{[I]_2} mean. I ask because I did not find explanations of these notations, nor corresponding notation in your example. Could you please organize these notations (and any other notation where inconsistencies exist) to reduce confusion?

twj8CDC commented 5 days ago

@editorialbot generate pdf

editorialbot commented 5 days ago

Download article proof | View article proof on GitHub

twj8CDC commented 5 days ago

@WeakCha Hi, thank you for the feedback. The bracketed subscript is meant to identify the covariate selected to be deterministically set in each PDAD. I adjusted this section to be more explicit regarding what this notation means. Let me know if it is now more clear! Thanks
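The partial-dependence construction described here, together with the marginal differences and credible intervals raised earlier in the review, can be sketched as follows. The sketch assumes that posterior draws of the latent sum-of-trees value f are already available for two partial-dependence datasets in which a binary covariate x_{[I]} is set to 1 and to 0 for every subject. The probit link gives p = Φ(f), survival is the cumulative product of (1 - p) over time, and the marginal difference is the subject-averaged survival difference, computed per draw and then summarized. The helper names are hypothetical, not the package API, and the toy draws are random numbers, not BART-Survival or PyMC-BART output.

```python
import math

import numpy as np


def probit(f):
    """Standard normal CDF, Phi(f), applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(f / math.sqrt(2.0)))


def marginal_survival_difference(f_set1, f_set0, level=0.95):
    """Marginal survival difference between two partial-dependence datasets.

    f_set1, f_set0: latent posterior draws of shape (draws, subjects, times)
    for the datasets with the selected covariate fixed at 1 and at 0.
    Returns the posterior mean and equal-tailed credible interval per time.
    """
    # Discrete-time survival: S(t_j) = prod_{k <= j} (1 - p_k).
    surv1 = np.cumprod(1.0 - probit(f_set1), axis=2)
    surv0 = np.cumprod(1.0 - probit(f_set0), axis=2)
    # Average over subjects within each draw, then summarize across draws.
    diff = (surv1 - surv0).mean(axis=1)  # shape (draws, times)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(diff, [alpha, 1.0 - alpha], axis=0)
    return diff.mean(axis=0), lower, upper


rng = np.random.default_rng(0)
# Illustrative random draws: 500 draws, 8 subjects, 6 time points.
f1 = rng.normal(-1.5, 0.2, size=(500, 8, 6))
f0 = rng.normal(-1.0, 0.2, size=(500, 8, 6))
mean, lo, hi = marginal_survival_difference(f1, f0)
print(mean.round(3), lo.round(3), hi.round(3))
```

A marginal risk ratio would follow the same pattern, replacing the per-draw difference with a per-draw ratio of subject-averaged event probabilities before taking quantiles.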

WeakCha commented 5 days ago

@editorialbot generate pdf

editorialbot commented 5 days ago

Download article proof | View article proof on GitHub

WeakCha commented 5 days ago

@twj8CDC Thanks! @mahfuz05062 I also recommend acceptance of this paper!

twj8CDC commented 4 days ago

@mahfuz05062, are there any final tasks required from me to complete the submission/acceptance process?