openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
703 stars 36 forks source link

[REVIEW]: ddtlcm: An R package for overcoming weak separation in Bayesian latent class analysis via tree-regularization #6220

Closed editorialbot closed 1 month ago

editorialbot commented 8 months ago

Submitting author: !--author-handle-->@limengbinggz<!--end-author-handle-- (Mengbing Li) Repository: https://github.com/limengbinggz/ddtlcm Branch with paper.md (empty if default branch): main Version: v0.2.1 Editor: !--editor-->@Nikoleta-v3<!--end-editor-- Reviewers: @jamesuanhoro, @larryshamalama Archive: 10.5281/zenodo.12711232

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/2cffa48f57384769d086069f824fd020"><img src="https://joss.theoj.org/papers/2cffa48f57384769d086069f824fd020/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/2cffa48f57384769d086069f824fd020/status.svg)](https://joss.theoj.org/papers/2cffa48f57384769d086069f824fd020)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@jamesuanhoro & @larryshamalama, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @Nikoleta-v3 know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

πŸ“ Checklist for @larryshamalama

πŸ“ Checklist for @jamesuanhoro

editorialbot commented 8 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 8 months ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.03 s (1033.3 files/s, 170689.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
R                               24            322           1377           1629
XML                              1              0            184           1513
Markdown                         5             96              0            259
TeX                              2              8              0            105
Rmd                              2            120            160            102
YAML                             2             11              6             55
-------------------------------------------------------------------------------
SUM:                            36            557           1727           3663
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 8 months ago

Wordcount for paper.md is 1596

editorialbot commented 8 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.18637/jss.v061.i13 is OK
- 10.18637/jss.v042.i10 is OK
- 10.1109/TPAMI.2014.2313115 is OK
- 10.1080/03610918.2012.718840 is OK

MISSING DOIs

- 10.1093/oso/9780198526155.003.0042 may be a valid DOI for title: Density modeling and clustering using Dirichlet diffusion trees

INVALID DOIs

- None
editorialbot commented 8 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Nikoleta-v3 commented 8 months ago

Hey @jamesuanhoro, @larryshamalama this is the review thread for the paper. All of our communications will happen here from now on.

As a reviewer, the first step is to create a checklist for your review by entering

@editorialbot generate my checklist

as the top of a new comment in this thread.

These checklists contain the JOSS requirements βœ… As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/6220 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@Nikoleta-v3) if you have any questions/concerns. πŸ˜„ πŸ™‹πŸ»

Nikoleta-v3 commented 8 months ago

@limengbinggz, could you please add the DOI for one of your references? See: https://github.com/openjournals/joss-reviews/issues/6220#issuecomment-1888851682

larryshamalama commented 8 months ago

Review checklist for @larryshamalama

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

larryshamalama commented 6 months ago

Hi @limengbinggz, great software and great method! From one fellow biostats PhD to another, congratulations on your hard work :)

Below are my questions/comments, on top of some other quick things that I am opening as issue(s)/PR(s) in your repo

1. Weak Separation and State of the Field

State of the field: Do the authors describe how this software compares to other commonly-used packages?

It took me a while to understand what is meant by "weak separation", which is the focal point of this work. Perhaps this is because I am not so familiar with Bayesian tree-based methods... I was initially not sure if using other software that you mentioned (e.g. poLCA, BayesLCA, randomLCA) would yield worse results on your simulated data but Figure 3 in your methods paper seems to confirm this. I am thinking that it would be beneficial to clarify this result here since it would motivate further why we should use your package and not the other ones. What do you think?

Minor comment: lines 21, 22: "classes that share proximity to one another in the tree are shrunk towards ancestral classes a priori" Do you think that you can massage this a bit? I'm not sure if I fully understand, but this sentence seems importance since it highlights on a higher level what this method is doing under the food (re: summary bullet point above).

2. 50 burn-in, 100 posterior draws

In your example, you seem to use 50 burn-in draws and 100 posterior draws. Is that sufficient? As a user, how would I know when the MCMC converges with your software? If I am thinking of conventional MCMC-ing, these seem like low numbers, especially that, in your example, you use $N = 496$ which is larger than the total number of draws. In section 5.1 of your methods paper, you seem to use much more draws in the Gibbs sampler (7,000 and 12,000).

3. Singleton node warnings

I am getting many In checkTree(object) : Tree contains singleton nodes. when running your quickstart. I imagine that this means that, for a specific category, there is only one member, which can happen, but just wanted to double check that this can be normal/okay. Can you briefly comment on this?

jamesuanhoro commented 6 months ago

Review checklist for @jamesuanhoro

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Additional notes

Hello @limengbinggz:

Nikoleta-v3 commented 6 months ago

Hey @limengbinggz πŸ‘‹πŸ» did you get a chance to look over the comments/issues that the reviewers raised? πŸ˜„

Nikoleta-v3 commented 5 months ago

πŸ‘‹πŸ» @limengbinggz

limengbinggz commented 5 months ago

πŸ‘‹πŸ» @limengbinggz

Thank for for checking in. We are working on incorporating the reviewers' comments into the revision, and will push to the repo when we are ready. Thanks for waiting.

Nikoleta-v3 commented 5 months ago

Thank you for the update!

limengbinggz commented 5 months ago

@limengbinggz, could you please add the DOI for one of your references? See: #6220 (comment)

Thank you for pointing out. We have added the DOI for the reference. The paper is updated in commit 4f9a157

limengbinggz commented 5 months ago

@editorialbot generate pdf

editorialbot commented 5 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

limengbinggz commented 5 months ago

Hi @limengbinggz, great software and great method! From one fellow biostats PhD to another, congratulations on your hard work :)

Below are my questions/comments, on top of some other quick things that I am opening as issue(s)/PR(s) in your repo

1. Weak Separation and State of the Field

State of the field: Do the authors describe how this software compares to other commonly-used packages?

It took me a while to understand what is meant by "weak separation", which is the focal point of this work. Perhaps this is because I am not so familiar with Bayesian tree-based methods... I was initially not sure if using other software that you mentioned (e.g. poLCA, BayesLCA, randomLCA) would yield worse results on your simulated data but Figure 3 in your methods paper seems to confirm this. I am thinking that it would be beneficial to clarify this result here since it would motivate further why we should use your package and not the other ones. What do you think?

Minor comment: lines 21, 22: "classes that share proximity to one another in the tree are shrunk towards ancestral classes a priori" Do you think that you can massage this a bit? I'm not sure if I fully understand, but this sentence seems importance since it highlights on a higher level what this method is doing under the food (re: summary bullet point above).

2. 50 burn-in, 100 posterior draws

In your example, you seem to use 50 burn-in draws and 100 posterior draws. Is that sufficient? As a user, how would I know when the MCMC converges with your software? If I am thinking of conventional MCMC-ing, these seem like low numbers, especially that, in your example, you use N=496 which is larger than the total number of draws. In section 5.1 of your methods paper, you seem to use much more draws in the Gibbs sampler (7,000 and 12,000).

3. Singleton node warnings

I am getting many In checkTree(object) : Tree contains singleton nodes. when running your quickstart. I imagine that this means that, for a specific category, there is only one member, which can happen, but just wanted to double check that this can be normal/okay. Can you briefly comment on this?

Thank you very much for reviewing the software as well as the methods paper!

  1. Thanks for pointing out the problem in writing. As you suggested, we have added a clarifying sentence to specifically point out why the existing packages fail, by referring to the results in our method paper: "These phenomena have been demonstrated using poLCA and BayesLCA, which are more relevant to our stated problem of weak class separation, in Figures 2 and 3 of @li2023tree."

As for the sentence regarding how classes are shrunk, we have clarified this point by changing the original sentence into "classes that are closer to one another in the binary tree are encouraged to share more similar profiles, and their profiles are shrunk towards their common ancestral classes \textit{a priori}, with the degree of shrinkage varying across pre-specified item groups defined thematically with clinical significance. "

  1. I definitely agree that 100 posterior draws are far from sufficient. However, the sampling algorithm implemented in this package is quite computationally intensive. For the sake of time and for the purpose of brief illustration, we want to demonstrate the functionality of the sampling function ddtlcm_fit. But to your good point that to provide a comprehensive view of a full posterior chain, we have added a pre-saved dataset named "result_diet_1000iters" with 1000 posterior samples, to accompany the package. In the paper, we have added a sentence to clarify this point in the last paragraph of section "Model Fitting": "To have a more comprehensive view of of the results obtained from 1000 posterior draws, we can load data named result_diet_1000iters to perform posterior summaries as described by the steps in the following sections."

  2. Thank you for pointing this out. In fact, the trees we are dealing with in our model all have singleton nodes by design. This warning message originates from the checkPhylo4 function in the phylobase package to perform basic checks on the validity of S4 phylogenetic objects, where phylogenetic trees usually want to avoid singleton nodes. Therefore, this warning shall not be concerning. To clarify this point, we have added the following Note section in README of the package:

limengbinggz commented 5 months ago

Hi @Nikoleta-v3 @larryshamalama @jamesuanhoro πŸ‘‹πŸ» Thank you both for reviewing our submission and the nice comments! We have responded to all issues. We will appreciate it if you could take a look at our responses and let us know if further improvements are needed. Thank you!

limengbinggz commented 4 months ago

@larryshamalama @jamesuanhoro πŸ‘‹πŸ» Have you got a chance to review our responses? πŸ˜„ @Nikoleta-v3

larryshamalama commented 4 months ago

@larryshamalama @jamesuanhoro πŸ‘‹πŸ» Have you got a chance to review our responses? πŸ˜„ @Nikoleta-v3

I'm happy with the responses! Thanks for addressing them :)

limengbinggz commented 4 months ago

@larryshamalama @jamesuanhoro πŸ‘‹πŸ» Have you got a chance to review our responses? πŸ˜„ @Nikoleta-v3

I'm happy with the responses! Thanks for addressing them :)

Thank you for your reply and all the suggestions! Would you mind completing the checklist once you've got time? Thanks! @larryshamalama

jamesuanhoro commented 4 months ago

I'll try to complete my review before Friday. Sorry for the delay, James.

On Mon, Apr 22, 2024, 11:00 Mengbing Li @.***> wrote:

@larryshamalama https://github.com/larryshamalama @jamesuanhoro https://github.com/jamesuanhoro πŸ‘‹πŸ» Have you got a chance to review our responses? πŸ˜„ @Nikoleta-v3 https://github.com/Nikoleta-v3

I'm happy with the responses! Thanks for addressing them :)

Thank you for your reply and all the suggestions! Would you mind completing the checklist once you've got time? Thanks! @larryshamalama https://github.com/larryshamalama

β€” Reply to this email directly, view it on GitHub https://github.com/openjournals/joss-reviews/issues/6220#issuecomment-2070022650, or unsubscribe https://github.com/notifications/unsubscribe-auth/AK3XFTD4RFEJPRNUYTDFXOLY6UX3JAVCNFSM6AAAAABBX5OZZWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZQGAZDENRVGA . You are receiving this because you were mentioned.Message ID: @.***>

larryshamalama commented 4 months ago

@larryshamalama @jamesuanhoro πŸ‘‹πŸ» Have you got a chance to review our responses? πŸ˜„ @Nikoleta-v3

I'm happy with the responses! Thanks for addressing them :)

Thank you for your reply and all the suggestions! Would you mind completing the checklist once you've got time? Thanks! @larryshamalama

done!

limengbinggz commented 4 months ago

@larryshamalama @jamesuanhoro πŸ‘‹πŸ» Have you got a chance to review our responses? πŸ˜„ @Nikoleta-v3

I'm happy with the responses! Thanks for addressing them :)

Thank you for your reply and all the suggestions! Would you mind completing the checklist once you've got time? Thanks! @larryshamalama

done!

Thank you thank you!

jamesuanhoro commented 4 months ago

@larryshamalama @jamesuanhoro πŸ‘‹πŸ» Have you got a chance to review our responses? πŸ˜„ @Nikoleta-v3

Done, happy with the responses to my comments :). And completed my checklist. James.

limengbinggz commented 4 months ago

@larryshamalama @jamesuanhoro πŸ‘‹πŸ» Have you got a chance to review our responses? πŸ˜„ @Nikoleta-v3

Done, happy with the responses to my comments :). And completed my checklist. James.

Thank you so much!!

Nikoleta-v3 commented 4 months ago

Thank you to both reviewers for your time and efforts! @limengbinggz, please give me one week to also have a final look over the submission, and then we can move forward to the next steps!

Nikoleta-v3 commented 3 months ago

@editorialbot generate pdf

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Nikoleta-v3 commented 3 months ago

Hello @limengbinggz,

I am very sorry for the long silence and the delay in processing your submission. Life has been hectic the past few weeks. Regardless, apologies for the delay!

I had some time to go over it as well. I have a few questions and comments.

Regarding the paper:

Regarding the package:

I can see you have a few tests, but I didn’t find any guidelines on how to run them, which brings me to the next point: documentation. The package is documented, but I believe it can be improved. I could only find the quick start and the workflow example in the vignettes from the project's repository.

A few comments about the documentation:

Bullet points

In the README, you use bullet points in the overview and examples sections. In the overview, you state:

ddtlcm works for:

Only the first bullet point is relevant to the sentence "ddtlcm works for:". So the second bullet point should just be a sentence.

You do this in the examples section too. You point the user to the workflow, but you also state ddtlcm estimates the tree over classes and class profiles simultaneously. I don't believe that this sentence should be a bullet point. If you would like to include a picture to showcase the functionality of the package, maybe consider having a "Gallery" section. See this README for example: https://github.com/marcharper/python-ternary.

Overall

Please include information on how to run the tests of the package. This is useful in case someone wants to contribute to the package. This way, they can test whether their contribution "broke" something. Please correct some issues with the README and reorganize some sections.

Moreover (this is a recommendation), I suggest you work on the documentation of the package a bit more. Include some more vignettes.

limengbinggz commented 3 months ago

Hello @limengbinggz,

I am very sorry for the long silence and the delay in processing your submission. Life has been hectic the past few weeks. Regardless, apologies for the delay!

I had some time to go over it as well. I have a few questions and comments.

Regarding the paper:

  • Could you please fix the typos: line 89, repeated of; line 94, burn-ins.
  • Subsections such as #### Data Loading are hard to notice. You are using a fourth-level heading. Why not use a second-level heading since they are subsections?

Regarding the package:

I can see you have a few tests, but I didn’t find any guidelines on how to run them, which brings me to the next point: documentation. The package is documented, but I believe it can be improved. I could only find the quick start and the workflow example in the vignettes from the project's repository.

A few comments about the documentation:

  • In the README, examples come after the Quickstart. It's common practice to have these sections the other way around.
  • Regarding the ddtlcm-demo, the link should point the user to the CRAN vignette: https://cran.r-project.org/web/packages/ddtlcm/vignettes/ddtlcm-demo.html instead of the markdown file because it's more readable.
  • In the CRAN project, there is also a reference manual, which I believe is very useful for users. This should be mentioned in the README.

Bullet points

In the README, you use bullet points in the overview and examples sections. In the overview, you state:

ddtlcm works for:

  • Multivariate binary responses over pre-specified grouping of items
  • The functions' relations in the package ddtlcm can be visualized by

Only the first bullet point is relevant to the sentence "ddtlcm works for:". So the second bullet point should just be a sentence.

You do this in the examples section too. You point the user to the workflow, but you also state ddtlcm estimates the tree over classes and class profiles simultaneously. I don't believe that this sentence should be a bullet point. If you would like to include a picture to showcase the functionality of the package, maybe consider having a "Gallery" section. See this README for example: https://github.com/marcharper/python-ternary.

Overall

Please include information on how to run the tests of the package. This is useful in case someone wants to contribute to the package. This way, they can test whether their contribution "broke" something. Please correct some issues with the README and reorganize some sections.

Moreover (this is a recommendation), I suggest you work on the documentation of the package a bit more. Include some more vignettes.

Thanks for the detailed comments! I am working on the updates and will get back to you as soon as possible.

crvernon commented 2 months ago

:wave: @Nikoleta-v3 - could you check back in on this one? It looks like you are really close to wrapping it up. Thanks so much!

Nikoleta-v3 commented 2 months ago

Thank you for the nudge @crvernon!

Nikoleta-v3 commented 2 months ago

Hello @limengbinggz πŸ‘‹πŸ» A month ago, I asked you to address some final comments πŸ˜„ Do you have any updates on your progress?

limengbinggz commented 2 months ago

Hello @limengbinggz πŸ‘‹πŸ» A month ago, I asked you to address some final comments πŸ˜„ Do you have any updates on your progress?

Thanks for the nudge! I am still working on it and expecting to finish all comments very soon πŸ˜„

Nikoleta-v3 commented 2 months ago

Thank you for your reply! I will ping you again in a week if I don't hear back from you! πŸ˜ƒ

limengbinggz commented 2 months ago

Thank you for your reply! I will ping you again in a week if I don't hear back from you! πŸ˜ƒ

Thanks for the very detailed suggestions! I have incorporated all comments in the updated repository in the latest commit. Please let me know if I can do anything else.

Nikoleta-v3 commented 2 months ago

Hello @limengbinggz, thank you for your work πŸ˜„

Given the time it took to get back to us, I have to admit that I expected you would have added a few more vignettes to the documentation of the package.

However, I will leave this as a "future work" for you on your to-do list.

For now, could you please:

Nikoleta-v3 commented 2 months ago

@limengbinggz πŸ‘‹πŸ»

limengbinggz commented 1 month ago

Hello @limengbinggz, thank you for your work πŸ˜„

Given the time it took to get back to us, I have to admit that I expected you would have added a few more vignettes to the documentation of the package.

However, I will leave this as a "future work" for you on your to-do list.

For now, could you please:

  • [x] (If necessary) Make a new tagged release of your software and list the version tag of the archived version here. Alternatively, let me know if I should just use release v0.1.2. I have made a new tagged release of the software. The latest release is v0.2.1. The archived release is 0.1.1.
  • [x] Archive the reviewed software in Zenodo or a similar service (e.g., Figshare, an institutional repository).
  • [x] Ensure the archival deposit (e.g., in Zenodo) has the correct metadata. This includes the title (which should match the paper title) and the author list (make sure the list is correct and that people who only made small fixes are not included). You may also add the authors' ORCID IDs.
  • [x] List the DOI of the archived version here. DOI: 10.5281/zenodo.12711232

Sorry about the delay as I was not aware of the comment.

Nikoleta-v3 commented 1 month ago

I appreciate your advice on the vignettes. Could you please give me more specific suggestions about what can be added to the vignettes for improvement? Thank you!

The current vignette is really to accompany your paper. I was thinking of having vignettes for the different functionalities of the package. Are all the functionalities of ddtlcm being captured by the current vignette?

Nikoleta-v3 commented 1 month ago

@editorialbot set as archive 10.5281/zenodo.12711232

editorialbot commented 1 month ago

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@editorialbot commands

Nikoleta-v3 commented 1 month ago

@editorialbot set 10.5281/zenodo.12711232 as archive

editorialbot commented 1 month ago

Done! archive is now 10.5281/zenodo.12711232

Nikoleta-v3 commented 1 month ago

@editorialbot set v0.2.1 as version

editorialbot commented 1 month ago

Done! version is now v0.2.1

Nikoleta-v3 commented 1 month ago

@editorialbot generate pdf

Nikoleta-v3 commented 1 month ago

@editorialbot check references