Closed by editorialbot 1 month ago
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.90 T=0.12 s (680.9 files/s, 263179.1 lines/s)

Language | files | blank | comment | code |
---|---|---|---|---|
Python | 44 | 903 | 1525 | 3951 |
Jupyter Notebook | 16 | 0 | 22839 | 1264 |
CSS | 1 | 23 | 17 | 510 |
HTML | 1 | 7 | 110 | 348 |
Markdown | 7 | 60 | 0 | 196 |
TeX | 2 | 12 | 0 | 172 |
YAML | 5 | 18 | 33 | 121 |
CSV | 1 | 0 | 0 | 109 |
JavaScript | 1 | 9 | 2 | 109 |
TOML | 1 | 4 | 0 | 43 |
reStructuredText | 2 | 16 | 52 | 8 |
Bourne Shell | 2 | 0 | 0 | 4 |
SVG | 1 | 0 | 0 | 1 |
SUM | 84 | 1052 | 24578 | 6836 |
Commit count by author:
210 Caleb Ellington
142 Ben Lengerich
52 cnellington
16 wtt102
9 Andrea Rubbi
5 juwayni lucman
3 aaron10l
2 PaulCCCCCCH
1 Jannik Deuschel
Paper file info:
📄 Wordcount for paper.md is 862
✅ The paper includes a Statement of need section
License info:
🟡 License found: GNU General Public License v3.0 (Check here for OSI approval)
👉 📄 Download article proof · 📄 View article proof on GitHub 👈
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.48550/arXiv.2310.07918 is OK
- 10.48550/arXiv.2310.11340 is OK
- 10.1101/2023.12.01.569658 is OK
- 10.1016/j.jbi.2022.104086 is OK
- 10.1609/aaai.v34i03.5693 is OK
MISSING DOIs
- No DOI given, and none found for title: Varying-coefficient models
- No DOI given, and none found for title: Contextual Explanation Networks
- No DOI given, and none found for title: NOTMAD: Estimating Bayesian Networks with Sample-S...
- 10.1101/2020.06.25.20140053 may be a valid DOI for title: Discriminative Subtyping of Lung Cancers from Hist...
- No DOI given, and none found for title: Personalized Survival Prediction with Contextual E...
INVALID DOIs
- None
👋🏼 hi @cnellington @holl- @pescap -- this is the review thread for the paper. All of our communications will happen here from now on.
As a reviewer, the first step is to create a checklist for your review by entering
@editorialbot generate my checklist
at the top of a new comment in this thread.
These checklists contain the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.
The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#6469
so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.
We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.
Please feel free to ping me (@fabian-s) if you have any questions/concerns.
@cnellington: while we wait for the reviews, please take a look at the missing DOIs above and see what's what.
@editorialbot generate pdf
👉 📄 Download article proof · 📄 View article proof on GitHub 👈
@editorialbot commands
Hello @cnellington, here are the things you can ask me to do:
# List all available commands
@editorialbot commands
# Get a list of all editors' GitHub handles
@editorialbot list editors
# Adds a checklist for the reviewer using this command
@editorialbot generate my checklist
# Set a value for branch
@editorialbot set joss-paper as branch
# Run checks and provide information on the repository and the paper file
@editorialbot check repository
# Check the references of the paper for missing DOIs
@editorialbot check references
# Generates the pdf paper
@editorialbot generate pdf
# Generates a LaTeX preprint file
@editorialbot generate preprint
# Get a link to the complete list of reviewers
@editorialbot list reviewers
@editorialbot check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1016/j.jbi.2022.104086 is OK
- 10.1609/aaai.v34i03.5693 is OK
- 10.48550/arXiv.2310.11340 is OK
- 10.1101/2023.12.01.569658 is OK
- 10.48550/arXiv.2111.01104 is OK
- 10.1111/j.2517-6161.1993.tb01939.x is OK
- 10.48550/arXiv.1705.10301 is OK
- 10.1101/2020.06.25.20140053 is OK
- 10.48550/arXiv.1801.09810 is OK
- 10.48550/arXiv.2310.07918 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@cnellington From the GitHub statistics, Ben Lengerich @blengerich has contributed the most to the library, at least in terms of lines of code. Why isn't he first author? Also Eric Xing is not on the list of contributors on GitHub. Did he actually contribute to the software?
@holl- I authored the majority of our core Python packages under contextualized/, but our codebase also contains many .ipynb files for our documentation, website, and tutorials built with jupyter-book. @blengerich, in addition to making many contributions to our core codebase, also spearheaded our website/documentation efforts and did a large amount of standardized style formatting with black on our codebase. The line count differences are mainly due to the differences between the .ipynb and .py file types, a few reorganizations of large notebook files into new directories, and automated formatting of large swathes of the codebase.
Notebook inflation examples:
Style formatting inflation:
Eric P. Xing is my PhD advisor, and Manolis Kellis is @blengerich's postdoc advisor. While they have not made direct contributions to the code, they have been essential in supporting and directing the software development.
@cnellington Okay, that's fine, then. 👍
@cnellington I have some comments for your easy demo notebook, see this issue
@cnellington Looking at your test setup, it seems that baselines/tests.py is not being run.
The following files have less than 85% coverage. Is there a good reason for this? Otherwise, could you add more tests?
Module | statements | missing | coverage |
---|---|---|---|
contextualized\analysis\effects.py | 206 | 190 | 8% |
contextualized\analysis\embeddings.py | 67 | 59 | 12% |
contextualized\analysis\accuracy_split.py | 39 | 33 | 15% |
contextualized\baselines\networks.py | 178 | 135 | 24% |
contextualized\easy\ContextualizedNetworks.py | 97 | 71 | 27% |
contextualized\analysis\pvals.py | 31 | 22 | 29% |
contextualized\analysis\utils.py | 6 | 3 | 50% |
contextualized\dags\losses.py | 32 | 14 | 56% |
contextualized\easy\ContextualGAM.py | 9 | 4 | 56% |
contextualized\modules.py | 105 | 43 | 59% |
contextualized\dags\graph_utils.py | 102 | 27 | 74% |
contextualized\utils.py | 22 | 5 | 77% |
contextualized\dags\lightning_modules.py | 152 | 27 | 82% |
@cnellington Your paper is missing an overview of the State of the field. In the 'Statement of need', could you add more concrete motivation for using this library and relate it to existing software solutions (incl. appropriate references)?
> @cnellington Your paper is missing an overview of the State of the field. In the 'Statement of need', could you add more concrete motivation for using this library and relate it to existing software solutions (incl. appropriate references)?
Would like to strongly reinforce this and point out this missing reference from the stats literature / R package that seems highly relevant in this context: partykit (https://www.jmlr.org/papers/v16/hothorn15a.html), which implements, inter alia, trees and RFs with regression/classification models in each terminal node.
@fabian-s @holl- Thanks for the clarification. We previously had a more detailed and formal comparison of related works, but cut this after seeing that the example "Gala" submission provides only a high-level overview of the field by citing current usage. Most work related to Contextualized can be described by the generic method categories mentioned in Section 1 of our Statement of Need (e.g. partykit and regression trees operate by data partitioning and are very similar in spirit to mixture modeling). I will re-add our formal problem definition and related works sections to clarify. Right now I am traveling, but you can expect an updated manuscript by next week at the latest.
@editorialbot generate pdf
👉 📄 Download article proof · 📄 View article proof on GitHub 👈
@editorialbot check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1016/j.jbi.2022.104086 is OK
- 10.1609/aaai.v34i03.5693 is OK
- 10.48550/arXiv.2310.11340 is OK
- 10.1101/2023.12.01.569658 is OK
- 10.48550/arXiv.2111.01104 is OK
- 10.1111/j.2517-6161.1993.tb01939.x is OK
- 10.48550/arXiv.1705.10301 is OK
- 10.1101/2020.06.25.20140053 is OK
- 10.48550/arXiv.1801.09810 is OK
- 10.48550/arXiv.2310.07918 is OK
- 10.1214/aos/1017939139 is OK
- 10.1016/j.isci.2019.03.021 is OK
- 10.1080/01621459.2021.2000866 is OK
- 10.1093/bioinformatics/btr239 is OK
- 10.1214/09-AOAS308 is OK
- 10.1198/106186008X319331 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@editorialbot generate pdf
👉 📄 Download article proof · 📄 View article proof on GitHub 👈
@editorialbot check references
@editorialbot generate pdf
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1016/j.jbi.2022.104086 is OK
- 10.1609/aaai.v34i03.5693 is OK
- 10.48550/arXiv.2310.11340 is OK
- 10.1101/2023.12.01.569658 is OK
- 10.48550/arXiv.2111.01104 is OK
- 10.1111/j.2517-6161.1993.tb01939.x is OK
- 10.48550/arXiv.1705.10301 is OK
- 10.1101/2020.06.25.20140053 is OK
- 10.48550/arXiv.1801.09810 is OK
- 10.48550/arXiv.2310.07918 is OK
- 10.1214/aos/1017939139 is OK
- 10.1016/j.isci.2019.03.021 is OK
- 10.1080/01621459.2021.2000866 is OK
- 10.1093/bioinformatics/btr239 is OK
- 10.1214/09-AOAS308 is OK
- 10.1198/106186008X319331 is OK
MISSING DOIs
- No DOI given, and none found for title: partykit: A Modular Toolkit for Recursive Partytio...
INVALID DOIs
- None
👉 📄 Download article proof · 📄 View article proof on GitHub 👈
@fabian-s @holl- I have updated the statement of need to include a formal problem definition and an overview of related work. The only missing DOI is for partykit, where the JMLR page did not have a DOI and the ACM DOI was invalid.
We will continue improving test coverage where possible while we wait for your feedback.
thx, statement of need / related work section seems fine to me now.
@cnellington The paper looks good now.
Could you explain the reason for having an `easy` module in your API? This seems to imply that everything else is hard to do. Having an easy start with a library is good, but people shouldn't have to learn a different system for more involved tasks.
Assuming you have written an analysis using `easy`, is it possible to obtain the same results without using the `easy` API (possibly with more code)?
@holl- The `easy` and `analysis` modules contain our user-facing APIs. The other modules are developer-facing, mainly written with PyTorch Lightning API patterns. The `easy` module internalizes training/evaluation of these PyTorch modules and exposes this through an SKLearn-like API wrapper. By definition, anything done in `easy` or `analysis` can be done with the developer-facing modules, but we do not expect users to know or use the other modules. Among developers, we do make a point to keep our private codebase in line with the PyTorch ecosystem so `contextualized` benefits internally from tools produced there.
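To illustrate the design described here, a minimal schematic of the SKLearn-style wrapper pattern (all names below are hypothetical stand-ins, not the actual `contextualized` API): the user-facing class internalizes the training loop of a lower-level model and exposes only `fit`/`predict`.

```python
# Schematic sketch of an SKLearn-style wrapper over a lower-level module.
# All names are illustrative placeholders, not the contextualized API.

class LowLevelModel:
    """Stands in for a developer-facing (e.g. PyTorch Lightning) module."""
    def __init__(self):
        self.slope = 0.0

    def training_step(self, x, y):
        # Trivial "training": running estimate of the y/x ratio.
        if x != 0:
            self.slope += 0.5 * (y / x - self.slope)

class EasyRegressor:
    """User-facing wrapper: hides the training loop behind fit/predict."""
    def __init__(self, epochs=10):
        self.epochs = epochs
        self._model = LowLevelModel()

    def fit(self, X, Y):
        for _ in range(self.epochs):
            for x, y in zip(X, Y):
                self._model.training_step(x, y)
        return self  # sklearn convention: fit returns self

    def predict(self, X):
        return [self._model.slope * x for x in X]

reg = EasyRegressor(epochs=50).fit([1, 2, 3], [2, 4, 6])
print(reg.predict([10]))  # → [20.0]
```

Anything the wrapper does can also be done by driving `LowLevelModel` directly, which mirrors the point above: the wrapper adds convenience, not capability.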
@cnellington Okay, then how about calling that module `model` / `models` instead of `easy`?
Also I noticed that some classes have a plural name, e.g. `ContextualizedBayesianNetworks`. This is confusing. Does instantiating this class create multiple networks? If so, how about calling it a forest instead?
The documentation does not really explain the parameters, e.g. what are the bootstraps and archetypes? Could you add explanations or link to a page that introduces these terms?
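[Editorial note for readers unfamiliar with the terms in question, as a generic sketch of the statistical idea rather than the library's definitions: "bootstraps" usually means refitting on data resampled with replacement to quantify uncertainty, and "archetypes" (roughly, per the NOTMAD paper cited above) are base models mixed per-sample according to context. The bootstrap idea in miniature:]

```python
# Generic bootstrap sketch (illustrative only, not the contextualized API):
# refit a statistic on resampled data to estimate its uncertainty.
import random

random.seed(0)
data = [2.0, 4.0, 4.5, 3.5, 5.0, 3.0]

def mean(xs):
    return sum(xs) / len(xs)

# Each "bootstrap" recomputes the statistic on a resample drawn with replacement.
boot_means = sorted(
    mean(random.choices(data, k=len(data)))
    for _ in range(1000)
)

low, high = boot_means[25], boot_means[974]  # ~95% interval
print(round(mean(data), 2))  # point estimate: 3.67
```

In the same spirit, a model fit with several bootstraps yields a distribution of fitted parameters rather than a single point estimate.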
> @holl- The `easy` and `analysis` modules contain our user-facing APIs. The other modules are developer-facing, mainly written with PyTorch Lightning API patterns. The `easy` module internalizes training/evaluation of these PyTorch modules and exposes this through an SKLearn-like API wrapper. By definition, anything done in `easy` or `analysis` can be done with the developer-facing modules, but we do not expect users to know or use the other modules. Among developers, we do make a point to keep our private codebase in line with the PyTorch ecosystem so `contextualized` benefits internally from tools produced there.
I had the same question.
Then I read in easy_regression.ipynb:
> Note: These contextualized.easy interfaces don't try to use best PyTorch practices (prioritizing easy use over computational efficiency), so it's recommended to use contextualized models for any heavy lifting (see this demo).
It would be beneficial to add this note (or a similar explanation) before in the docs or README.md.
What do you think about that?
@holl- @pescap @fabian-s We've expanded our test coverage, including new tests for basic functionality and performance (e.g. contextualized.analysis.tests.TestPvals, contextualized.easy.tests.TestEasyRegression).
The most recent coverage metrics are below. All the files below 85% coverage can be justified. Embeddings, accuracy_split, and effects are all visualization/reporting utilities and can't be included in unittests, but we do verify their functionality in the model analysis demo. The only other files below 85% are contextualized.dags.graph_utils, which has several options for noise type that don't need to be tested, and contextualized.functions, where we do not test constant-valued functions (i.e. f(x) = c).
Module | statements | missing | excluded | coverage |
---|---|---|---|---|
contextualized/analysis/embeddings.py | 70 | 62 | 0 | 11% |
contextualized/analysis/accuracy_split.py | 39 | 33 | 0 | 15% |
contextualized/analysis/effects.py | 208 | 143 | 0 | 31% |
contextualized/dags/graph_utils.py | 102 | 23 | 0 | 77% |
contextualized/functions.py | 32 | 6 | 0 | 81% |
contextualized/regression/regularizers.py | 19 | 2 | 0 | 89% |
contextualized/analysis/bootstraps.py | 10 | 1 | 0 | 90% |
contextualized/easy/wrappers/SKLearnWrapper.py | 218 | 17 | 0 | 92% |
contextualized/regression/datasets.py | 86 | 6 | 0 | 93% |
contextualized/baselines/networks.py | 171 | 10 | 0 | 94% |
contextualized/easy/ContextualizedRegressor.py | 17 | 1 | 0 | 94% |
contextualized/dags/lightning_modules.py | 152 | 7 | 0 | 95% |
contextualized/easy/ContextualizedNetworks.py | 97 | 5 | 0 | 95% |
contextualized/baselines/tests.py | 36 | 1 | 0 | 97% |
contextualized/regression/lightning_modules.py | 303 | 10 | 0 | 97% |
contextualized/analysis/pvals.py | 51 | 1 | 0 | 98% |
contextualized/analysis/tests.py | 86 | 1 | 0 | 99% |
contextualized/dags/tests.py | 131 | 1 | 0 | 99% |
contextualized/easy/tests.py | 224 | 1 | 0 | 99% |
contextualized/regression/tests.py | 144 | 1 | 0 | 99% |
contextualized/tests.py | 165 | 1 | 0 | 99% |
contextualized/__init__.py | 7 | 0 | 0 | 100% |
contextualized/analysis/__init__.py | 5 | 0 | 0 | 100% |
contextualized/baselines/__init__.py | 1 | 0 | 0 | 100% |
contextualized/dags/__init__.py | 7 | 0 | 0 | 100% |
contextualized/dags/losses.py | 26 | 0 | 0 | 100% |
contextualized/dags/trainers.py | 6 | 0 | 0 | 100% |
contextualized/easy/ContextualGAM.py | 9 | 0 | 0 | 100% |
contextualized/easy/ContextualizedClassifier.py | 14 | 0 | 0 | 100% |
contextualized/easy/__init__.py | 4 | 0 | 0 | 100% |
contextualized/easy/wrappers/__init__.py | 1 | 0 | 0 | 100% |
contextualized/modules.py | 65 | 0 | 0 | 100% |
contextualized/regression/__init__.py | 10 | 0 | 0 | 100% |
contextualized/regression/losses.py | 7 | 0 | 0 | 100% |
contextualized/regression/metamodels.py | 76 | 0 | 0 | 100% |
contextualized/regression/trainers.py | 19 | 0 | 0 | 100% |
contextualized/utils.py | 21 | 0 | 0 | 100% |
Total | 2639 | 333 | 0 | 87% |
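[Editorial note: the kind of smoke and performance tests described above (e.g. contextualized.easy.tests.TestEasyRegression) can be sketched schematically as follows; the model here is a trivial stand-in, not the contextualized API.]

```python
# Schematic unittest pattern similar in spirit to the tests described above.
# MeanPredictor is a hypothetical stand-in model, not part of contextualized.
import unittest

class MeanPredictor:
    """Stand-in model: predicts the training-label mean for every input."""
    def fit(self, X, Y):
        self.mean_ = sum(Y) / len(Y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

class TestMeanPredictor(unittest.TestCase):
    def test_fit_predict_shapes(self):
        # Smoke test: fit runs and predictions align with the inputs.
        model = MeanPredictor().fit([1, 2, 3], [2.0, 4.0, 6.0])
        self.assertEqual(len(model.predict([10, 20])), 2)

    def test_performance(self):
        # Performance test: error is no worse than a known baseline.
        model = MeanPredictor().fit([1, 2, 3], [2.0, 4.0, 6.0])
        self.assertLess(abs(model.predict([2])[0] - 4.0), 1e-9)
```

Running such a file under a test runner (e.g. `python -m unittest`) is what produces the per-module statement counts in coverage tables like the one above.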
@holl- @pescap You can expect an update to the website/documentation in the next few days to better introduce common use cases and model parameters.
Regarding specific comments about naming conventions and contextualized.easy, we currently have a substantial user base and cannot rename these modules or classes, but we will explain the modules and their functionality better in our initial demos.
This naming convention is a result of our development process over the past few years. Originally, we planned to make the PyTorch native API user-facing, with the SKLearn-style easy module as a complementary tool, but the easy modules got much better reception from our user base. We've consolidated around this direction, so the easy module and related analysis utilities have become our public-facing API but the naming conventions have stuck.
@cnellington Good to see the progress! While I still don't like the naming convention, I see your point and I'll accept the current namespaces if they are sufficiently documented.
@cnellington @holl- @pescap : thanks for these updates and comments, JOSS greatly appreciates your transparency and responsiveness.
@holl- @pescap: please remember to update the checkmarks in your checklist as issues are being solved. If necessary, please briefly clarify which points remain open and why / what still needs to be addressed.
@holl- @pescap Documentation has been completely updated with intro tutorials, parameter descriptions, motivations and use cases, and API reference. https://contextualized.ml/docs/intro.html
I believe this takes care of all outstanding reviewer requests. Please let us know if you have any other comments or questions.
@cnellington Check out my comment on the Easy Demo thread! Otherwise, I'm happy with the changes!
> @holl- @pescap Documentation has been completely updated with intro tutorials, parameter descriptions, motivations and use cases, and API reference. https://contextualized.ml/docs/intro.html
> I believe this takes care of all outstanding reviewer requests. Please let us know if you have any other comments or questions.
Great! I have no further comments. Thank you!
@fabian-s All my concerns have been addressed. I've completed my checklist and have no further comments. I recommend accepting the paper.
@editorialbot generate pdf
Submitting author: @cnellington (Caleb Ellington)
Repository: https://github.com/cnellington/Contextualized
Branch with paper.md (empty if default branch): joss
Version: v0.2.8
Editor: @fabian-s
Reviewers: @holl-, @pescap
Archive: 10.5281/zenodo.11130703
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@holl- & @pescap, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @fabian-s know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
📝 Checklist for @holl-
📝 Checklist for @pescap