Review of "The Reference Model for Disease Progression"

cchoirat commented 10 years ago

Reviewer: Christine Choirat

Center: Institute for Quantitative Social Science

University: Harvard University

Field of interest / expertise: Statistics, Statistical Programming

Country: USA

Article reviewed: The Reference Model for Disease Progression

GENERAL EVALUATION

Quality of the approach:

meets
Quality of the writing:

meets
Quality of the figures/tables:

meets

SPECIFIC EVALUATION
Is the code made publicly available and does the article sufficiently describe how to access it?

yes
Does the article present the problem in an appropriate context?
yes
- explain why the problem is important,
yes
- describe in which situations it arises,
yes
- outline relevant previous work,
yes
- provide background information for non-experts
yes
Is the content of the paper accessible to a computational scientist with no specific knowledge in the given field?

yes
Does the paper describe a well-formulated scientific or technical achievement?

yes
Are the technical and scientific decisions well-motivated and clearly explained?

yes
Are the code examples (if any) sound, clear, and well-written?

yes
Is the paper factually correct?

AFAIK yes
Is the language and grammar of sufficient quality?
yes. Minor typos:
- Replace "among hypothesis of disease progression" with "among hypotheses of disease progression"
- Replace "Never the less" with "Nevertheless"
- Replace "Not only it depends on many factors" with "Not only does it depend on many factors"
- Rephrase "Different risk equations found in the literature and parameters they use"
- Replace "Such a combination of equations can include hypothesis" with "Such a combination of equations can include hypotheses"
- Replace "the modeler can create several hypothesis" with "the modeler can create several hypotheses"
- Replace "call it's state transition" with "call its state transition"
- Replace "the use of the Python languange" with "the use of the Python language"
Are the conclusions justified?

yes
Is prior work properly and fully cited?

yes
Should any part of the article be shortened or expanded? Please explain.

Suggestions:
1. Section "Simulation language" Explain the benefits of creating a simulation language vs Python functions.
2. Section "Population Generator", Table 2. How do you calibrate the Population Generator parameters?
In your view, is the paper fit for publication in the conference proceedings? Please suggest specific improvements and indicate whether you think the article needs a significant rewrite (rather than a minor revision).

yes

mrocklin commented 10 years ago

The paper discusses the construction of a tool to compare a variety of different disease models. It describes the expression of models and populations and briefly touches on their simulation. In general it is a fine paper. It might be improved by showing a more cohesive use within the paper and discussing more about simulation techniques. Additionally, if the paper had been written today it might be appropriate to mention projects like PyMC3, which seem to occupy a somewhat similar space.

Answers to all requested questions are generally affirmative, at least to my eye.

Jacob-Barhak commented 10 years ago

Thanks to both reviewers, Matthew and Christine, for their insights.

I followed the correction list of Christine Choirat, who was very detailed. You can find a new version of the paper uploaded. Please find an easy way to include links to this discussion and reviews with the paper itself.

Christine made two suggestions:

I. Explain the benefits of creating a simulation language vs Python functions.

I did not want to go into too much detail, so I added the following sentences to the paper:

"The simulation language presented here constrains the user from a certain perspective. However, it channels the flow of data in a structured way through the system. This constraining is an advantage for the task at hand since it allows providing proper feedback to the user within the disease modeling domain."

However, the full answer is more elaborate. The availability of a Domain Specific Language (DSL) in this case allows for defining parameters, analyzing the population generation order so that the user is free to define population dependencies without writing them in execution order, constructing intelligent reports, and, in the past, even estimating parameters for Markov models. Nevertheless, it comes with constraints. I am sure that someone can create a better DSL with the current tools.
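To illustrate the point about dependency ordering, here is a minimal sketch of how a tool could derive a valid generation order from declared dependencies. It is not code from IEST or MIST: the parameter names and the `generation_order` helper are hypothetical, and it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical population parameters mapped to the parameters they depend on.
# The names are illustrative only and are not taken from IEST or MIST.
dependencies = {
    "Diabetes": {"BMI", "Age"},   # declared first, although generated last
    "BMI": {"Age", "Male"},
    "Age": set(),
    "Male": set(),
}

def generation_order(deps):
    """Return an order in which every parameter follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = generation_order(dependencies)
```

The user can list the parameters in any order; the sorter places each one after everything it depends on, and a circular definition raises `graphlib.CycleError`, which a DSL could turn into domain-specific feedback.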

II. How do you calibrate the Population Generator parameters?

This is actually a wonderful question and a keen observation. At the time this paper was published, population generation was limited to human reasoning and hand manipulation. Also, the GPL-licensed system used to generate the populations was IEST. The Reference Model has since been moved to a newer system called MIST, which has similar traits and was recently updated to allow generating populations with objectives in mind. You can now say that population generation is calibrated using evolutionary computation conducted by the Inspyred Python library. Yet this was not the case when this SciPy paper was published. Here are links to more recent publications that explain this specific point:

J. Barhak, The Reference Model for Disease Progression uses MIST to find data fitness. PyData Silicon Valley 2014, held at Facebook Headquarters.
Abstract: http://pydata.org/sv2014/abstracts/#195_
Presentation: http://sites.google.com/site/jacobbarhak/home/PyData_SV_2014_Upload_2014_05_02.pptx

J. Barhak, A. Garrett, Population Generation from Statistics Using Genetic Algorithms with MIST + INSPYRED. MODSIM World 2014, April 15 - 17, Hampton Roads Convention Center in Hampton, VA.
Paper: http://sites.google.com/site/jacobbarhak/home/MODSIM2014_MIST_INSPYRED_Paper_Submit_2014_03_10.pdf
Presentation: http://sites.google.com/site/jacobbarhak/home/MODSIM_World_2014_Submit_2014_04_11.pptx

For the sake of keeping the timeline correct, I did not make any changes in the original SciPy paper, yet this response should give interested readers the proper direction.
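For readers unfamiliar with the idea of calibrating a population generator by evolutionary computation, the following is a rough sketch of the general technique only. It does not use Inspyred or MIST and does not reproduce their algorithms: the target statistics, the normal age distribution, and all function names are assumptions made up for illustration, and it relies solely on the Python standard library.

```python
import random
import statistics

TARGET_MEAN_AGE = 60.0  # hypothetical published statistic
TARGET_SD_AGE = 10.0    # hypothetical published statistic

def generate_population(params, size=500, seed=0):
    """Draw ages from a normal distribution defined by (mean, sd)."""
    mean, sd = params
    rng = random.Random(seed)  # fixed seed: same params give the same population
    return [rng.gauss(mean, abs(sd)) for _ in range(size)]

def fitness(params):
    """Distance between generated and target statistics (lower is better)."""
    ages = generate_population(params)
    return (abs(statistics.mean(ages) - TARGET_MEAN_AGE)
            + abs(statistics.stdev(ages) - TARGET_SD_AGE))

def calibrate(generations=60, pool_size=20, seed=1):
    """Simple elitist evolutionary search over (mean, sd)."""
    rng = random.Random(seed)
    pool = [(rng.uniform(20, 90), rng.uniform(1, 30)) for _ in range(pool_size)]
    for _ in range(generations):
        pool.sort(key=fitness)
        parents = pool[: pool_size // 2]  # keep the better half unchanged
        children = [(m + rng.gauss(0, 1.0), s + rng.gauss(0, 0.5))
                    for m, s in parents]  # one mutated copy of each survivor
        pool = parents + children
    return min(pool, key=fitness)

best = calibrate()
```

Because the better half of the pool is carried over unchanged in every generation, the best candidate never gets worse; a real setup would use richer population definitions and objectives than this two-parameter toy.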

One clarification regarding the question about code availability. The Reference Model is not released, and in fact it has patent-pending elements. Yet the modeling framework is fully available under the GPL license: both the legacy IEST and its replacement MIST are free Python software. MIST is available at: https://github.com/Jacob-Barhak/MIST

Finally, Matthew raised an interesting point regarding PyMC3. At first glance it looks like an interesting tool. Yet it was not known to me or available when I started development. In fact, many other libraries available today could have improved many aspects of this code. For example, pandas for manipulating data or ast for handling language issues come to mind. The fact that there are newer tools is actually great, since it shows that the scientific Python community is moving forward.
