scipy-conference / scipy_proceedings_2012

2012 SciPy conference proceedings

Review of "The Reference Model for Disease Progression" #4

Open cchoirat opened 10 years ago

cchoirat commented 10 years ago

Reviewer: Christine Choirat

Center: Institute for Quantitative Social Science

University: Harvard University

Field of interest / expertise: Statistics, Statistical Programming

Country: USA

Article reviewed: The Reference Model for Disease Progression

GENERAL EVALUATION

yes

mrocklin commented 10 years ago

The paper discusses the construction of a tool to compare a variety of different disease models. It describes the expression of models and populations and briefly touches on their simulation. In general it is a fine paper. It might be improved by showing a more cohesive use within the paper and discussing more about simulation techniques. Additionally, if the paper had been written today it might be appropriate to mention projects like PyMC3, which seem to occupy a somewhat similar space.

Answers to all requested questions are generally affirmative, at least to my eye.

Jacob-Barhak commented 10 years ago

Thanks to both reviewers, Matthew and Christine, for their insights.

I followed the correction list of Christine Choirat, who was very detailed, and uploaded a new version of the paper. Please find an easy way to link this discussion and the reviews to the paper itself.

Christine made two suggestions:

I. Explain the benefits of creating a simulation language vs Python functions.

I did not want to go into too much detail, so I added the following sentences to the paper:

"The simulation language presented here constrains the user from a certain perspective. However, it channels the flow of data in a structured way through the system. This constraining is an advantage for the task at hand since it allows providing proper feedback to the user within the disease modeling domain."

However, the full answer is more elaborate. In this case, a Domain Specific Language (DSL) allows for defining parameters, analyzing the population generation order so that the user can define population dependencies without writing them in execution order, constructing intelligent reports, and, in the past, even estimating parameters for Markov models. Nevertheless, it comes with constraints. I am sure that someone could create a better DSL with current tools.
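The point about dependency order can be illustrated with a minimal sketch. This is not the IEST or MIST implementation; it only assumes that each population parameter is defined by a Python-syntax expression, and it uses the standard-library `ast` and `graphlib` modules (Python 3.9+). The rule names and formulas are hypothetical.

```python
import ast
from graphlib import TopologicalSorter


def referenced_names(expression):
    """Return the set of variable names used in a Python-syntax expression."""
    tree = ast.parse(expression, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def generation_order(rules):
    """Order parameters so that each one comes after the parameters it uses."""
    # Map each parameter to the other defined parameters its expression mentions.
    graph = {
        name: referenced_names(expr) & set(rules)
        for name, expr in rules.items()
    }
    # static_order() yields dependencies before the nodes that need them.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical population rules, deliberately written out of execution order.
rules = {
    "BMI": "Weight / Height ** 2",
    "Weight": "50 + 0.3 * Age",
    "Age": "30",
    "Height": "1.7",
}
order = generation_order(rules)
print(order)
```

Because the rules are analyzed before anything is generated, a circular definition can be reported to the user as a modeling error (`graphlib` raises `CycleError`) instead of failing somewhere in the middle of a simulation.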

II. How do you calibrate the Population Generator parameters?

This is actually a wonderful question and a keen observation. At the time this paper was published, population generation was limited to human reasoning and hand manipulation. Also, the GPL system used to generate the populations was IEST. The Reference Model has since moved to a newer system called MIST, which has similar traits and was recently updated to allow generating populations with objectives in mind. Population generation can now be calibrated using evolutionary computation conducted by the Inspyred Python library. Yet this was not the case when this SciPy paper was published. Here are links to more recent publications that explain this specific point:

J. Barhak, The Reference Model for Disease Progression uses MIST to find data fitness. PyData Silicon Valley 2014, held at Facebook Headquarters.
Abstract: http://pydata.org/sv2014/abstracts/#195_
Presentation: http://sites.google.com/site/jacobbarhak/home/PyData_SV_2014_Upload_2014_05_02.pptx

J. Barhak, A. Garrett, Population Generation from Statistics Using Genetic Algorithms with MIST + INSPYRED. MODSIM World 2014, April 15-17, Hampton Roads Convention Center, Hampton, VA.
Paper: http://sites.google.com/site/jacobbarhak/home/MODSIM2014_MIST_INSPYRED_Paper_Submit_2014_03_10.pdf
Presentation: http://sites.google.com/site/jacobbarhak/home/MODSIM_World_2014_Submit_2014_04_11.pptx

For the sake of keeping the timeline correct, I did not make any changes in the original SciPy paper, yet this response should give interested readers the proper direction.
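As a rough illustration of what calibrating population generation "with objectives in mind" can look like, here is a minimal evolutionary search written with the standard library only. It does not use the MIST or Inspyred APIs, and the target statistics, generator, and mutation sizes are assumptions chosen for the example.

```python
import random
import statistics

# Hypothetical published statistics the generated population should match.
TARGET_MEAN, TARGET_SD = 60.0, 10.0


def generate_population(params, size=200, seed=0):
    """Generate ages from a normal generator with parameters (mean, sd)."""
    mean, sd = params
    rng = random.Random(seed)  # fixed seed: same params give the same population
    return [rng.gauss(mean, sd) for _ in range(size)]


def fitness(params):
    """Squared distance between population statistics and the targets."""
    ages = generate_population(params)
    return ((statistics.mean(ages) - TARGET_MEAN) ** 2
            + (statistics.stdev(ages) - TARGET_SD) ** 2)


def evolve(generations=100, pop_size=20, n_parents=5, seed=1):
    """Keep the best candidates each generation and mutate them."""
    rng = random.Random(seed)
    candidates = [(rng.uniform(20, 90), rng.uniform(1, 30))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the best candidates survive unchanged.
        parents = sorted(candidates, key=fitness)[:n_parents]
        children = []
        for _ in range(pop_size - n_parents):
            mean, sd = rng.choice(parents)
            # Gaussian mutation; the sd parameter is kept strictly positive.
            children.append((mean + rng.gauss(0, 1.0),
                             max(0.1, sd + rng.gauss(0, 0.5))))
        candidates = parents + children
    return min(candidates, key=fitness)


best = evolve()
print(best, fitness(best))
```

The generator parameters are treated as the genome and the mismatch with the target statistics as the fitness, which is the general shape of the approach described in the publications above; a library such as Inspyred supplies the selection and variation machinery that is hand-written here.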

One clarification regarding the code availability questions: the Reference Model is not released and in fact has patent-pending elements. Yet the modeling framework is fully available under the GPL license: both the legacy IEST and its replacement MIST are free Python software. MIST is available at: https://github.com/Jacob-Barhak/MIST

Finally, Matthew raised an interesting point regarding PyMC3. At first glance it looks like an interesting tool, yet it was not known to me or available when development started. In fact, many libraries available today could have improved many aspects of this code; pandas for manipulating data and ast for handling language issues come to mind. The fact that there are newer tools is actually great, since it shows that the scientific Python community is moving forward.