Precision Medicine as a control problem: Using simulation and deep reinforcement learning to discover adaptive multi-cytokine treatments for sepsis

https://arxiv.org/abs/1802.10440

Sepsis is a life-threatening condition affecting one million people per year in the US in which dysregulation of the body's own immune system causes damage to its tissues, resulting in a 28 - 50% mortality rate. Clinical trials for sepsis treatment over the last 20 years have failed to produce a single currently FDA approved drug treatment. In this study, we attempt to discover an effective cytokine mediation treatment strategy for sepsis using a previously developed agent-based model that simulates the innate immune response to infection: the Innate Immune Response agent-based model (IIRABM). Previous attempts at reducing mortality with multi-cytokine mediation using the IIRABM have failed to reduce mortality across all patient parameterizations and motivated us to investigate whether adaptive, personalized multi-cytokine mediation can control the trajectory of sepsis and lower patient mortality. We used the IIRABM to compute a treatment policy in which systemic patient measurements are used in a feedback loop to inform future treatment. Using deep reinforcement learning, we identified a policy that achieves 0% mortality on the patient parameterization on which it was trained. More importantly, this policy also achieves 0.8% mortality over 500 randomly selected patient parameterizations with baseline mortalities ranging from 1 - 99% (with an average of 49%) spanning the entire clinically plausible parameter space of the IIRABM. These results suggest that adaptive, personalized multi-cytokine mediation therapy could be a promising approach for treating sepsis. We hope that this work motivates researchers to consider such an approach as part of future clinical trials. To the best of our knowledge, this work is the first to consider adaptive, personalized multi-cytokine mediation therapy for sepsis, and is the first to exploit deep reinforcement learning on a biological simulation.

Despite considerable progress in applying machine/deep learning to the drug development process, one area that remains relatively un-augmented is characterizing and discovering what actually needs to be targeted, in what combination, and when it needs to be applied to alter the course of disease. We regard "disease" as a trajectory diverging from a baseline state of health, and the goal of medicine as returning that trajectory to a healthy state; in essence, this is a control problem. Taken one step further, the practice of medicine can be characterized as a "game" in which the medical practitioner plays against the pathophysiology of disease. This perspective led us to use deep reinforcement learning (DRL) to train an AI agent that could learn to control a disease trajectory. There is existing work on optimizing existing therapies, for which data are available to train and guide a learning algorithm; it is referenced in the current version of the review under Predicting Patient Trajectories. We are instead interested in identifying boundary conditions for the degree of observations and actions needed to control a sufficiently complex proxy mechanistic simulation of the pathophysiological process, that is, in going beyond existing therapies and training data. Framing the problem in this fashion allows us to draw on the experience of using DRL to train AIs to play games (Atari Pong, Lunar Lander, Go, etc.), now applied to a proxy simulation model of a pathophysiologic process. We fully recognize that the simulation model is vastly simpler than the real-world system, but given some approximation of the underlying quasi-mechanistic dynamics of the target disease process, we believe that this approach can significantly help frame the nature and extent of the control problem and of the clinically relevant treatment space.
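To make the control-problem framing concrete, here is a minimal sketch of the observe/act/step loop. Everything in it is an assumption for illustration: the one-variable toy dynamics, the thresholds, and the hand-written proportional rule stand in for a mechanistic simulator and a learned DRL policy. It is not the IIRABM and not the method used in the paper.

```python
from typing import Callable, List, Tuple

HEALTHY = 0.01  # hypothetical threshold: below this the toy patient has recovered
DEAD = 1.0      # hypothetical threshold: at this deviation the toy patient has died


def run_episode(policy: Callable[[float], float],
                x0: float = 0.3,
                growth: float = 0.10,
                effect: float = 0.50,
                max_steps: int = 50) -> Tuple[str, List[float]]:
    """Closed loop: observe the state, choose a dose, advance the toy model.

    x is the deviation from the healthy baseline (0.0 = healthy).
    """
    x = x0
    trajectory = [x]
    for _ in range(max_steps):
        dose = min(max(policy(x), 0.0), 1.0)  # action, clipped to [0, 1]
        x = x + growth * x - effect * dose    # disease drift minus treatment
        x = min(max(x, 0.0), DEAD)            # keep the state in [0, 1]
        trajectory.append(x)
        if x < HEALTHY:
            return "recovered", trajectory
        if x >= DEAD:
            return "died", trajectory
    return "unresolved", trajectory


def no_treatment(x: float) -> float:
    """Baseline policy: never treat."""
    return 0.0


def proportional(x: float) -> float:
    """Feedback policy: dose proportional to the observed deviation."""
    return 1.0 * x


untreated_outcome, untreated = run_episode(no_treatment)
treated_outcome, treated = run_episode(proportional)
print(untreated_outcome, treated_outcome)  # prints: died recovered
```

In a DRL setting the `proportional` function would be replaced by a trained policy network and the toy update by a simulator step; the loop structure is the only part this sketch is meant to convey.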
As an initial test case, we applied DRL to train an AI agent to "control" a proxy agent-based model of systemic inflammation (the Innate Immune Response ABM, or IIRABM) to cure sepsis. Sepsis is a disease in which the body's inflammatory response to severe infection or injury leads to multiple organ failure. It affects ~1 million people in the US each year and has a mortality rate of ~40%; after over 30 years of research and failed clinical trials, there is currently no drug that targets its pathophysiologic process. Because there are no effective mechanistic therapies (so optimization of existing therapies is not possible), we considered this an excellent demonstration system for the potential benefits of DRL. See the attached paper (https://arxiv.org/abs/1802.10440) for details. We completely recognize the inherent limitations of this approach, primarily the lack of fidelity between the quasi-mechanistic simulation and the real-world system. Where DRL is used to train game-playing AIs, video games and Go have identifiable rules that directly translate to the learned task, whereas all mechanistic simulation models are subject to appropriate skepticism regarding their fidelity. However, we believe that there is a fundamental scientific need to be able to use and examine these types of quasi-mechanistic simulation models, as they represent the hypothesis-testing step that completes the scientific cycle begun by the hypothesis-generating applications of data analysis/inference provided by ML/DL. In particular, we envision an iterative loop in which DL generative models infer causal chains that translate into physically manifest quasi-mechanistic hypotheses; these hypotheses are in turn instantiated as simulations that generate the synthetic/simulated data needed to overcome the general problem of data sparsity in biomedicine, and that can potentially be used to train control- and forecasting-oriented AIs with DRL.
We believe that the integration of these deep learning methods represents the path forward to achieving true precision control of disease.

greenelab / deep-review

Precision Medicine as a control problem: Using simulation and deep reinforcement learning to discover adaptive multi-cytokine treatments for sepsis #851