pristineliving / Team-Peony-Primer

Primer for BIME 535

Analytics in Healthcare #8

Open pristineliving opened 4 years ago


Introduction

As we have previously discussed, the healthcare delivery space has seen a rapid and unprecedented increase in available health information technology (HIT) applications and their use by healthcare providers. Much of this growth has been enabled by government stimulus (e.g., laws and regulations), the proliferation and availability of applications, declining costs of healthcare data, and widespread adoption of interoperability goals and data standards. In particular, data acquisition is increasing substantially across many healthcare departments and sectors, including EHRs, CDS systems, imaging, and public health. Health data is also accumulating rapidly through social media, e-mail, and mobile app mining.

The potential of these large datasets is unlocked through advanced analysis. Through health analytics, we can discover novel insights that support informed decision making, quality-of-care improvement, cost reduction, positive influence on health outcomes, and improved public health. While the term 'health analytics' may be more modern, traditional analytics tools have been used in the healthcare space for decades in applications of statistical models, data mining, and clinical decision support.

Now, a range of technologies and tools have been integrated for interoperability and health data processing in ways providers have not previously had access to. These include more advanced databases, EHR systems, data warehouses, web applications, and CDS systems that directly help doctors, caregivers, and patients understand their health data. In turn, the health data available through these technologies can be better exploited to make diagnoses, assign treatments, manage health and health expectations, and execute preventive care. The bottom line: health analytics is about gaining the insight needed to make more informed healthcare decisions.

Existing examples of health analytics can be found throughout the literature and our previous discussions. They include: gathering information on performance from query and reporting tools; merging data into dashboards, data repositories, or individual patient views; forecasting future events or possible outcomes based on models and what-if analyses; and identifying unknown diseases or medical conditions to seek new or alternative treatments and drugs.

Health analytics also helps healthcare providers reach key objectives, such as providing the best possible patient care, avoiding adverse events, integrating new knowledge and protocols into existing practices, and managing patient care effectively and efficiently. Improving patient care, for example, may require the analysis of safety and outcome data to track key performance metrics. Providers may also want to identify new and effective treatment protocols or identify patients for risk of readmission. Avoiding adverse events means being able to facilitate early alerts on adverse episodes, anticipating medication allergies, and identifying populations of patients in different disease categories.

Healthcare analytics can be organized into a four-stage step-wise model:

  1. Descriptive analytics
  2. Predictive analytics
  3. Prescriptive analytics
  4. Discovery analytics

Descriptive analytics use existing information on past performance to classify and categorize typically structured data. This is the most commonly used and best-understood level of analytics: it describes data "as is" and does not require complex calculations. Most healthcare organizations begin with descriptive analytics, moving down the model to more complicated tools as necessary. Descriptive analytics use data to understand both past and current healthcare decisions, supporting informed decisions based on the data available. The models categorize, characterize, aggregate, and classify data, converting it into useful information for understanding clinical decisions, outcomes, and quality. Summaries most often come in the form of charts and reports, such as those illustrating patient hospitalizations or physician performance, and descriptive analytics generally rely heavily on visualization to convey the information.
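As a minimal illustration of the descriptive stage, the sketch below aggregates a handful of invented admission records into a monthly summary; the patient IDs, months, and counts are fabricated for illustration only:

```python
from collections import Counter

# Hypothetical admission records: (patient_id, month of admission).
admissions = [
    ("p1", "2020-01"), ("p2", "2020-01"), ("p3", "2020-02"),
    ("p1", "2020-02"), ("p4", "2020-02"), ("p5", "2020-03"),
]

def admissions_per_month(records):
    """Summarize past events 'as is': count admissions by month."""
    return dict(Counter(month for _, month in records))

summary = admissions_per_month(admissions)
print(summary)  # {'2020-01': 2, '2020-02': 3, '2020-03': 1}
```

In practice this kind of aggregate would feed a dashboard chart rather than a console print, but the underlying operation — counting and grouping data as it stands — is the same.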

Predictive analytics build on the capabilities of descriptive analytics to forecast future events using various models and what-if analyses. This slightly more advanced level of analysis emphasizes the use of information, looking at past performance to predict future outcomes from historical or summarized health data. Predictive analytics primarily extrapolate relationships from existing data to build these forecasts. An example application might be predicting the responses of different patient groups to different drug dosages or clinical trial treatments. These tools anticipate risks and find relationships in health data that are not always apparent through descriptive analytics alone. Particularly through data mining, predictive tools can detect hidden patterns across large quantities of data and help answer questions about what is likely to happen next.
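A toy stand-in for this stage: fit a least-squares trend to invented monthly readmission counts and extrapolate one month ahead. The data and the simple linear model are purely illustrative; real predictive tools use far richer models, but the idea of extrapolating relationships from existing data is the same:

```python
# Hypothetical monthly readmission counts for months 0..4.
counts = [10.0, 12.0, 13.0, 15.0, 16.0]

def forecast_next(ys):
    """Fit y = a + b*x by least squares and extrapolate one step ahead."""
    xs = list(range(len(ys)))
    n = len(ys)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * n  # prediction for the next month

print(round(forecast_next(counts), 2))  # 17.7
```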

Prescriptive analytics address the question: what should be done? These tools come into play when health decisions involve too many choices or alternatives. Using health and medical knowledge in addition to existing data and information, prescriptive analytics can support tough clinical situations such as selecting drug treatments, determining treatment alternatives, setting maximum drug dosages to maximize outcomes, and identifying alternative surgical options to consider. Prescriptive analytics is primarily used to support personalized and evidence-based medicine.
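A minimal sketch of the prescriptive idea: score hypothetical treatment alternatives against patient constraints and recommend the best-scoring one. The drug names, rules, and weights below are entirely made up for illustration and are not clinical guidance:

```python
# Toy prescriptive rule: rank alternatives by expected benefit, penalizing
# options that conflict with patient constraints (all values illustrative).
def recommend(patient, alternatives):
    def score(alt):
        s = alt["expected_benefit"]
        if alt["drug"] in patient["allergies"]:
            s -= 100  # hard penalty: contraindicated
        if patient["renal_impairment"] and alt["renally_cleared"]:
            s -= 5    # soft penalty: harder to dose safely
        return s
    return max(alternatives, key=score)

patient = {"allergies": {"drug_a"}, "renal_impairment": True}
alternatives = [
    {"drug": "drug_a", "expected_benefit": 9, "renally_cleared": False},
    {"drug": "drug_b", "expected_benefit": 7, "renally_cleared": True},
    {"drug": "drug_c", "expected_benefit": 6, "renally_cleared": False},
]
print(recommend(patient, alternatives)["drug"])  # drug_c
```

Real prescriptive systems encode such trade-offs in formal knowledge bases rather than hand-written scores, but the pattern — constraining many alternatives down to an actionable recommendation — is the same.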

Discovery analytics is a newer field of healthcare analytics that uses knowledge about knowledge to facilitate novel discoveries in areas such as drugs, treatments, and diseases. These tools are most often implemented to help healthcare providers, pharmaceutical companies, and researchers identify medical unknowns (e.g. diseases or conditions) or seek new discoveries (e.g. treatments or drugs).

Barriers to Successful Deployment

Infrastructure: Building the IT infrastructure for big data analytics is a huge investment for organizations, in both time and money, and training staff requires more of each. Incorporating analytics into the workflow will also meet resistance. As analytics becomes part of routine practice, organizations will have to develop new protocols and policies to ensure appropriate data use.

Data quality: Obtaining high-quality data is a challenge, as is ensuring that data remains high quality. Models should also fit the properties of the data. Oftentimes models are trained on retrospective data, which may not translate directly to real-time data. If data change or contain missing values, how to process them before feeding them into the model is an open question. Models may need to be updated constantly; who will oversee the updates, and how frequent they should be, will need to be decided.

Data sharing: To harness the value of big data, various kinds of data will need to be integrated as input to models. These data are likely not all from the same institution or organization, and how to encourage data sharing across organizations is a problem. Health information exchange will play a role in data sharing, but it is not enough. Sometimes organizations don’t want to share data because they are competing to secure the same consumers, and sharing data will not benefit their organizations. Policymakers will need to come up with a way to stimulate data sharing and preserve benefits for organizations at the same time.

Consent: People value their data ownership and privacy, and the current system lets individuals retain that control. However, for big data analytics to work, data needs to be shared and pooled together. Regulations may therefore need to loosen somewhat on patient consent, since consenting to every action on the data may not be practical or achievable for analytical purposes. The intricate balance between individual privacy and research use to improve public health needs to be considered carefully. Educating people on the benefits of sharing their data may be an important part of getting people on board with the idea.

Privacy: With information leaks occurring so often, it is understandable that people have concerns about the safety and privacy of their own data. Assuring data safety and protection, especially for more sensitive information, is crucial to persuading the public. Thorough deidentification, without the possibility of reidentification, needs to be established and ensured.

Transparency: The use of analytics should not be a black box. The models built will need to be able to give reasons for their predictions. The methods and sources used to develop a model, and how the model is incorporated into the current system, are also valuable information that helps gain users' trust. When people understand what is happening behind the scenes, they will be more likely to accept and trust it. It is important to let users understand their options and have the freedom to choose what they want to do with the recommendations.

A Glimpse at an Application: Natural Language Processing Analytics and Mental Health

Mental health disorders such as depression, anxiety, and bipolar disorder have drawn the attention of physicians and informaticians due to the rapid global increase in their prevalence over the past century. Depression has become a leading cause of mental disability and has been reported across the continents. The lifetime prevalence of depression ranges from 20% to 25% in women and 7% to 12% in men1. Symptoms of depression typically include, but are not limited to, unexplained fatigue, a decreasing desire to interact with others, a lack of community involvement, difficulty with oral expression or brainstorming, loss of long-term and short-term memory, and a lack of satisfaction and/or passion. These symptoms interfere substantially with both patients' personal quality of life and their participation in the community. The dissection of population-based, high-throughput patient data has given researchers the ability to locate risk factors relevant to mental disorders (such as chronic disease, stress, alcohol abuse, job stress, and night shifts)2. Informaticians working on the unstructured narratives created by mental health clinicians and patients are using natural language processing to decipher implicit, non-coded information from clinical and non-clinical notes, enabling data mining and analytics that improve the efficiency of diagnosis and detection. Such a pathway could provide more precise diagnosis of mental health status and support treatment pipelines between departments and institutions, for better collaborative solutions.

Natural language processing (NLP) techniques dissect the information buried in unstructured free text and convert its content into tokens or lexicons that can be coded uniformly for human or machine reading. Patients' long-term electronic health records (EHRs), composed of patient information and clinicians' narrative notes, can thus be translated into interpretable and consistent codes that feed machine learning algorithms for inferential or predictive studies of behavioral risk factors and depression indicators, and for training models of suicidal ideation, with high validity and reliability3. Another use case for NLP is social media: activeness and emotional expression on social networks are a further reflective reference for detecting behavioral patterns or susceptibility to mental health disorders, beyond hospital records. A consecutive sequence of negative or emotional comments about satisfaction with current quality of life on social media (such as Facebook, Twitter, or Reddit), combined with an interest in articles about suicide, can indicate that a user is at high risk of poor mental health4. After a data set is extracted from the online community or from clinicians' notes, it is divided into a training set (for building the NLP algorithm) and a test set for classification or association studies. Better formatted and interpreted patient information, whether from social media or medical records, can improve the quality of care. In parallel, early interventions, such as education about regular mental health assessment, information on the potential use of antidepressants, and referral to regional institutions or professional personnel, can be delivered to target patients for further disease management.
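The tokenize-and-code step described above can be sketched as a simple phrase lexicon applied to a normalized note. The phrases and concept codes below are invented for illustration, not actual clinical codes:

```python
import re

# Hypothetical lexicon mapping surface phrases to made-up concept codes
# (these are illustrative placeholders, not SNOMED CT identifiers).
LEXICON = {
    "no desire to see anyone": "C_SOCIAL_WITHDRAWAL",
    "trouble sleeping": "C_INSOMNIA",
    "tired all the time": "C_FATIGUE",
}

def code_note(note):
    """Lowercase, collapse whitespace, and map known phrases to codes."""
    text = re.sub(r"\s+", " ", note.lower().strip())
    return [code for phrase, code in LEXICON.items() if phrase in text]

note = "Patient reports being  Tired all the time and has trouble sleeping."
print(code_note(note))
```

Real systems use far more sophisticated matching (negation handling, word-sense disambiguation, morphological variants), but the output is the same kind of artifact: uniform codes a downstream model can consume.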

Patients' physical conditions, such as chronic disease, job stress, hormonal change, and alcohol abuse, can contribute to the onset or progression of depression5. For instance, new mothers are at much higher risk of depression due to steep hormonal changes as well as family stress6. Physical health conditions that act as risk factors for mental health disorders need precise classification, supported by inter-departmental exchange of EHR data, during the decision-making process. Here, computerized clinical decision support (CDS) systems can give health care providers more reliable, effective, and accurate access to, and interpretation of, clinical information and EHRs. CDS systems comprise any software designed to help health professionals with patient-related health assessment and to offer possible solutions or recommendations during decision making; they usually consist of three major parts: receiving patient information, integrating a knowledge base, and generating supportive recommendations for clinical decisions7. NLP technology can isolate patient information from free-text EHRs and discharge notes, summarize practical decision-making rules, and compose a knowledge base according to clinicians' needs. NLP can therefore enhance CDS performance by optimizing all three components mentioned above. Equipped with well-established natural language processing algorithms, CDS systems for mental health are promising: faster and more specific support for practitioners, psychiatrists, psychologists, social workers, counselors, and psychiatric nurses can be expected when patient information is extracted and dissected with satisfactory precision.
The classical diagnosis of mental health disorders has run retrospectively, using measurements of serum serotonin level (or even whole-genome sequencing when necessary) and self-report scales as gold standards (e.g., the Beck Depression Inventory); a retrospective determination of whether the patient has a mental health disorder is then made8. Cognitive-behavioral therapy (CBT) is used as one means of early intervention and treatment; if it does not help sufficiently, antidepressants prescribed by a psychiatrist are needed to stabilize the progression of the disease. Here the decision-making loop of question asking, hypothesis testing, and treatment seems complete. Until recent attention prompted by rising suicide rates, however, unawareness of mental health status led to heavy underestimation of the severity of mental health disorders and their interference with daily life, delaying preliminary diagnosis9. Earlier detection of possible depressive traces would help identify patients who have experienced an onset of depression or are at high risk.

To look for depressive traces by mining patient information, researchers mainly obtain unstructured text from patients' EHR notes provided by hospitals, and sometimes recruit experiment participants from online communities (e.g., Facebook, Twitter, and Reddit). In the clinical context, NLP systems such as MPLUS, MedLEE, the Geneva System, and MEDSYNDIKATE have been used to dissect clinicians' notes10. Clinicians' notes from the psychiatric department usually code only the portion needed for billing. However, the unstructured portion of the clinician's record from the primary visit contains a great deal of depression-related descriptive text, such as "I have a headache", "I have not felt good for a long time", and "life is useless". Mining and interpreting the patient record with high accuracy is needed to diagnose depression with higher precision and better timeliness.

Clinician notes and discharge summaries are usually unstructured narratives of the patient record, containing abundant health information with potentially uninterpreted traces of mental health status. In 2015, the Zhou group published work on identifying depression diagnoses from random samples of unstructured patient discharge summaries to enhance patient problem lists, and on predicting the likelihood of depression risk in patients without a structured formal diagnosis. The free-text discharge notes were first processed and analyzed against a gold standard established by psychiatrists' manual review to extract depression symptoms. Comparing this gold standard with the NLP/machine-learning classifications allowed the precision of the NLP algorithm to be assessed. Depression-related lexicons were extracted by the MTERMS system, based on symptoms listed in SNOMED CT and the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-4), to classify discharge summaries containing medical history, diagnosis, and treatment11. The precision of the NLP classification method was as high as 0.8. The analysis also revealed that approximately 20% of the analyzed patients were at risk of depression, and further diagnosis was suggested. The Zhou group's work showed that, with a precise NLP system and an appropriate algorithm, formal clinical diagnosis can be augmented and the monitoring of patients' depressive traces enhanced for clinical suggestions.
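The evaluation step — comparing NLP-assigned labels with a manually reviewed gold standard — can be sketched as follows. The labels are fabricated, chosen only so this toy example reproduces a precision of 0.8 like the figure reported above:

```python
# Precision against a gold standard: of everything the classifier flagged
# as depression, what fraction did the manual chart review confirm?
def precision(predicted, gold):
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    return tp / (tp + fp)

nlp_flags   = [1, 1, 0, 1, 0, 1, 1]   # classifier says "depression"
gold_labels = [1, 1, 0, 0, 0, 1, 1]   # psychiatrist manual review
print(precision(nlp_flags, gold_labels))  # 0.8
```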

In another study monitoring patients' mental health status, deciphering personal descriptions of daily life reflected how likely patients were to attempt suicide. In 2016, the Baca-Garcia research group designed a text-based mobile intervention targeting 1,453 adult survivors of suicide attempts after discharge from a hospital in Madrid, Spain12. The features were n-gram keywords drawn from answers to the open-ended question "How are you feeling today?". The outcome variable was an indicator of suicidal ideation based on participants' stratified responses to the question "Have you felt that you do not have the will to live?". The data set was divided randomly into two halves, one of which was used to train a model predicting participants' suicidal ideation. An NLP-based algorithm on the Wired Informatics platform, using the clinical Text Analysis and Knowledge Extraction System (cTAKES), revealed that the top 10 tokens associated with suicidal ideation included I told, monotony, harassed, Ritalin, race, restrooms, congested/sick, I pronounce, rejects, and we work, with a positive predictive value as high as 0.64, sensitivity as high as 0.57, and specificity as high as 0.62 when token length and complexity were properly selected. Predictions from a multivariate logistic regression model using structured covariates had a positive predictive value (PPV) of 0.73, sensitivity of 0.76, and specificity of 0.6211. The classification of high-risk keywords developed by the Baca-Garcia group could be extended to web-based monitoring of depressive expressions at relatively low cost yet high reliability. By providing a precise estimate of a patient's likelihood of attempting suicide, the trained NLP algorithm could potentially be integrated with well-targeted mobile interventions.
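The n-gram features described above can be sketched as a simple extraction function; the answer text here is invented, echoing tokens like "monotony" from the study:

```python
# Word n-grams from a free-text answer, lowercased; real pipelines add
# tokenization rules, stop-word handling, and frequency weighting.
def ngrams(text, n):
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

answer = "I feel tired and everything feels monotonous"
print(ngrams(answer, 2))
# ['i feel', 'feel tired', 'tired and', 'and everything',
#  'everything feels', 'feels monotonous']
```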

Restricted access to patient EHRs has created problems for data collection and analytics, owing to potential selection bias and questionably small sample sizes, and data collection and categorization face difficulty in achieving generalizable classification accuracy. Researchers have therefore turned to extracting comments from online social media communities, which is also not easy: stigmatization of depression makes proper collection of online comments difficult, since users of social networks tend to avoid discussing depression directly13. Studies have shown, however, that traces of depressive expression on social media, even in discussions of topics unrelated to depression, can be used to classify whether a user is suffering from depression14. Open-ended discussions on platforms such as Reddit, Facebook, and Twitter give researchers insight into how non-clinical unstructured text can predict or detect the onset of depression. In 2013, a Microsoft research group used anonymously recorded Twitter data from crowdworkers, who were asked to take a standardized clinical depression survey and complete human-intelligence tasks, to analyze behavioral differences between a depression-positive subgroup (n=171) and a depression-negative subgroup (n=305)15. The historical tweets of the depression-positive participants were collected in their entirety, dating back to their estimated onset; for the depression-negative group, only one year of posts before the survey time was collected. The psycholinguistic resource LIWC was used to determine 22 depression-related linguistic characterizations and to compute users' emotional states along the dimensions of positive and negative affect; two further dimensions, activation and dominance, were computed with the ANEW lexicon.
A support vector machine classifier with a radial-basis-function kernel performed best and was selected after principal component analysis. The precision of the models based on users' emotional states and linguistic styles was 0.642 and 0.683, respectively. This suggests that depressive traces may be detectable prior to the estimated or reported onset.

Choosing between a complex yet informative NLP system and a simplified NLP system that minimizes computational resources involves trade-offs. In 2014, the Shah research group compared the functionality and efficiency of the NCBO Annotator16 and the MedLEE-based REVEAL (Health Fidelity, Palo Alto, California, USA) in finding mentions of entities of interest, and investigated the trade-offs between the two sets of term mentions17. The NCBO Annotator performed string matching with minimal linguistic processing, while REVEAL performed deeper linguistic analysis of text structure. Accuracy and efficiency were compared on two data sets: the 2008 i2b2 Obesity Challenge and nine million unstructured clinical notes from the Stanford Translational Research Integrated Database Environment (STRIDE). The comparison covered safety profiling of a peripheral artery disease treatment (cilostazol), a drug-drug interaction, and positive mentions of diseases. A significant difference in accuracy emerged when counting drug mentions, indicating that the NCBO Annotator maps strings to concepts well when given a well-defined dictionary; it performed best in the textual analysis of the 2008 i2b2 Obesity Challenge data set. In the analyses of the other data sets, no evidence showed that REVEAL provided better accuracy than the NCBO Annotator (performance-wise, both were equally accurate). This evaluation of the two NLP methods suggests that simple dictionary-based annotation of clinical notes can be used at fairly large scale while maintaining promising accuracy and high efficiency.
On the other hand, comparing outputs of data sets under linguistic "gold standard" schemes, such as DSM-4 and DSM-5, provides straightforward insight into the relative precision of annotators and NLP algorithms18. Evaluating annotator efficiency and schematic features helps researchers find the most appropriate annotation method for their data-analysis needs and the characteristics of their patient data. Unifying and standardizing disease-related terminologies could provide a consistent taxonomy even when different annotators are used. This is equally necessary for collaboration between research groups, underscoring the critical need for quality control, research homogeneity, and reduced lexicon redundancy. Using a single, highly informative "dictionary-like" standard during text mining lets researchers code relevant text tokens into a machine-readable format, supported by the wide adoption of well-structured EHRs and/or participant data. The Unified Medical Language System (UMLS) is a widely used knowledge source developed by the National Library of Medicine to unify the coding of languages and thereby enhance interoperability between medical systems in the United States. It is a comprehensive list of biomedical terms for developing computer systems capable of integrating the specialized vocabulary used in biomedicine and health care. The UMLS consists of 2 major components: the Metathesaurus and the Semantic Network. The Metathesaurus is a very large, multilingual vocabulary database that contains information about biomedical and health-related concepts, their various names, and the relationships among them.
It is built from the electronic versions of many different thesauri, classifications, code sets, and lists of controlled terms used in patient care, health services billing, public health statistics, indexing and cataloging of biomedical literature, and basic, clinical, and health services research; it is the key component of the UMLS. Codes and tokens from the Metathesaurus are then categorized and classified with the aid of the Semantic Network for further analysis. In addition to providing unique identifiers for concepts, the UMLS includes lists of synonyms and acronyms, maps to code sets such as SNOMED CT, and identifies relationships between disease-oriented concepts19, 20. One concern about applying natural language processing to mental-health-related care is knowledge maintenance across distinct NLP platforms. Detection efficiency relies heavily on the terminology, thresholds, and definitions provided by the publications and clinical reports that integrate with the vocabulary of medical evaluations. The selection and revision of lexicons across different NLP frameworks are therefore essential. Such concerns can be overcome when standards like the UMLS provide a homogeneous dictionary.
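A Metathesaurus-style synonym lookup can be sketched as a plain mapping from many surface strings to one concept identifier. The identifiers below are placeholders, not real UMLS CUIs, and the synonym list is invented:

```python
# Many surface strings map to one concept ID, as in a thesaurus-backed
# terminology. "CUI0001"/"CUI0002" are hypothetical placeholder codes.
SYNONYMS = {
    "depression": "CUI0001",
    "depressive disorder": "CUI0001",
    "low mood": "CUI0001",
    "anxiety": "CUI0002",
}

def to_concept(term):
    """Normalize a surface string and resolve it to a concept ID, if known."""
    return SYNONYMS.get(term.lower().strip())

print(to_concept("  Depressive Disorder "))  # CUI0001
```

This is the interoperability payoff the paragraph describes: once free-text variants collapse to shared identifiers, outputs from different annotators and institutions become comparable.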

Another concern with NLP technology is an underlying ethical paradox. For high-precision prediction of the progression of depressive symptoms, or collaborative inference of traces of mental health disorders, researchers acutely need large amounts of clinical data from providers: the cost of a late or improper diagnosis of a patient with a severe mental health disorder is too heavy. Yet with restricted access to clinical data and issues of security and privacy, the ideal case is hard to achieve. Data quality and sample size cannot be guaranteed if a study is conducted in only one hospital or region. Each hospital has its own system, and sharing EHRs with outside researchers could expose the data to competitors, so entry to and publication of clinicians' notes are usually limited to affiliated researchers. When researchers turn to social media analysis instead, the application program interfaces of the corresponding online communities (Twitter, Facebook, Reddit) allow only a limited amount of free scraping per day. Moreover, very little longitudinal tracing of a single participant group has been conducted, even though depression progression varies over time. Mainstream NLP methodology still focuses on predicting depressive behavior from retrospective or cross-sectional data obtained from clinicians or from participants recruited online with permission to access their social media records. It is not ethical to scrape data online directly, even if users' information is publicly available, and improper or unintended processing and dissemination of materials available online might violate user privacy. Proactive intervention and monitoring based on patient data are therefore in high demand, yet the corresponding development lags behind.
Last but not least, researchers need to handle selection bias in patient data very carefully. By pooling large data warehouses with the aid of international institutions (e.g., the i2b2 data set), the sufficiency and reliability of training data could be massively enhanced. Collaborative construction of new algorithms would also make future data processing more reproducible and generalizable, and presumably more precise.

In the diagnosis of depression, several data sources generate either subjective human-generated patient reports or provider viewpoints presented in an unstructured format. NLP acts as a powerful analytical instrument for optimizing data extraction and characterizing clinician and non-clinician notes. NLP technology can consistently parse text for signs of depression, potentially leading to improved diagnosis and treatment. The role of informatics methodologies is to help bridge institutional differences, reduce potential bias in data presentation, train algorithms, and evaluate NLP methods. With collaborative optimization of NLP algorithms and the construction of large data warehouses, researchers can reasonably expect better-defined diagnostic and intervention outputs for depression patients.

References:

(1) Wang, J.; Wu, X.; Lai, W.; Long, E.; Zhang, X.; Li, W.; Zhu, Y.; Chen, C.; Zhong, X.; Liu, Z. et al. Prevalence Of Depression And Depressive Symptoms Among Outpatients: A Systematic Review And Meta-Analysis. BMJ Open 2017, 7, e017173.

(2) Zhou L, Baughman AW, Lei VJ, et al. Identifying patients with depression using Free- text clinical documents. Stud Health Technol Inform 2015;216:629–33.

(3) Lin, C.; Bai, Y.; Liu, C.; Hsiao, M.; Chen, J.; Tsai, S.; Ouyang, W.; Wu, C.; Li, Y. Web-Based Tools Can Be Used Reliably To Detect Patients With Major Depressive Disorder And Subsyndromal Depressive Symptoms. BMC Psychiatry 2007, 7.

(4) Coppersmith, G.; Ngo, K.; Leary, R.; Wood, A. Exploratory Analysis of Social Media Prior to a Suicide Attempt. In Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, 2016; pp 106-117.

(5) Lamontagne, A.; Keegel, T.; Louie, A.; Ostry, A.; Landsbergis, P. A Systematic Review Of The Job-Stress Intervention Evaluation Literature, 1990–2005. International Journal of Occupational and Environmental Health 2007, 13, 268-280.

(6) De Choudhury, M.; Counts, S.; Horvitz, E. 2013. Predicting Postpartum Changes in Behavior and Mood via Social Media. In Proc. CHI 2013, to appear.

(7) Demner-Fushman, D.; Chapman, W.; McDonald, C. What Can Natural Language Processing Do For Clinical Decision Support? Journal of Biomedical Informatics 2009, 42, 760-772.

(8) Beck, A.; Steer, R.; Brown, G. Beck Depression Inventory; Pearson: San Antonio, 1996.

(9) Simpson SM, Krishnan LL, Kunik ME, Ruiz P. Racial disparities in diagnosis and treatment of depression: a literature review. Psychiatric Quarterly. 2007;78(1):3-14.

(10) Cimino, J.; Shortliffe, E. Biomedical Informatics; Springer: London, 2014; p. 257.

(11) Zhou L, Baughman AW, Lei VJ, et al. Identifying patients with depression using Free-text clinical documents. Stud Health Technol Inform 2015;216:629–33.

(12) Cook, B.; Progovac, A.; Chen, P.; Mullin, B.; Hou, S.; Baca-Garcia, E. Novel Use Of Natural Language Processing (NLP) To Predict Suicidal Ideation And Psychiatric Symptoms In A Text-Based Mental Health Intervention In Madrid. Computational and Mathematical Methods in Medicine 2016, 2016, 1-8.

(13) Thornicroft, G.; Mehta, N.; Clement, S.; Evans-Lacko, S.; Doherty, M.; Rose, D.; Koschorke, M.; Shidhaye, R.; O'Reilly, C.; Henderson, C. Evidence For Effective Interventions To Reduce Mental-Health-Related Stigma And Discrimination. The Lancet 2016, 387, 1123-1132.

(14) Calvo, R.; Milne, D.; Hussain, M.; Christensen, H. Natural Language Processing In Mental Health Applications Using Non-Clinical Texts. Natural Language Engineering 2017, 23, 649-685.

(15) De Choudhury, M., Gamon, M., Counts, S., & Horvitz, E. (2013). Predicting Depression via Social Media. In Proc. ICWSM-13.

(16) Lependu P, Iyer SV, Fairon C, et al. Annotation analysis for testing drug safety signals using unstructured clinical notes. J Biomed Semantics 2012;3(Suppl 1):S5.

(17) Jung, K.; LePendu, P.; Iyer, S.; Bauer-Mehren, A.; Percha, B.; Shah, N. Functional Evaluation Of Out-Of-The-Box Text-Mining Tools For Data-Mining Tasks. Journal of the American Medical Informatics Association 2014.

(18) Mowery D, Park A, Bryan C, Conway M. Towards automatically classifying depressive symptoms from Twitter data for population health. Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media; December 12, 2016; Osaka, Japan. 2016. Dec 12, pp. 182–191.

(19) Data Standards, Natural Language Processing, and Healthcare IT - 3M Inside Angle https://www.3mhisinsideangle.com/blog-post/data-standards-natural-language-processing-and-healthcare-it/

(20) UMLS Project https://www.nlm.nih.gov/archive/20130426/mesh/umlsforelis.html

(21) Raghupathi, W., & Raghupathi, V. (2013). An overview of health analytics. J Health Med Informat, 4(132), 2.

(22) Amarasingham, R., Patzer, R. E., Huesch, M., Nguyen, N. Q., & Xie, B. (2014). Implementing electronic health care predictive analytics: considerations and challenges. Health Affairs, 33(7), 1148-1154.

(23) Roski, J., Bo-Linn, G. W., & Andrews, T. A. (2014). Creating value in health care through big data: opportunities and policy implications. Health Affairs, 33(7), 1115-1122.

pristineliving commented 4 years ago

Abby: Op-ed Episode for BIME 535

Healthcare delivery has jumped by leaps and bounds in recent years, and especially over the last decade. We've already begun to usher in a new era of Big Data – a world defined by massive amounts of information on everything from consumer goods to traffic channels to, yes, our personal health information, aggregated on a larger scale than we've ever seen before. But with Big Data comes Big Responsibility: how do we harness this wealth of information in a way that's beneficial not only to us, as data scientists, but to our clients, users, and consumers? And then, the deeper question as clinical informaticians: how do we use this data to create better clinical environments for our patients, and collect data based on our improvements to continue making them?

The healthcare industry is unique in many aspects, not least its history of continued development and restructuring as the needs of patients, and the technologies available to meet them, change. Within living memory, we've seen the transition from paper to electronic health records, and from family doctors making house calls to a revolving door of new providers within multi-hospital health systems. Healthcare is changing in so many tangible ways with every decade, yet it can also feel like it's moving at a snail's pace.

Much of that has to do with issues and challenges that are not unique to healthcare: security and privacy concerns, change management, information exchange, and improvement through data analytics. While healthcare may tout plenty of particulars, it is by no means the only industry handling vast amounts of highly sensitive data; the military, private finance, and the intelligence community are just a few others. And yet, informaticians and clinicians alike seem to take a 'reinventing the wheel' approach to challenges many of these organizations also face, such as implementing a new workflow or facilitating the exchange of sensitive information. These are the places where we see healthcare staggering: Patient-Centered Medical Homes, for instance, have been in discussion since before even MYCIN, yet have never been implemented on anything close to a large scale. The clinical setting remains adamant that only clinical informaticians can solve its (not mutually exclusive) problems, and thus does not expand its horizons by consulting data scientists from adjacent industries who produce novel and innovative solutions to these challenges every year.

We've seen healthcare professionals get caught in the weeds before, focusing on the specific challenge in front of them rather than examining the underlying issue. If we take PCMHs, for example, we must ask ourselves: how much have the original goals of such practices advanced from their decades-old inception? How has the vision of what healthcare looks like shifted with changing political, social, economic, and environmental climates? Only now, in the era of COVID and forced change, are we seeing true innovation and problem-solving in the healthcare space, possibly on a level we've never seen before. Perhaps I'm naive in saying it, but I look forward to where these new steps take us next.

pristineliving commented 4 years ago

Tianran: Op-ed Episode for BIME 535

I was born in a small town in China and have witnessed a vast range of health and medical record systems. When I was young, every hospital printed its own paper medical record booklets, each with a different format and page layout. Later, in the mid-2000s, each province started to standardize region-specific medical record booklets. When I traveled from one hospital to another, it took clinicians quite a bit of time to figure out what historical medical information to extract and how. Meanwhile, losing one piece of information meant losing it permanently. By the late 2010s, China had built a centralized EHR system, especially for infectious diseases. A rare case of cholera could be noticed by the CDC moments after being reported at the clinician's end; the province-level CDC would then follow up with the hospital, estimate the severity and infectiousness, and strategize interventions. One common complaint, however, was that the EHR in China was poorly designed and its features were not always user-friendly.

Later, after I settled in Washington, USA, I saw and used EHR systems as both a patient and a researcher. I was amazed that these EHRs have such well-designed user interfaces and abundant functionality for patient information profiling and EHR-based studies. I believe that "a properly sharpened blade chops faster than bare hands," and the EHR is a blade to be polished from time to time. We simply need time and massive collaboration from all healthcare stakeholders to ensure the EHR's completeness and precision.

I have also witnessed successful applications of appropriate data integration, like the eMERGE network. The eMERGE model demonstrates the possibility of high-quality data sharing, extraction, and integration. Over the last five years, the University of Washington (UW) has provided eMERGE with reliably standardized and harmonized patient data, and researchers have used the EHR as a cost-effective proxy for randomized controlled trials, with less bias and higher statistical power. Personally, I believe the EHR has enormous potential to improve healthcare quality and outcomes. We mainly face two dilemmas: the tradeoff between standardized workflow and EHR flexibility, and the granularity of information provided by EHR narratives. An instance of the former: the EHR in the Emergency Department (ED) may need to respond quickly with minimal functionality yet a well-rounded patient summary, so that sufficient information is presented in minimal time, while an internal medicine department may need richer EHR functionality for chronic disease management. Segmenting working circumstances and developing or filtering specific functionalities accordingly would help ensure the EHR is both effective and efficient. For the latter concern, healthcare practitioners could receive uniform, regular training on standardized codes for EHR input, so that the information in the EHR is appropriately granular and easier for researchers to extract, annotate, and process. In addition, allowing voice input would relieve the burden of EHR documentation and create a smoother pipeline of clinician proofreading and precise voice-to-text transformation through natural language processing. Meanwhile, collaboration among departments can enrich EHR data and support better interpretation of patient profiles, as well as less biased clinical decisions. Moreover, public health stakeholders could utilize pooled EHR data to promote better interventions in the regional community.

A well-established EHR system, together with data exchange across departments and organizations, is the firm and irreplaceable foundation of clinical decision support (CDS), the patient-centered medical home (PCMH), and health information exchange (HIE). However, development in these directions requires that the EHR has evolved to a satisfactory state. For instance, a CDS intervention is effective only when healthcare practitioners have a well-designed workflow and a user-friendly EHR interface. A well-rounded PCMH needs the EHR to support communication between patients/caregivers and primary care providers (PCPs), with some degree of patient autonomy. And HIE is functional and efficient only between organizations that both have well-designed EHR features and comparably competent infrastructure. Disagreement over preferred EHR functionalities in each field and the heterogeneity of EHR platforms add further complications to EHR development.

Although the EHR has shown massive potential for improving health outcomes at both the individual and population levels, the primary concern of patients and EHR developers remains patient privacy and confidentiality. EHR design and maintenance should always keep an eye on system security and de-identification of patient information. A data breach could be very harmful to patients, and distrust of the EHR will discourage them from participating in the PCMH and HIE. Every time we consider a new EHR-based technology, the core concerns of data quality and confidentiality should always echo.

Old challenges remain incompletely solved, and new problems arise. However, I believe that with the efforts of all stakeholders, including policymakers, researchers, industry groups, patients, caregivers, and healthcare professionals, we will improve healthcare outcomes bit by bit.

myang875 commented 4 years ago

Mu: Op-ed Episode for BIME 535

While some people may not concur with this statement, others will readily agree that the health system in the US is expensive and inefficient. There is too much history behind the US healthcare system, and many reforms aimed at improving it have not been successful. But there may be new hope: with the rise and maturation of information technology, clinical informatics may reshape the healthcare landscape as never before.

Clinical informatics is the application of informatics to clinical practice. A prominent and well-known example is the Electronic Health Record (EHR). This is the first step of informatics application: moving patient records from paper to computer. It is fundamental to every other beneficial and powerful application that clinical informatics will bring, including error checking, information storage, flexible data formats, data portability and mobile accessibility, and data integration. The EHR also comes with two distinguishing companion features: Computerized Provider Order Entry (CPOE) and Clinical Decision Support (CDS). CPOE lets physicians' orders go directly to the receiving parties without passing through the hands of multiple intermediate medical personnel. It also gives physicians instant feedback on the orders they enter, helping them avoid easily preventable mistakes. Preventing errors is also part of CDS, among other functionalities that include making suggestions for action items based on the input records. CDS will recommend next steps, such as tests or treatment options; it will also alert providers to potentially harmful drug-drug interactions or allergies based on the current medications and records.
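The interaction-alerting behavior of CDS can be sketched in a few lines. This is a toy illustration with a hypothetical in-memory interaction table; real CDS systems query curated drug-knowledge bases and the patient's full record instead.

```python
# Hypothetical interaction table: unordered drug pairs -> warning text.
INTERACTIONS = {
    frozenset({"warfarin", "aspirin"}): "Increased bleeding risk",
    frozenset({"lisinopril", "spironolactone"}): "Risk of hyperkalemia",
}

def check_new_order(current_meds, new_drug):
    """Return alert messages triggered by adding `new_drug` to the
    patient's current medication list."""
    alerts = []
    for med in current_meds:
        warning = INTERACTIONS.get(frozenset({med.lower(), new_drug.lower()}))
        if warning:
            alerts.append(f"ALERT: {new_drug} + {med}: {warning}")
    return alerts
```

Because the check runs at order entry, the physician sees the alert before the order reaches the pharmacy, which is exactly the feedback loop CPOE plus CDS is meant to provide.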

Once EHRs are in place, the data can be very useful. It can be used to analyze the costs and quality of care, which may alert organizations to redundant or unnecessary procedures and tests. This may help move the healthcare system from fee-for-service care to value-based care that emphasizes patient outcomes rather than the number of procedures patients undergo. Data also holds potential for prediction, for example, predicting how patients may react to a drug or predicting patient outcomes.
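Outcome prediction from EHR-derived features often starts with a simple risk score. The sketch below applies a logistic function to a weighted sum of features; the features, coefficients, and intercept are invented for illustration and are not clinically validated.

```python
import math

# Hypothetical coefficients for a toy 30-day readmission risk score.
COEFFS = {"age": 0.03, "prior_admissions": 0.40, "num_medications": 0.05}
INTERCEPT = -4.0

def readmission_risk(patient: dict) -> float:
    """Map a linear score of EHR-derived features to a probability
    in [0, 1] via the logistic function."""
    score = INTERCEPT + sum(COEFFS[k] * patient.get(k, 0) for k in COEFFS)
    return 1 / (1 + math.exp(-score))
```

In practice such coefficients would be fitted on historical EHR data and validated before any clinical use, but the structure, features in, calibrated probability out, is the same.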

To obtain the benefits of clinical informatics, organizations first need to accept these new technologies. Hence, at the heart of implementing clinical informatics is change management. Promoting change in any organization or workplace is difficult: people have inertia and like to stick to their usual routines. Transitioning to any new protocol is a massive process, and with clinical informatics it is an even more widespread undertaking. To adopt any new information technology, organizations must have the software as well as the hardware in place, which requires financial circumstances that allow such a large investment. Investment in time and training is also essential, since people need training on the technology and time to adjust to its use. A large portion of clinical informatics also requires coordination and cooperation across organizations, which poses yet another challenge.

Even though committing to change is hard and clinical informatics has its imperfections, it is still worth a shot. This may be a way to reform the healthcare system, and it may succeed if done right.