sara-chronister / syndrome-definition-evaluation

R code for evaluation of NSSP BioSense ESSENCE syndrome definition results using ESSENCE APIs.

Syndrome Definition Evaluation Toolkit

Background:

The purpose of this tool is to allow ESSENCE users to evaluate the data details (line level) results of one, two, or three syndrome definitions at a time. This project produces several important outputs:

  1. An HTML report that can be shared with others and opened using either Chrome or Firefox browsers. This document contains no identifiable data, only aggregate results, for security purposes.
  2. A CSV file for each possible combination of definitions, containing a subset of variables important in the manual review process. These files do contain identifiable information and should be handled accordingly.
  3. A csv file containing details about the definition elements (codes and terms) that can be used to evaluate the performance of individual elements in query results. These files do contain identifiable information and should be handled accordingly.
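As an illustration of how the definition element details might be used, here is a minimal base R sketch that counts how many visits each element matched and how many it matched on its own. It assumes a table with one row per visit and one 0/1 column per element; the element column names and all values are invented for the example, and a real file would be read with `read.csv()` instead of being built by hand.

```r
# Toy stand-in for one definition-elements file: one row per visit and one
# 0/1 column per definition element. Element names and values are invented.
matched <- data.frame(
  C_BioSense_ID = c("id1", "id2", "id3", "id4"),
  term_overdose = c(1, 0, 1, 1),
  code_T40      = c(0, 1, 1, 0),
  term_naloxone = c(0, 0, 1, 0)
)

element_cols <- setdiff(names(matched), "C_BioSense_ID")

# Visits matched by each element
n_matched <- colSums(matched[element_cols])

# Visits matched by exactly one element, counted per element
only_one <- rowSums(matched[element_cols]) == 1
n_unique <- colSums(matched[only_one, element_cols, drop = FALSE])

element_summary <- data.frame(
  element   = element_cols,
  n_matched = as.integer(n_matched),
  n_unique  = as.integer(n_unique)
)

# Most frequently matched elements first
element_summary[order(-element_summary$n_matched), ]
```

An element with a high `n_unique` is the only reason those visits were captured, so it is a natural candidate for closer manual review.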

[!NOTE] If you are planning to use this tool for syndrome development, I would recommend working through the steps laid out in the Syndrome Definition Guidance document before using this tool. This document was developed by members of the NSSP Syndrome Definition Committee to provide a recommended protocol for developing and testing a new syndrome definition. This tool is particularly useful and relevant to the "Refining the Syndrome" section of the document.

Instructions:

1) Download and set up files:

2) In Excel:

3) In RStudio:

[!TIP]

  • For subsequent uses, be sure your user credentials (ESSENCE and local, if necessary) are up to date. Update your credentials in the "SupportCode/2-UserCredentials.R" file as needed.

[!IMPORTANT]

  • Result files will be stored in the Output_ folder corresponding to the Evaluation_#Defs.Rmd you ran. (Example: If you run Evaluation_OneDef.Rmd, then result files will be stored in Output_OneDef.)
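The naming convention above can be sketched in R. The helper `output_folder_for()` is hypothetical and not part of the toolkit; it only restates the Evaluation_ to Output_ mapping described in the note.

```r
# Hypothetical helper (not part of the toolkit): derive the output folder
# name from the R Markdown file name, following the convention
# Evaluation_<name>.Rmd -> Output_<name>.
output_folder_for <- function(rmd_file) {
  stem <- tools::file_path_sans_ext(basename(rmd_file))
  sub("^Evaluation_", "Output_", stem)
}

out_dir <- output_folder_for("Evaluation_OneDef.Rmd")
out_dir
```

Once a report has been run, `list.files(out_dir, recursive = TRUE)` would show the result files and subfolders it produced.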

4) Validation Review:

This tool also supports a linelist, consensus manual review process (referred to as Validation Review) to estimate accuracy metrics of syndrome definitions.

[!TIP] Validation Review is set to run by default. To turn off the Validation Review, go to the "Setup" tab of "DefinitionInformationTable.xlsx" and set the value for Column G Row 3 to FALSE.

| Folder | File | Description |
| --- | --- | --- |
| SupportCode | | R scripts containing custom functions to support loading ESSENCE credentials as well as pulling/cleaning data for reports. |
| | DefinitionInformationTable.xlsx | Define evaluation process parameters and supply the syndrome definitions you wish to evaluate. |
| | Evaluation_#Defs.Rmd | R Markdown report used to launch the entire syndrome validation process. Choose the respective .Rmd template based on the number of syndromes you wish to evaluate. |
| | Evaluation_#Defs.html | Rendered R Markdown report showcasing syndrome syntax, volumes of emergency department visits, and relative overlap between multiple syndromes. |
| Output_#Defs | Filenames reflect syndrome abbreviations | Multiple linelist files of ESSENCE DataDetails records based on the syndrome definition(s) (singular or multiple overlap) they fall under. |
| Output_#Defs/Matched_Elements | Filenames reflect syndrome abbreviations | Multiple files of C_BioSense_IDs and a matrix of 0/1 variables indicating the syndrome syntax components that were identified within the respective record. |
| Output_#Defs/Validation_Review | | Nested subfolders supporting Validation Review for each syndrome being evaluated. |
| Output_#Defs/Validation_Review/1_Reviewed_Data | Reviewer_#_Data.xlsx | Contains separate validation review Excel files for each reviewer. |
| Output_#Defs/Validation_Review | Validation_Summary.Rmd (1 reviewer only) or Validation_Summary_Pre_Consensus.Rmd (2+ reviewers only) | R Markdown report that calculates syndrome accuracy metrics (1 reviewer: final metrics, 2+ reviewers: preliminary metrics). For 2+ reviewers, it also generates Consensus_Data.xlsx. |
| Output_#Defs/Validation_Review/2_Consensus_Data | Consensus_Data.xlsx | Linelist file that facilitates consensus review/discussion of records with disagreement between reviewers (records that have Agreement = FALSE). After coming to a consensus decision, the final status of the record is updated in Review_Category_Consensus. |
| Output_#Defs/Validation_Review | Validation_Summary_Post_Consensus.Rmd (2+ reviewers only) | R Markdown report that calculates final, consensus syndrome accuracy metrics. |
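To make the two-reviewer consensus flow concrete, here is a minimal base R sketch on toy data. It is not the toolkit's own code: the columns `Review_Category_1` and `Review_Category_2`, the category labels, and the choice of positive predictive value as the metric are all assumptions made for illustration. Only `Agreement` and `Review_Category_Consensus` are names taken from the file descriptions above.

```r
# Toy review data for two reviewers. The per-reviewer column names and the
# category labels are invented for this example.
reviews <- data.frame(
  C_BioSense_ID     = c("id1", "id2", "id3", "id4", "id5"),
  Review_Category_1 = c("True Positive", "True Positive", "False Positive",
                        "True Positive", "False Positive"),
  Review_Category_2 = c("True Positive", "False Positive", "False Positive",
                        "True Positive", "True Positive"),
  stringsAsFactors  = FALSE
)

# Flag records where the reviewers gave the same category
reviews$Agreement <- reviews$Review_Category_1 == reviews$Review_Category_2
percent_agreement <- mean(reviews$Agreement)

# Records that would go to consensus discussion
to_discuss <- reviews[!reviews$Agreement, ]

# Agreed records keep their category; the two disagreements get a consensus
# decision (made up here) after discussion.
reviews$Review_Category_Consensus <- reviews$Review_Category_1
reviews$Review_Category_Consensus[!reviews$Agreement] <-
  c("True Positive", "False Positive")

# Assumed metric: share of reviewed records judged true positives
ppv <- mean(reviews$Review_Category_Consensus == "True Positive")
```

Here `percent_agreement` is a pre-consensus figure, while `ppv` is computed only after every disagreement has a consensus category, which mirrors the split between the pre- and post-consensus reports.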

For questions, ideas for improvement/collaboration, or attribution, please reach out to sara.chronister@doh.wa.gov.