lzim / teampsd

Team PSD is using GitHub, R and RMarkdown as part of our free and open science workflow.
GNU General Public License v3.0

wk2_feb_epic: Sim UI SP module readdress data inputs #2457

Closed: jamesmrollins closed this issue 2 years ago

jamesmrollins commented 2 years ago

Description

The migration of Team Data to Power BI requires reviewing and standardizing variable names and readdressing the "Get Direct Constant" functions within the data variables of the SP Module.

jamesmrollins commented 2 years ago

FYI: @lzim @lijenn @mnallajerla

Hi @jeffhoerle - Attached below is an Excel spreadsheet that lists the data variables in the SP model. The variable names on this list are consistent with our conventions. There are a few Episode Count issues that will need to be addressed.

ModelParameters.xlsx

Hi @TomRust - While reviewing the SP model to help @jeffhoerle integrate the new Power BI, I was unable to find any references to, or uses of, the "GMH to PC/PCMHI Wait Time (median)" variable, for example.

[Screen capture]

jamesmrollins commented 2 years ago

From @TomRust regarding Wait Time Episode Counts:

"You’re right: We don’t use the episode counts to validate the wait time data. It doesn’t matter if the median wait time is based on 1 patient’s experience or 100 patients. If we’re just looking at wait times data, then if anyone is waiting to start or transfer between care settings, then we’ll use their wait time in the base case.

…however, in the 1-patient scenario, that patient’s whole experience probably would get thrown out (as it wouldn’t meet our “inclusion threshold” for the transfer or start rates). The model would then use a zero wait time for the related stock of waiting patients (and zero initial patients waiting), as the flows along that pathway would also be zero.

Looking at these numbers in the most recent ModelParams file you sent me, though, gives me pause – shouldn’t the episode counts for each duration (rows 16-24) match the episode counts for each wait time? E.g., if we find that patients transferred from GMH to PC 100 times (units = episodes) during the time covered by our data pull, then shouldn’t we find that same number of episode counts for the time those patients had to wait to be transferred from GMH to PC? Shouldn’t D16 = D28? Or, if some patients transferred on the same day, so didn’t count in the wait time estimate, at least D16 > D28?"
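
A minimal Python sketch of the check Tom describes, assuming the duration and wait-time episode counts have been loaded into dictionaries keyed by pathway. The function name and the pathway labels are hypothetical, not part of ModelParameters.xlsx. Because same-day transfers can only shrink the wait-time count, the sketch flags any pathway whose wait-time episode count exceeds its duration episode count (i.e., the D16 vs. D28 comparison).

```python
from typing import Dict, List


def check_episode_counts(
    duration_counts: Dict[str, int],
    wait_counts: Dict[str, int],
) -> List[str]:
    """Return human-readable problems for pathways whose counts disagree.

    duration_counts: episode counts behind each engagement duration / transfer
    wait_counts: episode counts behind each wait-time estimate
    (both keyed by a hypothetical pathway label such as "GMH to PC")
    """
    problems: List[str] = []
    for pathway, n_wait in wait_counts.items():
        n_duration = duration_counts.get(pathway)
        if n_duration is None:
            # A wait time with no supporting duration count cannot be validated.
            problems.append(f"{pathway}: no duration episode count")
        elif n_wait > n_duration:
            # Same-day transfers can only lower the wait-time count,
            # so a higher wait-time count points to a data problem.
            problems.append(
                f"{pathway}: wait-time episodes ({n_wait}) exceed "
                f"duration episodes ({n_duration})"
            )
    return problems


# Hypothetical usage: 100 transfers, 95 of them with a measurable wait time.
print(check_episode_counts({"GMH to PC": 100}, {"GMH to PC": 95}))  # prints []
```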

jamesmrollins commented 2 years ago

Hi @TomRust I agree, the wait times would be thrown out by extension if there weren't sufficient episode counts supporting the engagement duration categories. Incidentally, the screen capture above was from a specific team data file that didn't actually populate SP data; that's why it is all zeros.
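
A rough sketch of the "thrown out" behavior described in this thread, assuming a hypothetical `min_episodes` inclusion threshold (the actual threshold value is not stated here) and hypothetical field names. When a pathway's episode count falls below the threshold, its rate, median wait time, and initial waiting stock are all set to zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathwayInputs:
    """Model inputs for one care-setting pathway (field names are hypothetical)."""
    episodes: int           # episode count supporting the start/transfer rate
    rate: float             # start or transfer rate along the pathway
    median_wait: float      # median wait time used in the base case
    initial_waiting: float  # initial patients in the waiting stock


def apply_inclusion_threshold(p: PathwayInputs, min_episodes: int) -> PathwayInputs:
    """Zero out a pathway whose episode count is below `min_episodes`.

    Dropping the flow also sets its wait time and initial waiting stock
    to zero; pathways at or above the threshold are returned unchanged.
    """
    if p.episodes < min_episodes:
        return PathwayInputs(p.episodes, 0.0, 0.0, 0.0)
    return p
```

For example, with a hypothetical threshold of 10, a 1-patient pathway such as `PathwayInputs(1, 0.5, 14.0, 3.0)` comes back with its rate, wait time, and initial waiting stock all set to zero.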

Your logic that the engagement duration episode counts should match the wait time episode counts makes sense to me. But I think the main issue is whether we need to task @jeffhoerle with creating that query in Power BI. It sounds like we don't have to.

@lzim - Is there some other clinical understanding that we need these data to support when the Facilitators go over the Team Data part of the curriculum?

FYI: @lijenn @mnallajerla

jamesmrollins commented 2 years ago

@lzim @mnallajerla @lijenn I met with @jeffhoerle this afternoon, and Jeff indicated there are some issues with how we will get the SP queries to run and export data to the Sim UI.

Synopsis

  1. Currently, it takes around 500 lines of code to extract the data for a single variable.
  2. There are around 70 variables needed in SP to drive the model.
  3. The size of the code presents two complications: a) the code is so large that the Power BI code editor freezes when we attempt to modify it, and b) running all 500 lines of code through 70 iterations (one for each variable) is inefficient.
  4. There is a code modification that will allow the code to execute once and return as many variables as we need in a single pass, thereby providing substantial savings in time and processing power.
  5. The code modification could be applied to all modules.
  6. The code writes its output as a delimited string in a single cell. So instead of the variable name being presented as a field header, with the value displayed in a separate field (or cell), the values would be presented in a string with each variable name preceding its value.
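
A minimal sketch of how the single-cell output could be split back into variable/value pairs. The delimiters are assumptions: items separated by `;` and each item written as `<variable name>: <value>`. The real export format would need to be confirmed against the file Power BI actually produces. Splitting each item on its last colon keeps names such as "GMH to PC/PCMHI Wait Time (median)" intact.

```python
from typing import Dict


def parse_single_cell(cell: str, item_sep: str = ";", kv_sep: str = ":") -> Dict[str, float]:
    """Split a single-cell export into {variable name: value}.

    Assumes (hypothetically) that items are separated by `item_sep` and
    that each item is "<variable name><kv_sep><value>".
    """
    result: Dict[str, float] = {}
    for item in cell.split(item_sep):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing separator
        # Split on the LAST separator so the value is always the final field.
        name, sep, value = item.rpartition(kv_sep)
        if not sep:
            raise ValueError(f"no {kv_sep!r} in item: {item!r}")
        name = name.strip()
        if name in result:
            raise ValueError(f"duplicate variable name: {name!r}")
        result[name] = float(value)  # float() ignores surrounding whitespace
    return result
```

For example, `parse_single_cell("GMH to PC/PCMHI Wait Time (median): 12.5; Example Count: 40;")` returns `{"GMH to PC/PCMHI Wait Time (median)": 12.5, "Example Count": 40.0}`, where "Example Count" is a placeholder name.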

Analysis

  1. The single-cell output provides no way (at this point in time) to format for color, background, or other means of making it visually appealing. This makes a user-centered design more difficult to create.
  2. Vensim would likely be able to accommodate a flat file for data import. However, that would put us back to maintaining a guide that relates the position of each data element in the stack to a given data variable in the Sim UI. More investigation is needed for a fuller analysis of the model implications.
  3. The processing power savings of the new query method are awesome!
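
To illustrate the concern in item 2: with a flat file, the only link between a value and its Sim UI variable is its position, so a separately maintained, ordered guide is required. A hypothetical sketch (the guide is a placeholder list, not the real Sim UI variable order) that at least fails when the export and the guide have different lengths or the guide repeats a name:

```python
from typing import Dict, List, Sequence


def map_by_position(values: Sequence[float], guide: List[str]) -> Dict[str, float]:
    """Map a flat, position-ordered list of values to variable names.

    `guide` is a hypothetical ordered list of Sim UI variable names: the
    value at position i is assigned to guide[i].
    """
    if len(values) != len(guide):
        # A count mismatch means the export and the guide are out of sync.
        raise ValueError(f"expected {len(guide)} values, got {len(values)}")
    if len(set(guide)) != len(guide):
        raise ValueError("guide contains duplicate variable names")
    return dict(zip(guide, values))
```

A length check cannot catch two values that have swapped positions, which is one reason the name-preceding-value string is easier to validate than a bare positional stack.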

jeffhoerle commented 2 years ago

Thanks for the succinct explanation @jamesmrollins.

Currently, each existing table of data (e.g., the only table on the CC tab, the "Alcohol Use Disorder" table on the MM tab, the "Intake" table on the AGG tab) is generated by displaying multiple individual MEASURES (each measure contains lines of DAX code that calculate one parameter). The average table contains 8 or 9 measures.

Under the new approach, each table of data will contain just ONE MEASURE that performs all computations. This approach yields poorer visuals for users, but performance is massively improved. I estimate that results will compute 8-10x faster.

Example of new report: [image]

Another important consideration will be the data export for the Sim UI. This new approach is not as clean for @jamesmrollins, and he will have to determine whether Vensim can consume the export file if it is exported this way.

[image]

jamesmrollins commented 2 years ago

Hi @hirenp-waferwire - Please see the discussion thread immediately above. It describes how Power BI exports all data into one cell of a spreadsheet. Is it possible to parse these variable names and values out into our current model parameters file format using an API? Let's discuss this in our 1/19 evening meeting.

jamesmrollins commented 2 years ago

@jeffhoerle @lzim @lijenn @mnallajerla -

  1. Importing and parsing the file into the "traditional" ModelParameters.xlsx file and then exporting it to a location is technically feasible.
  2. Importing the single-cell file directly into the Sim UI, converting it to the "traditional" format, and then displaying it in the Facilitator team management screens is something we need to dig into a little deeper to confirm it is possible. Barring any problems we discover with server-side/client-side access, this should work.
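
For item 1, a sketch of writing parsed variable/value pairs back out in a two-column layout. The `Variable`/`Value` columns are an assumption standing in for the "traditional" ModelParameters.xlsx format, and CSV is used only to keep the example dependency-free; producing an actual .xlsx file would need a spreadsheet library.

```python
import csv
import io
from typing import Dict


def to_parameter_rows(params: Dict[str, float]) -> str:
    """Render parsed parameters as a two-column CSV (Variable, Value).

    The column layout is a hypothetical stand-in for the
    "traditional" ModelParameters.xlsx format.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Variable", "Value"])
    for name, value in params.items():
        writer.writerow([name, value])  # one row per model variable
    return buf.getvalue()
```

Keeping one row per variable, keyed by name, means the output does not depend on the order in which the variables appeared in the single-cell export.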

jamesmrollins commented 2 years ago

FYI @lijenn @mnallajerla @lzim

Data export review completed

Hi @jeffhoerle - I reviewed the attached data file (see link below), and all data variables are accounted for! I have asked for a few naming-convention edits on the SP parameters. I need these so that, when we concatenate the names from the column header to each variable in the cell, the names will be consistent with the model. The order of some of the variables in the other modules will be off, but it will be easier to update the Get Direct Constant addresses in the model once we have the parsing stage figured out.

reviewed_w_edits_dataexport.xlsx
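
A sketch of the concatenation step mentioned above, assuming the column header is joined to each variable name with a single space (a hypothetical convention) and that the model's variable names are available as a list. The mismatch report is one way the naming-convention edits could be checked before the Get Direct Constant addresses are updated.

```python
from typing import Dict, Iterable, List, Tuple


def qualify_names(header: str, cell_vars: Dict[str, float], sep: str = " ") -> Dict[str, float]:
    """Prefix each variable parsed from a cell with its column header."""
    return {f"{header}{sep}{name}": value for name, value in cell_vars.items()}


def naming_mismatches(
    exported: Iterable[str], model_names: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (exported names not in the model, model names not exported)."""
    exported_set, model_set = set(exported), set(model_names)
    return sorted(exported_set - model_set), sorted(model_set - exported_set)
```

For example, with the placeholder names used here, qualifying `{"Intake Rate": 3.0}` under the header "SP" gives `{"SP Intake Rate": 3.0}`, and comparing that against a model list of `["SP Intake Rate", "SP Wait Time"]` reports "SP Wait Time" as missing from the export.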

Next step is the parser MVP

I will write up parsing instructions for the data export file and @hirenp-waferwire will begin development on 1 Feb. I will write up a card for this step and link it to this one.

jamesmrollins commented 2 years ago

Information from 1/26 Support WG:

In attendance: @jeffhoerle @jamesmrollins @lijenn @mnallajerla

Parsing Function and Final Data Export-import Routine

  1. Sim UI will develop the MVP for the parsing function in the wk1_feb sprint (20 hours).
  2. Sim UI will develop an import function that uses the parse outputs as a single integrated step, in the wk2_feb sprint (30 hours).
  3. @lijenn asked if we could support user testing starting in mid-February. Provided the MVP does not fail, both @jeffhoerle and @jamesmrollins said they could support testing.
  4. @jamesmrollins indicated he would have the data export file scrub completed by Friday, 28 Jan.
  5. @lijenn and @mnallajerla to have SP Learner-facing (for Data UI) and Facilitator-facing documentation ready by 14 Feb.

RISKS (low/moderate/high)

  1. Parsing function may fail (low)
  2. Changing the SP Episode of Care units from weeks to months may affect model performance (moderate)

jamesmrollins commented 2 years ago

Variable names updated - Please update coding for flags and filtered data.

Hi @hirenp-waferwire - Please find the attached crosswalk table for renaming the SP Flag and Data-Filtered variables. I pushed an updated model to DEV this afternoon. This should complete the final standardization effort and get us ready for a full workstream test.

SP New Variable Name Crosswalk.xlsx

FYI @lijenn @mnallajerla @lzim
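
A minimal sketch of applying a renaming crosswalk like the one attached, assuming it can be read as an old-name to new-name mapping. The names in the example are placeholders, not entries from SP New Variable Name Crosswalk.xlsx. Names missing from the crosswalk pass through unchanged, and a collision raises an error rather than silently overwriting a value.

```python
from typing import Dict, Mapping


def apply_crosswalk(params: Mapping[str, float], crosswalk: Mapping[str, str]) -> Dict[str, float]:
    """Rename variables using a (hypothetical) old-name -> new-name crosswalk.

    Names absent from the crosswalk are kept unchanged. Raises if two
    variables would end up with the same new name.
    """
    renamed: Dict[str, float] = {}
    for old, value in params.items():
        new = crosswalk.get(old, old)
        if new in renamed:
            raise ValueError(f"crosswalk collision on {new!r}")
        renamed[new] = value
    return renamed
```

For example, `apply_crosswalk({"Old Flag": 1.0, "Keep": 2.0}, {"Old Flag": "New Flag"})` returns `{"New Flag": 1.0, "Keep": 2.0}`.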