lzim / teampsd

Team PSD is using GitHub, R and RMarkdown as part of our free and open science workflow.
GNU General Public License v3.0
9 stars 23 forks source link

sep_epic story5: PowerBI or BUST! #1961

Closed jamesmrollins closed 3 years ago

jamesmrollins commented 3 years ago

Develop the ability for the admin user to:

jamesmrollins commented 3 years ago

7/14 - @lzim @staceypark @lijenn @jamesmrollins - Reviewed various ways to look at the results information, with different qualities (such a time series decomposition and regression analysis). This is not completely decided. To provide a complete story, we may need to include a variety of charts in the tt report. There are no apparent technical limitations to this potential development task.

Decided:

jamesmrollins commented 3 years ago

@lzim @staceypark @lijenn @anazariz @hirenp-waferwire

7/20 Report of conversation with @dkngenda regarding tt3 reports.

Summary

Alternative 1

  1. User downloads CDW produced raw data for a given team.
  2. The user drags and drops the file into the Sim UI dashboard using the TT setup function.
  3. The Sim UI, using a web-hook grabs the data and sends it to GitHub using the GitHub API.
  4. The GitHub API delivers data to a given repo, which triggers an R-code processing action.
  5. The processing action, opens a R-base docker and executes previously validated r-code to produce textured* data in terms of time series decom, LOESS and P-chart.
  6. The GitHub API then moves the data from the repo to the Sim UI, which received the data and renders the appropriate chart in the tt3 report.

Alternative 2

  1. User downloads CDW produced raw data for a given team.
  2. The user drags and drops the file into the Sim UI dashboard using the TT setup function.
  3. The data is uploaded into the Sim Ui Data API.
  4. R-code is executed using the r-studio API, resulting in textured* data.
  5. Textured* data is rendered in appropriate charts in html.

*Note: Textured data means data that has been processed through an R-code function resulting in desired statistical features.

lzim commented 3 years ago

@jamesmrollins & @dkngenda @staceypark @lijenn @anazariz @hirenp-waferwire

BLUF: As outlined, Due to concurrent demands, I do not believe this will be possible in time for the delivery of the NIH tt3 reports due in ~ 6 weeks via the Sim UI as described above. In fact, tt3 visualizations are actually a more minor deliverable relative to our larger priorities and capacity constraints between now and 9/1.

When we meet today at 3:30PM during the Sim UI workgroup we will divvy up tasks necessary to synthesize our tt3 visualization goals within these larger critical Team PSD needs, which is actually the straightest line to delivering what is needed for tt3 and migration to Microsoft 365 & Power BI as planned since 2019 given i, ii, iii, iv, v): i) compliance - who governs our existing resources and how we would achieve these objectives within those requirements ii) definitional - how the definitions vary by sta3n, sta6a, team iii) technical - how the CDW is accessed with what codebase (SQL for facility vs. SQL/VBA for data UI vs. R) through which server (facility a01 vs. FRE SQL33 vs ORD research workgroups), by whom (Team PSD member vs. other OMHSP partners vs. MTL Facilitator vs. _MTL Learner) and why in the MTL program. iii) user permissioning - given who uses which Team PSD and workflows to do what, and who permissions these workflows, these are described in Team PSD documentation & MTL documentation. iv) documentation - as we have seen, despite our team focus in the Jan 2021 6 month roadmap on story 2 (perm tables), story 3 (CDW documentation, & story 5 (planned Power BI migration, settled for required Sharepoint Online migration), we currently have not achieved our Team PSD objects for the document_teams column of the documentent_tracker and retain the risks of lack of redundancy in the Team PSD members who can assist with synthesis of existing resources address planned migration to the Microsoft 365 ecosystem & updated tt3 visualizations.

NEXT STEPS: Integrate tt3 visualizations in the deliverables of the Power BI Story 13 Go Live on 09/01/2021

Dependencies: a) We have a backlog of both facility & data UI bugs/tasks on Sharepoint Online that prevent propagation and readiness to support our next VA wave (described below). b) To understand this synthesis as "straightest line" to achieve multiple goals for the team effort expended requires fluency with the underlying i) compliance, ii) definitional, iii) technical, iv) user & permissioning, v) documentation and as observed on Monday we have not achieved this team-wide fluency (yet) We will 😅

Clarifications for translating among Team PSD members: a) tt3 visualizations will need to occur via Power BI reports run on our PTSD_OMHO workgroup via the A01 CDWWork Production server (Note per above for CDW fluency: When communicating with VHA CDW/BISL/Power BI Staff, it will be clearer not to use "raw" to describe our use of CDW data, as "CDW raw data" has a specific meaning and is not what we use, we use "CDW Production data," hosted on the main VHA Operations Server known as "a01" via our CDW workgroup "PTSD_OMHO."

b) Our R21 R Notebook codebase for these visualizations are at the level of the sta6a clinic (consistent w/our analyses proposed in tests of specific aims, and not at our level of quality improvement intervention, which is the specific team. This distinction is important - see "c" below.

c) As we also reviewed together through much of Monday 7/19 extended 8AM Workgroup Leads meeting, Team data tables used in Modeling to Learn are not primarily produced via our PTSD_OMHO workgroup rather MTL capitalizes on existing infrastructure for "row-level permissioning" via LSV servers on the Field Reporting Enclave (FRE SQL33), which enables VHA frontline providers (aka MTL "learners") to access data for their local VHA sta3n facility.

Summary

Shared Team PSD Fluency needed for Story 13 - "Power BI or Bust" 🤣

At 3:30PM Sim UI meeting today - As the most fluent Team PSD members, @lzim @staceypark & @lijenn will answer questions about these details, documentation and tasks and then move to divvy up responsibilities for Story 13 "Power BI or Bust

jamesmrollins commented 3 years ago

I know this is mostly a regurgitation of the above, but represents my understanding of the problem and related tasks. It also has my questions.

FYI: @anazariz @dkngenda

Facts

Specified Tasks

Implied Tasks

Questions

  1. I am at your service - how can I help?
  2. Will we want to display the tt3 PowerBI visualizations in the Sim UI and print the related output?
  3. Do we want to move any of the other tt report elements to PowerBI?
  4. In the past, @lzim mentioned that she wanted PowerBI to look like the Data UI spreadsheet as much as possible - should that be a part of our planning assumptions?
  5. I read from issues #1915 - #1939 that we have validated necessary PowerBI licenses, SharePoint on-premises has been successfully migrated to SharePoint 365 and we have inherited the necessary ATO. Have we done any PowerBI test reports or visualizations?
lzim commented 3 years ago

Thanks @jamesmrollins I really appreciate your quick work on this 🐇

Please see my notes, which are intended to "nip in the bud" 🌹 any potential for misunderstanding.

FYI: @anazariz @dkngenda @lijenn @staceypark

Facts

MTL Users, facilitators, QIICs and Systems SMEs have access to CDW.

Yes, but to develop using Power BI app, it important to understand that CDW access is different for different user groups: MTL Learners (access CDW via LSV FRE SQL133 only), other users via A01 PTSD_OMHO.

Because of shift to Power BI priority, we will not repurpose the research codebase for operational quality reporting purposes.

Potentially misleading, would clarify: We will not repurpose R code via GitHub to produce visualizations in the Sim UI. But, to achieve migration to Power BI, any work described in Story 2 (perm tables, rather than temp), Story 3 (CDW documentation) and Story 5 (Power BI/Sharepoint Online migration) SHOULD be.

Team Data is mostly generated from the Field Reporting Enclave (FRE SQL33), which is at the sta3n facility level.

Would clarify - Team Data is generated by teams via the DataToolFRE23_nevermore_online.xlsm, which queries national CDW data enlisting the existing permissioning (not managed by Team PSD) of CDW via FRE SQL 33 to frontline staff for access only to their own facility data (sta3n) - from which they produce a narrower team dataset using the ClinicSelection & SPReferrals tabs of DataToolFRE23_nevermore_online.xlsm. When MTL facilitators use DataToolFRE23_nevermoreonline.xlsm to generate Team Data, it runs the same SQL queries_, but enlists different permissions & servers that I and VHA National Data Services grant access to via our CDW a01 workgroup named PTSD_OMHO.

Jen and Stacey manage permissions for MTL learners to access this information.

hq manages access to team folders on Sharepoint Online that contain a facility (sta3n-specific) version of DataToolFRE23_nevermore_online.xlsm, but we do not the permission to access FRE SQL 33 (see above).

The team data queries the FRE SQL33 data first, and finding none, escalates to PTSD_OMHO for data fulfillment.

Only for Team PSD & MTL facilitators, not learners, who do not have national access. _Team PSD & MTL facilitators must use PTSDOMHO to support a team from another facility other than their own home facility.

Based on user selections, queries must be able to reach from the team level to the clinic level in some use cases (SPReferrals).

For specificity, would clarify SPReferrals often track sta3n ("facility") referrals, not just sta6a ("clinic") referrals.

tt3 visualizations will occur via Power BI reports run on PTSD_OMHO via A012 CDWWork Production server.

More complex:

Questions

I am at your service - how can I help?

  1. In consultation w/CDW/BISL/Power BI, help Team PSD to develop Power BI ClinicSelection and SPReferrals tab MVP for building teams ASAP within the server and governance structure that exists in VA and consistent w/how MTL resources are used by our 3 primary users - admin, facilitators, teams.
  2. Integrate existing VBA/SQL codebase into SQL and Power BI Data Model that meet core MTL local team data functionalities outlined above.
  3. Effectively engage with CDW/BISL/Power BI VHA resources on behalf of Team PSD to answer our questions with their online trainings and resources. Use them effectively at their office hours to resolve any blockages we encounter while minimizing impact on MTL documentation and training, which can rapidly exceed our capacities. NOTE: In all previous meetings with them, they indicate that all the guidance is generally available to migrate all of these functionalities

Will we want to display the tt3 Powe rBI visualizations in the Sim UI and print the related output?

We will determine this once we are getting any visualizations at the team level in Power BI. 😅

Do we want to move any of the other tt report elements to Power BI?

No.

In the past, @lzim mentioned that she wanted Power BI to look like the Data UI spreadsheet as much as possible - should that be a part of our planning assumptions?

Should it influence our planning = Yes. To be effective, the key is understanding how and why this requirement exists. This planning/design/development requirement exists for four primary reasons that are easy to understand based on our prior development efforts on Team PSD.

  1. The existing resources were developed with years of iterative MTL learner (user) input & deviations must be understood by considering why the resources exist the way they do now from no less than 3 MTL user points of view (POV): learner, facilitator, admin/support workgroup.
  2. The existing resources were developed to empower local team access to data within existing VHA data governance, permissioning infrastructure and OMHSP definitions. We cannot change how VHA governs data and we cannot change that MTL is about supporting teams and gain greater control of use of local team data.
  3. MTL is underway now, growing, and OMHSP facilitators are limited in their ability to re-train much - Our old rules apply, the change should be such an easy-to-understand improvement that our user-base welcomes the improvement 😺

IF WE ALL UNDERSTAND THESE ONGOING Team PSD _PRINCIPLES (no real change here 👍), then it's all about how specifically we can achieve the needed Power BI implementation in practice.

I read from issues wk2 may_epic story_5 task: ATO for PHI/PII #1915 - wk? may_epic task: story_5 SharePoint Online - dev to prod release process #1939 that we have validated necessary PowerBI licenses, SharePoint on-premises has been successfully migrated to SharePoint 365 and we have inherited the necessary ATO. Have we done any PowerBI test reports or visualizations?

dkngenda commented 3 years ago

@lzim @staceypark @lijenn

knowledge check on the following items

Data permissioning and access

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

Database(Source) | Permission granted by | Access granted to -- | -- | -- PTSD_OMHO | Lindsey | TeamPSD members, Cora and Andrew CDW A01 | VHA NDA process | QUICS, SMEs have national access CDW B02 | VINCI via DART process | TeamPSD members LSV FREE SQL33 | LSV permissioning process | MTL learners/Facilitators, TeamPSD Sharepoint online | TeamPSD HQ | MTL learners tailored to team access

Questions

anazariz commented 3 years ago

Question1:

Which of the following modes are we going to be using?

Power BI has two modes of working with data: Import Mode and DirecQuery Mode. The design decision between these two modes has a major impact on scalability.

1) Import Mode is similar to how Facility works. Data is retrieved (or refreshed on a regular basis) and the end user works with the cached data. BISL recommends this mode because it reduced the load on the on-site database (CDW) database. In this mode the end user connectivity to the database (CDWWork, LSV, etc.) becomes less relevant because there is no such connectivity. All security is enforce via RLS, Power BI Data Model and 365 Individual/Group roles.

2) DirecQuery Mode is similar to the Data UI. The data is fetch on-demand. In this mode the end user connectivity to the database (CDWWork, LSV, etc.) becomes relevant. This mode is useful when the back-end data is changing frequently (e.g. daily transactions). End User provisioning in this modes is controlled via the database as well as the 365 Individual/Groups roles. In this mode RLS is implemented via the LSV database, which contains views to CDW that are segmented based on user's National vs Local permission.

Question2:

Are open to a redesign of the Facility/Data UI architecture?

The Facility (Aggregate) vs. Data UI (Detail) architecture was done in that fashion for security reason and access levels. These can be re-thought in Power BI. Example: Design a base model at the lowest level and then use that model to design an aggregate model. Control who has access to each model via 365 roles.

Question3:

One giant data model segregated via sta3n RLS or multiple individual models each with a separate sta3n? Please see these Lucid charts.

one all

Please note that from a user perspective it makes no difference what method is use. The user sees his/her own data regardless of the architecture. There are pros/cons, which we can discuss.

staceypark commented 3 years ago

7/21 sim ui workgroup meeting

During code review what Should I be looking for?

Start by looking at all the code Ash added/changed from the original historical file, which Ash commented

Where can I find the facility.xlsb produced by this SQL file

TeamPSD > splash_pages > SplashMothership_online and TeamPSD > splash_pages > DataToolFRE23_nevermore_online

jamesmrollins commented 3 years ago

Sim UI Meeting 7/21

Record of assignments available in #2112. All assigned tasks below are due week 4 July.

Decided:

lzim commented 3 years ago

Cross-ref #2112 - Thanks @staceypark and @jamesmrollins for tracking the questions and decisions covered at today's meeting here on the card 🏆 (James, this trophy means "You two are champions") 🤣 (you know what this means).

lzim commented 3 years ago

@staceypark

Some of this may be helpful for @jeffhoerle getting oriented to the Master files and the discussion of Power BI in August.

But, otherwise, this may be referenced in Story 13/Power BI cards, and otherwise be closed.

staceypark commented 3 years ago

closing this one & cross-ref #2262