lzim / teampsd

Team PSD is using GitHub, R and RMarkdown as part of our free and open science workflow.
GNU General Public License v3.0
9 stars 23 forks source link

wk? backlog_epic Story_2 & Story_5: SQL Improvements #1633

Closed anazariz closed 3 years ago

anazariz commented 3 years ago

Improving the way we implement and run our SQL scripts

Implementation

SQL Server Temp Tables

Currently our SQL scripts make heavy use of SQL Server's Temp tables. Temp tables are database tables that reside in the tempdb system database. Temp tables are named with a preceding hash symbol ("#").

Temp tables exist on the database server temporarily. They have a session lifetime, which means when the script is closed or the session interrupted, all Temp tables and their contents are lost. With large datasets (many tables with 10s of millions of rows each) this behavior results in needless recalculation time and that is not scalable especially with "What-If" scenarios where tweaking of the data starts in a certain point of the data stream and no new changes are introduced to any of the tables prior to that point. The situation is depicted below.

Table Dependant On Processing Time
#A 1 Hour
#B #A 2 Hour
#C #B 3 Hour
#D #C 1 Hour
#E #D 2 Hour
#F #E 1 Hour
#G #E,F 2 Hour

... Total processing time for #G is 12 Hours.

Table Dependant On Processing Time Keep?
#A 1 Hour
#B #A 2 Hour
#C #B 3 Hour
#D #C 1 Hour
E #D 2 Hour Yes
F #E 1 Hour Yes
#G #E,F 2 Hour

Changing tables #E and #F into Perm tables, reduces the processing time by 7 hours each time the session is restarted.

Use of Functions and Stored Procedures Stored Procedures and Functions are compiled SQL code that can be reused over and over again. These objects make the code easier to read and test. Currently our scripts do not take advantage of such objects.

Back-End VS. Front-End code Back-End refers to SQL or Transact SQL (procedural SQL) code that lives exclusively on the SQL Server Database Server. Front-End refers to SQL or Transact SQL code that resides on the client side (e.g. Excel). In most cases the Front-End code comprises a mixture of SQL and native code (e.g. VBA, R, Python, etc.). Front-End SQL is not pre-compiled and is only validated when submitted by the front-end application. Front and Back-End code may be mixed to produce optimal results.

Example:

Front-End-only style:

  1. Select a Facility from a dropdown list

  2. Get a list of patients and their visits for the Facility in a given period (query built on the Front-End)

  3. Put the resultant list into a cursor and traverse the list (do something with it)

Mixed style:

  1. Select a Facility from a dropdown list

  2. Put that Facility into a SQL Server table

  3. Run a Stored Procedure (Back-End) to produce a list (into a table or return a result set) of patients and their visits for the Facilities in the Facilities table for a given period

  4. Depending on how the result is returned, either do a SELECT from the table or read the resultset into a cursor and traverse the list (do something with it)

The benefits:

  1. It will be easier to read and debug the less complex Front-End code.
  2. The Back-End Stored Procedures can be invoke from any Front-End client so migration to other platforms such as Power BI will be easier.

SQL Views

Views are pre-complied scripts. They do not hold data as tables do (unless they are materialized views) but they can be used to customize what the end user sees. They also help reduce code complexity. Views are good when access to the source code is needed (how the script was put together). However, they do not offer performance benefits over the Temp tables because they assemble their data when the query is run.

Automation - Execution of SQL Scripts

SQL automation in a nutshell consists of:

  1. Running SQL scripts off of a calendar schedule
  2. Performing CRUD operations (Create, Read, Update and Delete)
  3. Providing progress messages during execution
  4. Taking preliminary corrective action such as retriggering failed scripts
  5. Creating, deleting, sensing the presence of and moving files

Microsoft provides automation capabilities via its Low-Code Cloud-based Power Automate product, which is a component of the Microsoft Power Platform. The platform offers a Proxy mechanism that allows Power Platform components to reach on-premise SQL Server database instances such as CDW.

Power Automate integrates numerous Microsoft and non-Microsoft platforms including SQL Server, Dynamics, Teams, SharePoint, GitHub, Office 365 and Outlook.

Current Operations:

Our current monthly processes comprise:

  1. SQL Server
  2. Excel 365
  3. SharePoint
  4. Excel VBA for Automation
  5. Forio

Proposed Operations:

  1. SQL Server
  2. Power BI (as an Excel replacement)
  3. SharePoint 365 (or Power Apps Portal)
  4. Power Automate (as a VBA replacement)
  5. Forio

Why Power BI?

Power BI is Microsoft's (and VA's) long-term Analytics roadmap. While Excel is excellent for handling transnational level reporting, Power BI has been designed to handle larger datasets. Power BI and Excel share several data-related architectures. However, Power BI offers more robust analytics capabilities. Also, in most cases Power BI eliminates the need for rendering data via SharePoint because of its robust and user friendly UI.

image

What do we need to get started?

  1. We need to figure out the GA cost for using the Power Platform.

    • Phase I - Power Automate (Model-Based or not?)
    • Phase 2 - Power BI (including its AI capabilites and R integration)
    • Phase 3 - Power Apps (including the Portal)
  2. We need to make sure that Proxy is available to us

  3. We need to understand the security and other constraints (can we use the Power Platform to automate?)

Proposed Future Roadmap:

Phase 1-

Phase 2-

Phase 3-

lzim commented 3 years ago

@anazariz @anthonycpichardo

Please review this issue and tell me where it fits in Story 2 by updating the title card to reflect the details by wk epic please also edit using 9 clicks. Thanks!

lzim commented 3 years ago

Added Story 5 as the thread seems to cover ground on both stories. I see a number of the items above have been completed for Power BI.