Pipeline Profiles

Data visualization platform for Canada's major pipeline systems: https://lively-desert-05b6cc51e.1.azurestaticapps.net/

Azure Static Web App | Project Roadmap | CER Production Page

Designed, developed, and maintained by Grant Moss
Introduction

New interactive content under development for the CER's pipeline profiles web page.

This project uses three primary technologies to create web-based interactive dashboards and dynamic text specific to 25 of the largest pipelines regulated by the CER. The content is developed in both English and French.

Repository structure

pipeline_profiles
│   README.md (you are here!)
│   server.js (express js server configuration for npm start)
|   profileManager.js (controls which sections and profiles are displayed)
|   environment.yml (cross platform conda python 3 environment used in ./src/data_management)
│   webpack.common.js (functionality for creating a clean ./dist folder in english and french)
|   webpack.dev.js (webpack dev server functionality)
|   webpack.prod.js (npm run build for minimized production files)
|   webpack.analyze.js (npm run analyze-build to evaluate bundle size)
|   .babelrc (babel config with corejs 3 polyfills)
|   .vscode/settings.json (please use vscode for this project!)
|   ...
|
└───test
|   |   test.js (AVA unit tests for front end code, npm run test-frontend)
|   |   html5.js (runs html-validate on all .html files in /dist)
|
└───src
│   │
│   └───data_management
│   |   │   conditions.py (creates conditions data for front end)
│   |   │   incidents.py (creates incidents data for front end)
|   |   |   traffic.py (creates throughput & capacity data for front end)
|   |   |   tests.py (python unit tests npm run test-backend)
|   |   |   util.py (shared python code module)
|   |   |   updateAll.py (npm run update-all-data pull live data for all datasets)
|   |   |   queries/ (contains queries used to get data from CER sql servers)
|   |   |   raw_data/ (pre-prepared data used by python when not pulling from remote locations)
│   |   │   ... other python files for pipeline datasets
|   |
|   └───components (handlebars partials)
|   |
|   └───css (main.css, transferred over to dist/css/main[contenthash].css via MiniCssExtract)
|   |
|   └───entry (entry points for all profile webpages)
|   |   |   webpackEntry.js (specifies all the js and html entry points for /dist)
|   |
|   └───data_output (output data folders for each section. Contains prepared data ready for charts)
|   |
|   └───dashboards (Higher level files/functions for creating each dashboard)
|   |
|   └───modules (shared dashboard code & utility functions)
|
└───deploy (Prepares CER production files with new HTML sections in /dist)
│
└───dist (tries to match dweb7 folder structure)
    │   en/ english js bundles & html for each profile (to be placed on web server)
    │   fr/ french js bundles & html for each profile (to be placed on web server)

Software prerequisites

  1. npm (check package.json for version)
  2. node (check package.json for version)
  3. Anaconda (for contributing and running the "back end" code in src/data_management)
  4. Git (for contributing)
  5. Git for Windows (for contributors on Windows; the bundled Git terminal can be used to run the optional unix shell scripts)

Quick start for contributing

  1. clone repo
cd Documents
git clone https://github.com/mbradds/pipeline-profiles.git
  2. install dependencies

First time install:

npm ci

or

npm install
  3. create branch
git checkout -b profile_improvement
  4. start webpack dev server and make changes to the source code
npm run dev

This runs webpack.dev.js

  5. Deploy to the CER

Comment out all styles in src/css/cer.css. These styles emulate some of the extra styles on CER pages, and don't need to be added.

npm run build

This runs webpack.prod.js and emits minified bundles in /dist

Note: npm run build && npm start runs the express server using the production files. Test this on all major browsers prior to new releases.

Create a new release on GitHub and add the compressed dist folder. Ask the web team to dump the latest production files onto dweb7 and add the new dist files/changes before sending in a production web request.

Remotes

There are two remote repositories.

  1. GitHub: this should continue to be the main repo for my development and for managing other contributors' pull requests.
  2. Azure DevOps: this is the main repo for "work" and will eventually serve as the main ci/cd pipeline for deployment once the CER can handle such things.

I've added some convenient npm scripts for switching remotes:

npm run switch-remote-personal
npm run switch-remote-work

Quick start for updating data

Are you using windows?

Unless you are running the code through an IDE, you will need to use the Anaconda Prompt to run the scripts, otherwise you will get the following error: 'conda' is not recognized as an internal or external command, operable program or batch file.

  1. clone the repo into your documents and set it up.
cd Documents
git clone https://github.com/mbradds/pipeline-profiles.git
npm install
  2. Set up CER database connection strings.

Several datasets are pulled directly from CER internal databases. A single python file src/data_management/connection.py handles the sqlalchemy connection parameters and strings. An untracked json file src/data_management/connection_strings.json contains the hard-coded database connection strings. A template file src/data_management/connection_strings.example.py is included with the connection strings left blank. Before running or contributing to the python code, you will need to open this file, add the connection strings, and save it as src/data_management/connection_strings.json so that the connection info remains untracked.
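
For reference, here is a minimal sketch of how connection.py might build a sqlalchemy engine from the untracked json file (the key name and file layout below are hypothetical; check connection.py for the real structure):

import json
from pathlib import Path

from sqlalchemy import create_engine

# connection_strings.json sits next to connection.py and stays untracked
STRINGS_PATH = Path(__file__).parent / "connection_strings.json"

def get_engine(db_key):
    # db_key is a hypothetical key into connection_strings.json,
    # e.g. {"cer_db": "mssql+pyodbc://..."}
    with open(STRINGS_PATH) as f:
        strings = json.load(f)
    return create_engine(strings[db_key])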

  3. Set up the pipeline-profiles conda environment and run the update-all-data command.

It is highly recommended that you first create the conda python environment described in environment.yml. The npm scripts for data updates expect a conda python environment called pipeline-profiles. The easiest way to run the update data operation is through the Anaconda Prompt command line. Using the Anaconda Prompt, run the following operations.

cd Documents
cd pipeline-profiles
conda env create --file=environment.yml
npm run update-all-data

The last operation, npm run update-all-data, may take a few minutes to run. Once it's completed, you will see all the output at once. If an error is encountered, the program will stop and display the error message. Feel free to try and fix the error or ask me.

  4. Make sure the code can compile, and then push new data to main.

You can re-use the same Anaconda Prompt shell from the last step. Run the following command to compile the front end code + the new json data. The code should compile; otherwise there is some kind of compatibility error between the data and the JavaScript code. This is usually the result of null values not being encoded properly. Feel free to fix the error in the python code and re-run step 3, or ask me to fix it.

npm run build

If the code compiles, then push the changes to main, and the test website will update after a few minutes.

git add .
git commit -m 'updated data'
git push

Deploying to CER production server

This continues to be a challenge because I control/update only a portion of the pipeline profiles, and there is no way for me to access the main production files or keep up with other changes through a version control system. Therefore my content and code need to be merged with CER files and updated on the website very quickly to avoid a situation where others are working on the files. There is also no way I can easily mimic the CER server environment for local development. In the absence of an organizational version control system, it's not realistic to use or mimic much, if any, CER infrastructure/files during the development process.

Up until recently (summer 2021) my approach to these constraints and problems was:

  1. Request that the web team dump the latest production files into dweb7/data-analysis-dev.
  2. Delete the old js and css bundles and replace them with new ones.
  3. Delete the old html sections/script tags from the latest CER production files and copy and paste the new html sections from my dist/ folder into the correct location in the CER files.
  4. Repeat this process for all profiles in english and french (50 total).
  5. Send in a final web request to publish.
  6. Review links in tweb (I have no access to tweb and can't do this step).
  7. Tell the web team to publish.

As of September 2021, I've added some automation in deploy/make_production_files.py that largely cuts out the need to delete & copy/paste html sections. Here are the new steps:

  1. Request that the web team dump the latest production files into dweb7/data-analysis-dev.
  2. Delete the old js and css bundles and replace them with new ones.
  3. npm run build
  4. npm run deploy
  5. Copy and paste full html files from deploy/web-ready into dweb7/data-analysis-dev (50 html files replaced).
  6. Send in a final web request to publish.
  7. Review links in tweb (I have no access to tweb and can't do this step).
  8. Tell the web team to publish.

Adding a new profile section

Adding a new section typically involves two major parts: the back end data (python) and the front end (JavaScript). Starting with the raw data, here is the typical pattern:

raw data (sql or web) -> python -> json -> es6 import -> JavaScript/css -> handlebars template -> translation -> release

Python data prep

  1. Create a new python file in src/data_management. Prepare a reliable connection to the dataset, either a remote datafile or internal sql. The profiles are segmented by pipeline, so the data prep will involve splitting the dataset by the pipeline/company column, and creating one dataset for each company. Output files in json format to ../data_output/new_section/company_name.json (see the sketch after this list).

  2. Start to pay attention to file size of the outputs. Try to keep the average dataset around 15-20kb.
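
A minimal sketch of this back end pattern, assuming a pandas DataFrame with a hypothetical Company column and a hypothetical new_section output folder:

import os

import pandas as pd

def export_company_data(df, output_dir="../data_output/new_section"):
    # split the dataset by company and write one json file per company
    os.makedirs(output_dir, exist_ok=True)
    for company, group in df.groupby("Company"):
        # strip non-alphanumeric characters to match file names like
        # NOVAGasTransmissionLtd.json
        file_name = "".join(c for c in company if c.isalnum())
        group.to_json(os.path.join(output_dir, f"{file_name}.json"), orient="records")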

Front end data viz/section

Start with just one profile (ngtl)

  1. Create a new folder/file: src/dashboards/newDashboard.js.
  2. In this file, create a really simple "hello world" kind of function to accept the data:
export function mainNewSection(data) {
  console.log(data);
}
  3. Add the new data to the data entry point in src/entry/data/ngtl.js. The data should (eventually) be made language agnostic.
import canadaMap from "../../data_output/conditions/base_maps/base_map.json";
import conditionsData from "../../data_output/conditions/NOVAGasTransmissionLtd.json";
import incidentData from "../../data_output/incidents/NOVAGasTransmissionLtd.json";
import trafficData from "../../data_output/traffic/NOVAGasTransmissionLtd.json";
import apportionData from "../../data_output/apportionment/NOVAGasTransmissionLtd.json";
import oandmData from "../../data_output/oandm/NOVAGasTransmissionLtd.json";
import remediationData from "../../data_output/remediation/NOVAGasTransmissionLtd.json";
+import newData from "../../data_output/newSection/NOVAGasTransmissionLtd.json";

export const data = {
  canadaMap,
  conditionsData,
  incidentData,
  trafficData,
  apportionData,
  oandmData,
  remediationData,
+ newData
};
  4. Add the es6 export from step 2 to the code entry point in src/entry/loadDashboards_en.js:
import { mainNewSection } from "../dashboards/newDashboard";

export async function loadAllCharts(data, plains = false) {
  const arrayOfCharts = [
    mainNewSection(data.newData),
    otherCharts(data.other),
  ];
}
  5. Start the project with npm run dev to open the webpack dev server on port 8000. Make sure that the data appears in the console, and you will be good to start developing the JavaScript.

  6. Pretty soon after step 5 you will need to set up the html and css infrastructure. CSS can be added to src/css/main.css. There is only one css file for the entire project. I might split this css file soon, but for now just keep all the css for each section roughly together.

Aside: Why handlebars

Conditional handlebars templates are used to control which sections get loaded for each profile. This is one of the most complicated parts of the repo, but it's powerful for a project like this. The logic in the remaining steps acts very similar to a content management system. The rationale for this approach is explained under Dev Dependencies below; the remaining steps show how to wire it up:

  7. Create a new handlebars template here: src/components/new_section.hbs. For now, ignore the templates, and just write html with english text/paragraphs.

  8. Add this new template to the profile manager here: profileManager.js. It doesn't matter what you call the section, but remember it for the handlebars conditional later. It seems obvious that this file should be automatically generated based on which profiles have data for a given section, but I would prefer to leave this step manual. It adds an extra layer of protection against sections getting rendered by mistake, and it's easy to update/maintain.

const profileSections = {
  ngtl: {
    sections: {
      traffic: { map: true, noMap: false },
      apportion: false,
      safety: true,
      new_section: true, // when set to true, handlebars will inject the section html
    },
  },
};
  9. Add the new handlebars template to the main handlebars file here: src/components/profile.hbs. Once this is done, then npm run build and npm run dev should load your html.
{{#if htmlWebpackPlugin.options.page.sections.new_section}}
  <!-- Start New Section -->
    {{> new_section text=htmlWebpackPlugin.options.page.text}}
  <!-- End New Section -->
{{/if}}
  10. Before running npm run build and npm run dev, you should remove "fr" from webpack.common.js to avoid errors. Once you have added french to the data and code entry points (step 3 and step 4), the js+html can compile in both dist/en and dist/fr:
const profileWebpackConfig = (function () {
-  const language = ["en", "fr"];
+  const language = ["en"];
})();
  11. Once you are done with the new section, add all the JavaScript strings to src/modules/langEnglish.js and src/modules/langFrench.js and all the html text/paragraphs to src/components/htmlText.js. Follow the same logic for importing/templating found in other completed sections.

  12. Write python unit tests: src/data_management/tests.py and JavaScript unit tests: test/test.js

  13. Create PR. I'll review all the code.

Tests

Python unit tests (back end)

The greatest risk for errors, such as incorrect values appearing in the front end, comes from the "back end" python code. These python scripts compute large amounts of summary statistics, totals, and metadata (number of incidents, most common types, etc.) from datasets that have inherent errors. This is made more risky by the fact that there are english and french datasets (only for conditions), and these datasets may have unrelated problems. Here is a list of embedded data errors I have noticed so far:

  1. Trailing whitespace in text columns. This is a problem when filtering/grouping, because "NGTL" will be separated from "NGTL ". This error is mainly mitigated by running .strip() on important text based columns (see the sketch below).
  2. Duplicate or incorrect company names. For example "Enbridge Pipelines Inc." and "Enbridge Pipelines Inc". Notice the difference? This is mainly corrected by exploring all company names at the beginning of development and running something like this:
df['Company'] = df['Company'].replace({"Enbridge Pipelines Inc": "Enbridge Pipelines Inc."})
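
And a one-line sketch of the whitespace mitigation from point 1, again assuming a pandas DataFrame with a Company column:

# strip leading/trailing whitespace so "NGTL " groups together with "NGTL"
df['Company'] = df['Company'].str.strip()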

There are several python unit tests written for the various python data outputs and utility functions. These are found here: src/data_management/tests.py

The python unit tests can be run through an npm script:

npm run test-backend

This code is difficult to test, because the code is run on data that updates every day, or every quarter. To simplify this, I have added static test data separate from "production" data. The test data is located here: src/data_management/raw_data/test_data. npm run test-backend will test the python code on static data, where things like the correct totals, counts, and other numbers that appear later on the front end are known.

The unit tests check a bunch of summary statistics and data validation metrics specific to the ngtl profile. They will also test whether the english numbers/data are the same in french.
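
To illustrate the pattern, here is a minimal sketch of a known-totals test (the file name, column, and expected count below are hypothetical; see tests.py for the real cases):

import unittest

import pandas as pd

class TestNgtlData(unittest.TestCase):
    def setUp(self):
        # hypothetical static test file with known contents
        self.df = pd.read_json("raw_data/test_data/incidents_test.json")

    def test_total_incident_count(self):
        # static data, so the expected total is known in advance
        self.assertEqual(len(self.df), 100)

    def test_no_trailing_whitespace(self):
        # guards against the "NGTL " vs "NGTL" grouping bug
        companies = self.df["Company"]
        self.assertTrue((companies == companies.str.strip()).all())

if __name__ == "__main__":
    unittest.main()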

AVA unit tests (front end)

Test coverage is pretty low right now. The tests mainly focus on major re-usable functionality in src/modules/util.js and major calculations done on the front end, like the five year average. I would like to move more general/pure functions to src/modules/util.js so that they can be tested more easily.

npm run test-frontend

Dependencies

Dev Dependencies

Note: the html-webpack-plugin and handlebars-loader are instrumental for this project. Older versions of this repo only had two templates, one for english and one for french. As the project grew, I needed a template engine. A good example of this need is the apportionment section. There are only around 5 oil pipeline profiles with apportionment data (there could be more in the future though!) so I don't want to include the apportionment html in 20 profiles that don't need it, and then hide/show divs conditionally after the dom is ready. This probably causes layout thrashing. With handlebars, I can conditionally render components/sections based on the logic in profileManager.js. Even better, with handlebars-loader, one html file is compiled for each profile (the web team can only handle html) and html-webpack-plugin still injects all the scripts.

This was the old way before handlebars: each pipeline profile webpage is essentially the same, but with different data. The two templates src/profile_en.html and src/profile_fr.html contain all the text and web resources (css, script tags) and the plugin injects the appropriate script tags for the profile. Changes made to these templates will appear on all 25 profile pages in english and french.

Updating dependencies

This is a long term project, and dependencies should be updated every so often; run npm outdated to check. Regular updates to important dev dependencies like webpack and babel will likely improve compile time and code size. Updates to production dependencies like highcharts and leaflet will improve security and allow the latest features to show up for users.

Making sure that all dependencies are updated in both package.json and package-lock.json is kind of weird. Here are the steps to make it happen:

  1. npm install -g npm-check-updates
  2. ncu -u
  3. npm install

I need help list

Here is a list of things I'm stuck on and potentially need help with!

  1. Webpack runtime chunk

It looks like a runtime chunk is required based on the webpack pattern I've set up. Each profile has a runtime chunk that serves as the main entrypoint for the other chunks. I would like to avoid this if possible!

  2. CER databases or Open Gov?

The core datasets are all pulled directly from Open Gov. I need to do this to maintain consistency with Open Gov, but connecting to CER databases would allow for really cool daily updates once the ci/cd pipeline is ready. This is going to take some time to migrate!

TODO list

Take a look at the issues tab for a more up to date list. I don't update this section of the readme anymore.

Completed TODOs