Publish a PROOF OF CONCEPT dashboard/report of common queries made to the Fishery Observing System using Fabric and Power BI

Br-Johnson commented 1 month ago

[x] #40
[x] #29
[x] Provide more background context for Anna from meeting with Jason and Joey originally. (see below)
[x] #41

Background Info (Notes from initial meeting with Joey, Jason, and Brett):

Jason: Most common data request is an in season commercial in season catch estimates. Data are published to open data. Broken up by gear type. Internal data. If less than 3 vessels are out then the data are restricted because people could ID fisherman if there's only 3 vessels out. Data are provided by calendar year. 2005-2023 is available.

In season estimates are produced by fishery managers, who produce these estimates based on a number of different data sources (including mainly Fishery reported data, but also fly overs etc.) Based on in season estimates, not log book reconciled after season. End of season report data available on Open Government Portal from 1994-2004.

Active Fishing Vessels by Calendar Year Query

RDG requested this recently
These data are internal only. List of vessels can't be public but rolled up charts could be.
Query in Oracle crashes if more than one or two seasons queried at once
This data is fisher reported. Not fishery manager estimated.
Troll season is the longest. Lots of data.
Jason gets result of SQL query and then puts it into a pivot table in excel to share
Interesting to note the delcine of active fishing vessels starting in 2018. 2018 was when the southern resident killer whale fishery restrictions came into practice. Also, license buy back...

Commercial Openings Query (Constitute 25% of his total data requests, takes half an hour each time)

Summary of all the openings by year, target species, PFMA
START_TIME, END_TIME and HOURS_OPEN are what end users are most interested in

Commercial Catch Query

Include ability to select in-season estimates or fisher reported logbook data
Filter by fishery, species, year for commercial salmon Jason also getting asked for Bycatch...

In summary

Plan is to take all raw FOS data imported into Pacific Science Sandbox in Fabric and recreate Jason's SQL queries to get a dataset(s) that could support the three queries to build a Power BI Dashboard with that Jason could use to make data requests, and to send to internal stakeholders for self-serve data access.

Need to be very mindful of privacy concerns as the dataset contains Protected B data. Must not be shared externally unless data are rolled up to aggregate vessels and there are more than three vessels.

Data need to be promoted to Silver layer in Fabric, and eventually Gold layer once all data quality checks completed.

Might be nice to have three different views (tabs) of the data in the dashboard that represent Commercial Catch, Commercial Openings, Number of Active Fishing Vessels.

Br-Johnson commented 3 weeks ago

Hey @AnnaD-M Any thoughts on next steps to try and get an MVP out as a demo? A nice endpoint for this project would be to have a few slides describing the steps involved: 1) cloning oracle db into Fabric Lakehouse (Bronze), 2) transforming the data via spark notebook (silver); 3) orchestrating workflow updates; 4) proof of concept FOS Dashboard App for common queries supporting high level decisions, with user-defined grouping and filtering.

I think we should get an MVP out before testing and validating the query and promoting it to a gold level. For that step we will likely need to re-write the transformation notebook (possibly in SQL for easier comparison with Jasons queries and easier maintainability into the future, also could consider using R) and build a star schema as a demo of best practices.

AnnaD-M commented 2 weeks ago

Re: Commercial Catch Query: 'Include ability to select in-season estimates or fisher reported logbook data'

Quick explanation from Jason: Fisher reported catch is data called into a call centre and reported by a fisherman or data entered into a logbook. The in-season estimates are produced by fishery managers. They are based on the fisher reported catch but take other data sources into account. Like overflight data. Validations. Interviews etc. A phone-in report can either be an ELog or a report verbally given to a call centre agent. A logbook report is data transcribed from the hard copy logbook by the service provider post season. The reports should... SHOULD.... be duplicates. BUT. In the event there is a discrepancy the logbook report is considered the true report.

Need to figure out how to resolve this. Currently being double counted in summary table. Need an IF/ELSE statement to retain phone-in if no logbook entry exists? Currently more phone-in records than logbook records...

pbs-assess / data-stewardship-unit