os-climate / os_c_data_commons

Repository for Data Commons platform architecture overview, as well as developer and user documentation
Apache License 2.0

EPIC: demonstrate multiple dataset federation to support PCAF and SAMEPATH #320

Open MichaelTiemannOSC opened 1 year ago

MichaelTiemannOSC commented 1 year ago

The PCAF calculations use publicly accessible datasets that can be downloaded via RESTful APIs, and SAMEPATH uses the same datasets. @ericbroda has confirmed that he can access these datasets in ways that are friendly to the Data Exchange, but he has not yet had time to turn them into items that can be ordered in the Data Exchange.

In discussions with @toki8, we agreed that demonstrating a Data Exchange that can federate multiple data sources and serve multiple use cases (such as PCAF and SAMEPATH) would go a long way toward motivating members to participate in Data Exchange product definition, and potentially in onboarding. Since we already have both the data and the use cases in hand, as well as general technical ideas about how the Data Exchange and the Data Mesh could support this end-to-end proof of concept, we'd like to schedule a substantial work session to kick off this EPIC so that we can complete the work in a short series of code sprints.

The most important part of this EPIC is to work out how we can translate a dataset endpoint into usable data in the data mesh, and how we need to re-code applications to use the data mesh instead of manually copied data. The Data Exchange UI/UX is an opportunity but not a blocker for this EPIC. I'm sending scheduling emails separately, but feel free to add technical details and technical resource details as you see fit.
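As a rough illustration of what "translating a dataset endpoint into usable data" can involve, here is a minimal sketch that flattens one page of a World Bank WDI-style JSON response into fixed-schema rows that could then be loaded into a data-mesh table. The payload shape (a two-element list of page metadata followed by records with a nested `indicator` object, `countryiso3code`, `date`, and `value`) is an assumption based on the public World Bank v2 API; `flatten_wdi_page`, `Row`, and the output column names are hypothetical and not part of the Data Commons codebase.

```python
from typing import Any, Optional, TypedDict


class Row(TypedDict):
    # Hypothetical target schema for one observation in a data-mesh table
    indicator_id: str
    country_iso3: str
    year: int
    value: Optional[float]


def flatten_wdi_page(payload: list[Any]) -> list[Row]:
    """Flatten one page of a WDI-style response into flat rows.

    Assumes payload == [page_metadata, records], where each record looks like
    {"indicator": {"id": ...}, "countryiso3code": ..., "date": "2020", "value": ...}.
    Assumes annual data, so "date" is a plain year string.
    """
    if len(payload) < 2 or payload[1] is None:
        # An error response (a one-element list) or an empty page has no records
        return []
    rows: list[Row] = []
    for rec in payload[1]:
        raw = rec.get("value")
        rows.append(
            Row(
                indicator_id=rec["indicator"]["id"],
                country_iso3=rec.get("countryiso3code", ""),
                year=int(rec["date"]),
                # Keep missing observations as None rather than dropping them
                value=None if raw is None else float(raw),
            )
        )
    return rows
```

Keeping the schema explicit in one place means downstream consumers such as PCAF or SAMEPATH read the same columns regardless of which source filled them; a real pipeline would need one such mapping per source.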

ericbroda commented 1 year ago

I have a POC that lets you find and download a very small subset of data from the following data sources. It is only a subset because acquiring and ingesting the data takes effort, which, as Michael says, is the limiting factor.

Selected subset of data available in POC from sources: OECD, WDI, UNFCCC, EDGAR, PRIMAP

Other sources that I know I can access via APIs (but that are not in the demo) include NASA, NOAA, and the US Census, among many others.
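To make "download a very small subset" concrete, here is a minimal sketch of pulling records from a paged REST endpoint and stopping after a fixed page cap. It assumes the World Bank v2 paging convention (each response is `[metadata, records]` with `metadata["pages"]` giving the total page count, and `format`/`page` passed as query parameters); other sources such as OECD or UNFCCC page differently. `fetch_subset`, `http_get_json`, and `max_pages` are hypothetical names, not part of the POC.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable

# A fetcher takes a URL and returns parsed JSON; injectable for testing
JsonFetcher = Callable[[str], Any]


def http_get_json(url: str) -> Any:
    """Default fetcher: GET a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_subset(
    base_url: str,
    params: dict[str, str],
    max_pages: int = 2,
    get_json: JsonFetcher = http_get_json,
) -> list[Any]:
    """Collect records from a paged API, stopping after max_pages pages.

    Assumes each response is [metadata, records] and that metadata["pages"]
    reports the total page count (the World Bank v2 convention).
    """
    records: list[Any] = []
    page = 1
    while page <= max_pages:
        query = urllib.parse.urlencode({**params, "format": "json", "page": str(page)})
        payload = get_json(f"{base_url}?{query}")
        if len(payload) < 2 or payload[1] is None:
            break  # error response or no data on this page
        records.extend(payload[1])
        if page >= int(payload[0].get("pages", page)):
            break  # reached the last page the API reports
        page += 1
    return records
```

For example, `fetch_subset("https://api.worldbank.org/v2/country/all/indicator/NY.GDP.MKTP.CD", {"per_page": "100"})` would fetch at most two pages (URL pattern assumed from the public World Bank API). Passing `get_json` as a parameter keeps the logic testable offline and leaves room to swap in an authenticated client later.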

HeatherAck commented 1 year ago

Marius/Red Hat will commit resources; kick-off meeting on 14-Aug (5pm PT, during the Data Mesh standing meeting). DBT 4.6 release is available (support for materialized views in Iceberg). Goal: unburden tech teams from manual imports of data sources by 14-Sep.