os-climate / os_c_data_commons

Repository for Data Commons platform architecture overview, as well as developer and user documentation
Apache License 2.0

Review and cleanup of material for new data pipeline developer onboarding #125

Status: Open · caldeirav opened this issue 2 years ago

caldeirav commented 2 years ago

This issue will be used to build a task list for tracking the work needed to ensure that up-to-date documentation and code samples are available to support onboarding of new contributors.

caldeirav commented 2 years ago

Additional items to document:

MichaelTiemannOSC commented 2 years ago

Related: https://github.com/os-climate/data-platform-demo/issues/48

@HeatherAck @cdeliaRH

HeatherAck commented 2 years ago

Need to add OpenMetadata, dbt pipelines, etc. back into the architecture doc. Will use the global power plant data as the sample. Updates are to be completed by 24-Oct for COP27 and should include the real-time ingestion capability and Kepler (incl. data centers) for the CO2 emissions of OS-Climate [AWS vs. Google scheduling of ML workloads].

HeatherAck commented 2 years ago

@caldeirav, can you please update https://github.com/os-climate/os_c_data_commons/blob/main/os-c-data-commons-developer-guide.md?

MichaelTiemannOSC commented 7 months ago

Related: this dbt documentation (https://docs.getdbt.com/docs/build/packages) exposes a lot of complexity that the Data Mesh pattern should hide: how and where these configuration files go, how they should be populated, and how the pipeline environment makes all of this as easy as pie. Right now there is no clarity at all, not least because of the version skew between the Pachyderm-friendly dbt 1.4.9 and the latest 1.7.x releases developed over the past two years.