os-climate / os_c_data_commons

Repository for Data Commons platform architecture overview, as well as developer and user documentation
Apache License 2.0

[!IMPORTANT] On June 26, 2024, the Linux Foundation announced the merger of its financial services umbrella, the Fintech Open Source Foundation (FINOS), with OS-Climate, an open source community dedicated to building data technologies, modeling, and analytic tools that will drive global capital flows into climate change mitigation and resilience. OS-Climate projects are in the process of transitioning to the FINOS governance framework. Read more at finos.org/press/finos-join-forces-os-open-source-climate-sustainability-esg

OS-Climate Data Commons

OS-Climate Data Commons is a unified, open Multimodal Data Processing platform used by OS-Climate members to collect, normalize and integrate climate and ESG data from public and private sources, in support of:

Overview

OS-C Data Commons Platform Overview

The Data Commons platform aims to bridge climate-related data gaps across three dimensions:

  1. Data Availability: The platform supports data availability by democratizing data through self-service data infrastructure offered as a platform. A self-service platform is fundamental to a successful data mesh architecture, in which existing data sources are federated and made easily discoverable and shareable across an organization and its ecosystem, using open tools and readily available infrastructure for data creation, storage, transformation, and distribution.

  2. Data Comparability: The platform supports data comparability through domain-oriented, decentralized data ownership and architecture, i.e. data is treated as a product. The goal is to stop the proliferation of data puddles and instead “connect” the data using proper reference data and relevant industry identifiers, so that collections of data are aligned with business goals.

  3. Data Reliability: The platform supports data reliability through federated data access, data lifecycle management, security, and compliance. This enables a data-as-code approach in which the data pipeline code, the data itself, and the data schema are all versioned, providing transparency and reproducibility (a “time machine”), while enforcing the authentication and authorization required for data access management, with consistent policies across the platform and throughout the data lineage.
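The data-as-code idea in the third dimension can be illustrated with a minimal Python sketch. This is not the platform's implementation: the names `fingerprint`, `Snapshot`, and `VersionedDataset`, the in-memory storage, and the sample columns are all hypothetical, chosen only to show how versioning data, schema, and pipeline code together gives a reproducible "time machine".

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(obj) -> str:
    """Return a stable SHA-256 hex digest of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class Snapshot:
    """One version: the data, its schema, and the pipeline code version together."""
    rows: list
    schema: dict
    pipeline_version: str
    snapshot_id: str = ""

    def __post_init__(self):
        # The ID is derived from all three parts, so changing any one of them
        # (data, schema, or pipeline code version) yields a different version.
        self.snapshot_id = fingerprint(
            {"rows": self.rows, "schema": self.schema, "pipeline": self.pipeline_version}
        )


@dataclass
class VersionedDataset:
    """Append-only, in-memory history of snapshots (hypothetical illustration)."""
    history: list = field(default_factory=list)

    def commit(self, rows, schema, pipeline_version) -> str:
        # Deep-copy the inputs so later mutation by the caller cannot alter history.
        snap = Snapshot(copy.deepcopy(rows), copy.deepcopy(schema), pipeline_version)
        self.history.append(snap)
        return snap.snapshot_id

    def checkout(self, snapshot_id) -> Snapshot:
        """Return the snapshot recorded under snapshot_id (the 'time machine')."""
        for snap in self.history:
            if snap.snapshot_id == snapshot_id:
                return snap
        raise KeyError(snapshot_id)


# Usage: commit two versions, then read the first one back unchanged.
dataset = VersionedDataset()
schema = {"company_id": "str", "co2_tonnes": "float"}  # hypothetical columns
v1 = dataset.commit([{"company_id": "A", "co2_tonnes": 10.0}], schema, "pipeline-v1")
v2 = dataset.commit([{"company_id": "A", "co2_tonnes": 12.5}], schema, "pipeline-v1")
print(dataset.checkout(v1).rows)
```

Deriving the version ID from the content itself, rather than from a counter, means two parties holding the same data, schema, and pipeline version will compute the same ID. Access control, which the platform also enforces, is deliberately left out of this sketch.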

For more information on these dimensions and how Data Commons fits into the picture, good introductions include the official Data Commons page on the OS-Climate website and the video recording of the Data Commons Platform Overview presented at COP26 in Glasgow. Detailed platform documentation maintained by our community is available in this repository and accessible through the links below.

Architecture

Data Commons Architecture Blueprint

Developer Resources

Data Commons Developer Guide