hashgraph / guardian

The Guardian is an innovative open-source platform that streamlines the creation, management, and verification of digital environmental assets. It leverages a customizable Policy Workflow Engine and Web3 technology to ensure transparent and fraud-proof operations, making it a key tool for transforming sustainability practices and carbon markets.

Scalable MRV data storage and transformation provenance capabilities #2907

Open anvabr opened 10 months ago

anvabr commented 10 months ago

Problem description

At a very high level, Guardian policy execution boils down to the following workflow (a code sketch follows the list):

  1. get some data (from sensors or humans) and publish it as a VC (in IPFS)
  2. do some transformations
  3. record the result in a VC doc, publish (in IPFS)
  4. get some more data
  5. combine with previous and do some more transformations
  6. record the result in a VC doc, publish (in IPFS)
  7. repeat the cycle 1-6 numerous times
  8. create a token (in Hedera)
  9. repeat the entire cycle 1-8 until END
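
A minimal sketch of this cycle in TypeScript. All of the function names below (`fetchMrvData`, `publishVcToIpfs`, `applyTransformation`, `mintToken`) are hypothetical placeholders for illustration only, not actual Guardian APIs:

```typescript
// Hypothetical sketch of the policy execution cycle described above.
// None of these functions are real Guardian APIs; they only show the shape of the workflow.

interface VcDocument {
  cid: string;      // IPFS content identifier of the published VC
  payload: unknown; // the signed credential body
}

declare function fetchMrvData(source: string): Promise<unknown>;         // steps 1, 4: sensor or human input
declare function publishVcToIpfs(payload: unknown): Promise<VcDocument>; // steps 1, 3, 6: wrap in a VC, pin to IPFS
declare function applyTransformation(inputs: unknown[]): unknown;        // steps 2, 5: policy-defined calculation
declare function mintToken(evidence: VcDocument[]): Promise<string>;     // step 8: create the token on Hedera

async function runPolicyCycle(sources: string[]): Promise<string> {
  const evidence: VcDocument[] = [];
  const accumulated: unknown[] = [];

  for (const source of sources) {                      // steps 1-7 repeated per data source
    const raw = await fetchMrvData(source);
    evidence.push(await publishVcToIpfs(raw));         // original data recorded as a VC

    accumulated.push(raw);
    const transformed = applyTransformation(accumulated);
    evidence.push(await publishVcToIpfs(transformed)); // each transformation result recorded as a VC
  }

  return mintToken(evidence);                          // step 8: token backed by the chain of VCs
}
```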

The underlying technologies that Guardian uses for storage are IPFS and Hedera Topics.

IPFS works very well for documents but is not very efficient for data, in particular data that undergoes many transformations, each of which must be verifiably performed and recorded.

Hedera Topics have content size limitations and do not have an efficient addressing system.
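
One way these two constraints combine in practice is to keep the large payload in IPFS and put only a small reference into the topic message. A sketch of that pattern, assuming `ipfs-http-client` and `@hashgraph/sdk` (an illustration of the constraint, not Guardian's own implementation):

```typescript
import { create } from "ipfs-http-client";
import { Client, TopicId, TopicMessageSubmitTransaction } from "@hashgraph/sdk";

async function publishWithReference(vcJson: string, topicId: string, client: Client) {
  // The large, content-addressed document goes to IPFS...
  const ipfs = create({ url: "http://localhost:5001" });
  const { cid } = await ipfs.add(vcJson);

  // ...while the topic message only carries a small reference, because a single
  // consensus message cannot hold an arbitrarily large payload.
  await new TopicMessageSubmitTransaction()
    .setTopicId(TopicId.fromString(topicId))
    .setMessage(JSON.stringify({ cid: cid.toString() }))
    .execute(client);

  return cid.toString();
}
```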

For many real-world use cases, the volume and complexity of calculations (and thus transformations) required on the original MRV data is such that fully automating such workflows with existing Guardian technology will likely be very challenging, if not impossible.

Requirements

Identify and integrate a distributed storage technology that allows Guardian to work with data at scale (similarly to how it would work with a relational database), while maintaining a full record of data provenance and guaranteeing that policy adherence remains verifiable for all data processing and transformations. A hypothetical interface for such an integration is sketched below.
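
The following TypeScript sketch is purely illustrative; the type and field names are assumptions, not an existing Guardian or third-party API. The point it makes is that every relational-style read/write would need to carry a verifiable provenance record alongside the data itself.

```typescript
// Hypothetical interface for the requested storage integration.

interface ProvenanceRecord {
  inputRefs: string[];      // references (e.g. CIDs or row ids) of the inputs used
  transformationId: string; // which policy-defined transformation was applied
  policyVersion: string;    // the policy version that authorized the transformation
  timestamp: string;        // when it was executed
  resultHash: string;       // hash of the produced data, anchored e.g. on Hedera
}

interface ProvenancedDataStore {
  // Relational-style access to MRV data.
  query<T>(sql: string, params?: unknown[]): Promise<T[]>;

  // Writes return a reference plus the provenance record that must be
  // published (e.g. to a Hedera topic) so the transformation stays verifiable.
  write(table: string, rows: unknown[], provenance: ProvenanceRecord): Promise<string>;
}
```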

Some relevant links:

Definition of done

Acceptance criteria

christiaanpauw commented 10 months ago

The traditional way to address step 5 (combine with previous data and do some more transformations) is a relational database, because it stores data efficiently and has sophisticated means of combining data (queries, joins, etc.). The ability to store data that changes at a high cadence is one part of the requirement; the other is the ability to query such data.

Example: every instrument that is deployed has a set of requirements determining the validity of the data from that instrument that is not a function of the data itself. Typical requirements are things like inspection and calibration frequency. The protocol may require that an instrument is inspected once every six months and calibrated once a year. A read from an instrument is then actually a query of the conditions of validity for the data (e.g. the calibration and inspection logs) as well as of the data itself: "Get all data from instruments that still have valid calibrations". If the query is deterministic, perhaps the query itself and a timestamp are enough to deliver on the promise of transparency and immutability ("I did this query at this time and got this result"). A sketch of that idea follows.
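
A minimal sketch of that idea in TypeScript: record the query text, execution time, and a hash of the result, which could then be published as evidence. `runQuery` is a hypothetical database call, and the SQL in the usage comment is only an illustration of the calibration example above:

```typescript
import { createHash } from "crypto";

declare function runQuery(sql: string, params: unknown[]): Promise<unknown[]>;

interface QueryAttestation {
  sql: string;
  params: unknown[];
  executedAt: string;
  resultHash: string; // what would be published (e.g. in a VC) as proof
}

async function attestedQuery(
  sql: string,
  params: unknown[]
): Promise<{ rows: unknown[]; attestation: QueryAttestation }> {
  const rows = await runQuery(sql, params);
  const resultHash = createHash("sha256").update(JSON.stringify(rows)).digest("hex");
  return {
    rows,
    attestation: { sql, params, executedAt: new Date().toISOString(), resultHash },
  };
}

// Usage, for "get all data from instruments that still have valid calibrations":
// const { rows, attestation } = await attestedQuery(
//   "SELECT r.* FROM readings r JOIN calibrations c ON c.instrument_id = r.instrument_id WHERE c.valid_until >= r.read_at",
//   []
// );
```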