Open anvabr opened 10 months ago
The traditional way to address step 5 (combine with previous and do some more transformations) is a relational database, because it stores data efficiently and has sophisticated means of combining data (queries, joins, etc.). The ability to store data that changes at a high cadence is one part of the requirement; the other is the ability to query such data. Example: every deployed instrument has a set of requirements that determine the validity of its data and that are not a function of the data itself. Typical requirements are things like inspection and calibration frequency. The protocol may require that an instrument is inspected once every six months and calibrated once a year. A read from an instrument is then actually a query of the conditions of validity for the data (e.g. the calibration and inspection logs) as well as of the data itself: "Get all data from instruments that still have valid calibrations". If the query is deterministic, perhaps the query itself and a timestamp are enough to deliver on the promise of transparency and immutability ("I did this query at this time and got this result").
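To make the example concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. Everything in it is an assumption for illustration: the table and column names, the six-month and one-year windows taken from the example protocol above, and the idea of hashing the query text, the timestamp and the result into a "receipt". It is not how Guardian works today, only one way the "query plus timestamp" idea could look.

```python
import hashlib
import json
import sqlite3

# Hypothetical schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE readings     (instrument_id TEXT, taken_at TEXT, value REAL);
CREATE TABLE calibrations (instrument_id TEXT, calibrated_at TEXT);
CREATE TABLE inspections  (instrument_id TEXT, inspected_at TEXT);
"""

# "Get all data from instruments that still have valid calibrations":
# as of :as_of, the instrument must have been calibrated within the last
# 12 months and inspected within the last 6 months. Passing the timestamp
# as a parameter (instead of using "now") keeps the query deterministic.
VALID_READINGS = """
SELECT r.instrument_id, r.taken_at, r.value
FROM readings AS r
WHERE r.taken_at <= :as_of
  AND EXISTS (SELECT 1 FROM calibrations AS c
              WHERE c.instrument_id = r.instrument_id
                AND c.calibrated_at <= :as_of
                AND c.calibrated_at >= date(:as_of, '-12 months'))
  AND EXISTS (SELECT 1 FROM inspections AS i
              WHERE i.instrument_id = r.instrument_id
                AND i.inspected_at <= :as_of
                AND i.inspected_at >= date(:as_of, '-6 months'))
ORDER BY r.instrument_id, r.taken_at
"""


def build_demo_db():
    """Create an in-memory database with two made-up instruments."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)",
                     [("A", "2024-05-01", 12.5), ("B", "2024-05-02", 7.0)])
    conn.executemany("INSERT INTO calibrations VALUES (?, ?)",
                     [("A", "2024-01-10"), ("B", "2022-11-01")])
    conn.executemany("INSERT INTO inspections VALUES (?, ?)",
                     [("A", "2024-03-01"), ("B", "2024-04-01")])
    return conn


def query_with_receipt(conn, as_of):
    """Run the validity query and hash (query, timestamp, result) together."""
    rows = conn.execute(VALID_READINGS, {"as_of": as_of}).fetchall()
    payload = json.dumps(
        {"query": VALID_READINGS, "as_of": as_of, "rows": rows},
        sort_keys=True,
    )
    receipt = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return rows, receipt


if __name__ == "__main__":
    rows, receipt = query_with_receipt(build_demo_db(), "2024-06-01")
    print(rows)
    print(receipt)
```

In this made-up data set, instrument B's last calibration is more than a year older than the query timestamp, so its reading is excluded even though the reading itself looks fine. The receipt only proves anything if the underlying logs are themselves append-only and verifiable, which is exactly the storage question this issue raises.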
Problem description
At a very high level, Guardian policy execution boils down to the following workflow:
The underlying technologies that Guardian uses for storage are IPFS and Hedera Topics.
IPFS works very well for documents but is not very efficient for data, in particular data which undergoes many transformations, each of which needs to be verifiably performed and recorded.
Hedera Topics have content size limitations and do not have an efficient addressing system.
For many real-world use-cases the required volume and complexity of calculations (and thus transformations) on the original MRV data is such that full automation of such workflows using existing Guardian technology will likely be very challenging if not impossible.
Requirements
Identify and integrate with a distributed storage technology that allows Guardian to work directly with data at scale (similarly to how it would work with a relational database) while maintaining a full record of data provenance and guaranteeing that policy adherence can be verified for all data processing and transformations.
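As a rough illustration of what "a full record of data provenance" for transformations could mean in practice, the sketch below chains one record per transformation step by hash, so that altering any earlier record invalidates everything after it. This is an assumed technique (a simple hash chain), not a description of Guardian or of any particular storage product; the names ProvenanceRecord, apply_step and verify_chain are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def content_hash(data) -> str:
    """Hash any JSON-serialisable value in a stable way."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    step: str              # name of the transformation
    input_hash: str        # hash of the data going in
    output_hash: str       # hash of the data coming out
    prev_record_hash: str  # hash of the previous record in the log

    def digest(self) -> str:
        return content_hash(asdict(self))


def apply_step(log, name, func, data):
    """Run one transformation and append its provenance record to the log."""
    result = func(data)
    prev = log[-1].digest() if log else GENESIS
    log.append(ProvenanceRecord(name, content_hash(data),
                                content_hash(result), prev))
    return result


def verify_chain(log) -> bool:
    """Check that records link to each other and that outputs feed inputs."""
    prev = GENESIS
    for i, record in enumerate(log):
        if record.prev_record_hash != prev:
            return False
        if i > 0 and record.input_hash != log[i - 1].output_hash:
            return False
        prev = record.digest()
    return True


if __name__ == "__main__":
    log = []
    scaled = apply_step(log, "scale", lambda xs: [x * 2 for x in xs],
                        [1.0, 2.0, 3.0])
    total = apply_step(log, "sum", sum, scaled)
    print(total, verify_chain(log))
```

A chain like this records that a step ran and what went in and out; it does not by itself prove that the step followed the policy. That stronger guarantee is what the requirement above asks the chosen storage technology to support.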
Some relevant links:
Definition of done
Acceptance criteria