Closed: aaronc closed this issue 3 years ago
Title | Milestone | Assignees | Stage | State
---|---|---|---|---
Create merkle-tree based RDF canonicalization spec #64 | N/A | N/A | | Open
As far as problem definition, I'd like to flesh out two interrelated use cases that will be good to orient this work around:
As part of our issuing of CarbonPlus Grasslands credits, the credit module keeps its use of metadata quite minimal (see the credit RFC). This metadata is meant to link to claims about ecological data. While the credit RFC in its current form only requires arbitrary metadata blobs that may link to off-chain data, we should also consider the use case of metadata from a credit linking to a previous claim (or baseline monitoring assessment) that is also represented as an on-chain asset. In this case, it would be good to have an easy way for the credit module to point to a dataset that was stored or signed using the operations described here.
Some of the first technical partners that we expect to make use of the data anchoring, storage, and signing capabilities come from OpenTEAM.
This year we had an approved paired work session to work with SurveyStack (a generic survey form application) as an initial partner for adding digital signature capabilities to tools in the OpenTEAM ecosystem. This is currently a proposed item in SurveyStack's "Year 2 Research Farm Roadmap" proposal (see the Blockchain / Ledger integration section).
This work package actually connects quite closely with the previous Regen Registry use case, as the end-use case for this SurveyStack integration is the monitoring surveys that are being completed by Regen's science team as part of the monitoring tasks for issuing a CarbonPlus Grasslands credit.
In response to the actual proposal, I think this is a great starting place. Anchoring data, Signing data, and Storing data all being treated as separate operations makes sense to me - and by use of the same CID standard, they can actually connect to each other.
Two questions I have, though:
> What kind of querying functionality would we like to support?
> - the ability to index by CID? (give me all signatures and/or content corresponding to some CID?)
Yes
> - Will there be any deduplication efforts to prevent storing multiple anchorings of the same dataset? What about multiple sign records and/or overlapping groups of signers? How will these be resolved when querying?
Yes, deduplication of data, and yes, multiple signers.
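To make the intended behavior concrete, here is a minimal Go sketch, with purely illustrative types rather than the module's actual storage layout, of state keyed by CID so that a single lookup returns everything known about a CID, re-anchoring is a no-op, and repeat signers are deduplicated:

```go
package data

import "time"

// Illustrative in-memory model (not the module's actual storage layout) of
// anchors and signatures keyed by CID, so that queries are a single lookup,
// re-anchoring is a no-op, and repeat signers are deduplicated.
type AnchorEntry struct {
	AnchoredAt time.Time            // block time of the first anchor
	Signers    map[string]time.Time // signer address -> time of signature
}

type Index struct {
	byCID map[string]*AnchorEntry // key: CID string
}

func NewIndex() *Index {
	return &Index{byCID: make(map[string]*AnchorEntry)}
}

// Anchor records the CID only once; anchoring the same data again is a no-op.
func (i *Index) Anchor(cid string, blockTime time.Time) {
	if _, ok := i.byCID[cid]; !ok {
		i.byCID[cid] = &AnchorEntry{AnchoredAt: blockTime, Signers: map[string]time.Time{}}
	}
}

// Sign anchors implicitly and records each signer at most once.
func (i *Index) Sign(cid, signer string, blockTime time.Time) {
	i.Anchor(cid, blockTime)
	if _, ok := i.byCID[cid].Signers[signer]; !ok {
		i.byCID[cid].Signers[signer] = blockTime
	}
}

// Query returns everything known about a CID.
func (i *Index) Query(cid string) (*AnchorEntry, bool) {
	e, ok := i.byCID[cid]
	return e, ok
}
```

Keying everything by CID also means that later signatures of already-anchored data simply attach to the existing record.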
> - Even at this first initial step, it would be really great to have some optional parameters that provide GeoJSON / polygon support. Is that something that can be easily added to this scope? I think that even the most basic structured polygon data alongside CIDs would be a huge win and make early adopters a lot more excited to experiment with the module.
Content can include polygon data if the author chooses. I'm resistant to adding some geo capability on top of this just because. I believe we will eventually want to specify more structured data, ideally with a Merkle tree based CID hash (#64) that includes some geo polygon spec. It's still only really relevant on-chain if it's indexed, IMHO.
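To illustrate that, here is a small client-side sketch (assuming the go-cid and go-multihash libraries; the raw codec and SHA2-256 hash are arbitrary example choices) that derives a CID for a GeoJSON polygon, which is then what the anchoring and signing messages below would reference:

```go
package main

import (
	"fmt"

	cid "github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

func main() {
	// A GeoJSON polygon is just content like any other: unless it is also
	// stored, the chain only ever sees its CID.
	polygon := []byte(`{"type":"Polygon","coordinates":[[[-122.5,38.5],[-122.4,38.5],[-122.4,38.6],[-122.5,38.6],[-122.5,38.5]]]}`)

	// Raw codec + SHA2-256 is an illustrative choice; any CID format the
	// module accepts would work the same way.
	builder := cid.V1Builder{Codec: cid.Raw, MhType: mh.SHA2_256}
	c, err := builder.Sum(polygon)
	if err != nil {
		panic(err)
	}

	// This CID is what MsgAnchorData / MsgSignData would carry.
	fmt.Println("anchor this CID:", c.String())
}
```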
The idea of using a blockchain for timestamping data is one of the canonical use cases. It has been done on Bitcoin, Ethereum, and even Ripple. There is also a lot of academic research (e.g. the Stampery Blockchain Timestamping Architecture).
`x/data` is duplicating blockchain data (block data) - we copy the data from transactions and wrap it in more meaningful structures without doing additional operations. If we don't do any operation, then a more natural way would be to wrap the transaction data in `Event`s and use an off-chain DB for indexing. The verification process would be the following: query the index for a specific piece of data; this will result in a transaction_id or an event. Then you can use it to query Tendermint to validate that the event / transaction indeed happened at a particular time. This would solve the issues above and keep the blockchain slim.
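A rough sketch of that events-only alternative, assuming a standard Cosmos SDK handler context; the event type and attribute keys are invented for illustration:

```go
package data

import (
	sdk "github.com/cosmos/cosmos-sdk/types"
)

// emitAnchorEvent sketches the "events only" alternative: instead of writing
// an anchor record to state, the handler emits an event carrying the CID, and
// an off-chain indexer maps CID -> tx hash / block height, which can later be
// checked against Tendermint. The event type and attribute keys are illustrative.
func emitAnchorEvent(ctx sdk.Context, sender, cidStr string) {
	ctx.EventManager().EmitEvent(
		sdk.NewEvent(
			"anchor_data",
			sdk.NewAttribute("sender", sender),
			sdk.NewAttribute("cid", cidStr),
		),
	)
}
```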
That's true @robert-zaremba. I guess one complication is that we don't have access to the Tendermint block history in the SDK state. So if we wanted to use the data or signers in a smart contract, we couldn't... But yes, it is duplication.
Wouldn't it be better to allow smart contracts to select `Event`s? A smart-contract method argument would require an event ID.
Closing this as the basic implementation was completed in #118 and #124
Summary
This proposes a module for tracking data related to ecological claims, where the data may live both on- and off-chain.
Problem Definition
The data module aims to satisfy the following use cases:
Can you say any more about this @clevinson?
Proposal
Anchoring Data
Anchoring data refers to storing a hash of a piece of data on-chain that allows for proof of existence of that data at some block height. This effectively creates a “secure timestamp” for the data which proves that the data was created no later than the block height where it was included.
We propose a simple message `MsgAnchorData` for anchoring data on-chain, which uses the IPFS CID Specification as the expected format for data hashes.
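A rough Go sketch of what such a message could look like follows; the field names here are assumptions for illustration rather than the final schema:

```go
package data

// MsgAnchorData is a sketch of the anchoring message; field names are
// assumptions for illustration, not the final schema. The sender does not
// have to be a signer of the underlying data -- it is simply the account
// paying to timestamp it.
type MsgAnchorData struct {
	// Sender is the account submitting the anchor transaction.
	Sender string `json:"sender"`
	// Cid is the IPFS Content IDentifier of the anchored data.
	Cid []byte `json:"cid"`
}
```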
Signing Data
Signing data refers to creating a claim that the signer of the piece of data attests to its veracity. What veracity means may be somewhat dependent on the context of the document being signed, but for simplicity we draw an analogy to a legal document. If a party puts their signature on a legal document, it's pretty clear from the contents of the legal document what the signature implies.
We propose a simple message `MsgSignData` for making on-chain signatures of data.
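Again as a rough, non-authoritative Go sketch with illustrative field names:

```go
package data

// MsgSignData is a sketch of the signing message; field names are illustrative.
// Unlike MsgAnchorData's sender, every address in Signers attests to the data
// identified by Cid and must have signed the transaction.
type MsgSignData struct {
	// Signers are the accounts attesting to the veracity of the data.
	Signers []string `json:"signers"`
	// Cid is the IPFS Content IDentifier of the signed data.
	Cid []byte `json:"cid"`
}
```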
A few notes:
- `signers` is distinct from `sender` in `MsgAnchorData`. The `sender` in `MsgAnchorData` could be a third-party relayer like a “postman”. Just delivering a document does not mean that one has signed the document in the legal sense. So `MsgAnchorData` implies that the `sender` simply delivered the document to the registry for timestamping. `MsgSignData` implies that the `signers` signed the document.
- `MsgSignData` automatically “anchors” the document on-chain. A call to `MsgAnchorData` is not needed if this is the first time the data has appeared on-chain. Two different messages are proposed because the data may be anchored on-chain before it is signed, and different signers may sign the same document at different points in time.
- Mechanisms like the `group` module allow public keys to be securely associated with an identity at one point in time and not another. The on-chain signature captures when the signature was created, to verify that it is valid at that point in time. Off-chain mechanisms can include the concept of key revocation, but there is no way of enforcing this without the type of secure timestamps and account sequences that a blockchain provides.

Storing Data
If desired, we can also store data directly on the blockchain. Off-chain storage will generally be cheaper, but on-chain data storage provides the following benefits:
We propose a simple message `MsgStoreData` for storing data on-chain.
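A sketch of the storage message along the same lines, with illustrative fields only:

```go
package data

// MsgStoreData is a sketch of the storage message; field names are
// illustrative. Unlike anchoring, the raw content travels with the message,
// so gas should scale with len(Content) and only approved CID formats and
// hash functions should be accepted.
type MsgStoreData struct {
	// Sender is the account submitting (and paying for) the stored data.
	Sender string `json:"sender"`
	// Cid is the IPFS Content IDentifier that Content must hash to.
	Cid []byte `json:"cid"`
	// Content is the raw data to be stored in on-chain state.
	Content []byte `json:"content"`
}
```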
The provided `content` bytes should be verified against the provided CID. While `MsgAnchorData` and `MsgSignData` should support any valid CID, `MsgStoreData` should support only an approved list of formats and hashes, with gas priced appropriately for each supported hash.
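A minimal sketch of that verification step, assuming the go-cid library is used for CID handling (gas metering and the format allow-list are left out):

```go
package data

import (
	"fmt"

	cid "github.com/ipfs/go-cid"
)

// verifyContent sketches the check MsgStoreData handling would need: recompute
// a CID over the submitted bytes using the same prefix (version, codec, hash
// function) as the provided CID and compare. Gas metering and the allow-list
// of formats are omitted.
func verifyContent(c cid.Cid, content []byte) error {
	recomputed, err := c.Prefix().Sum(content)
	if err != nil {
		return err
	}
	if !recomputed.Equals(c) {
		return fmt.Errorf("content does not match CID %s", c)
	}
	return nil
}
```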
Future Improvements
Off-chain Data URL Index
One simple improvement to the above design is to allow URLs to be stored for any CID. This effectively creates an on-chain index of off-chain data URLs.
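Purely as a hypothetical sketch of such an index (the message name and fields below are invented for illustration and are not part of this proposal):

```go
package data

// MsgRegisterDataURL is a hypothetical message (not part of the proposal text)
// sketching the URL-index idea: anyone may advertise an off-chain URL at which
// the content behind an already-anchored CID can be fetched, and clients
// verify whatever they download by re-hashing it against the CID.
type MsgRegisterDataURL struct {
	// Sender is the account registering the URL.
	Sender string `json:"sender"`
	// Cid identifies previously anchored data.
	Cid []byte `json:"cid"`
	// Url is an off-chain location serving the content.
	Url string `json:"url"`
}
```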
Partial Data Storage
Using a merkle-tree based canonicalization/hashing algorithm as described in #64, we could allow for part of a piece of data to be stored on-chain using merkle proofs. This would allow for:
Secondary indexes
We could allow certain data properties within well-defined document structures to be automatically indexed on-chain so that they are searchable in smart contracts. For instance, we could define a property that describes a geographic polygon, and whenever a piece of data containing that property is stored, it is indexed in an on-chain geospatial data store. This requires more in-depth research. See regen-network/regen-ledger#87.
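As a rough sketch of the idea, under the assumption that the stored content is JSON and that a property name like `geoPolygon` has been agreed on (both assumptions are invented here):

```go
package data

import "encoding/json"

// indexGeoPolygon sketches a secondary index: when stored content carries a
// well-known property ("geoPolygon" is an assumed name), its value is written
// into a separate index keyed by that value so that on-chain logic could later
// look up the CIDs carrying it. A real design would need a proper geospatial
// key encoding (e.g. geohash cells) rather than the raw value used here.
func indexGeoPolygon(index map[string][]string, cidStr string, content []byte) {
	var doc map[string]json.RawMessage
	if err := json.Unmarshal(content, &doc); err != nil {
		return // not JSON, nothing to index
	}
	if poly, ok := doc["geoPolygon"]; ok {
		index[string(poly)] = append(index[string(poly)], cidStr)
	}
}
```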
Schema Validation
On-chain data could be validated against schemas for conformity. This may or may not be the responsibility of on-chain consensus and would depend heavily on the use case. Schema validation, if it were to be incorporated, would likely rely on some on-chain schema registry, which is out of the scope of this proposal.