magda-io / magda

A federated, open-source data catalog for all your big data and small data
https://magda.io
Apache License 2.0
501 stars 92 forks source link

Possible Opportunity for Magda to Solve the Broader-Sense `Open Data` Issue With Merkle Tree Related P2P Technologies #2232

Open t83714 opened 5 years ago

t83714 commented 5 years ago

Following on from our discussion about repositioning Magda from a technology innovation perspective, I think we might want to look at a possible opportunity for Magda to solve the Open Data issue in a broad sense (i.e. not just public-sector or government-related data) with Merkle tree related P2P technologies.

The main idea is to help open/public datasets be published, distributed, utilized, recreated & authorized over a distributed network, and managed & governed by the network itself. The data publisher then doesn't have to take on the data custodian role or worry about the dataset distribution infrastructure.

The benefits will be:

Here are two closely related ideas:

  1. Leverage blockchain technology (probably a private version in order to be free) to:
    • allow a dataset to be created with a policy & contract covering how the data can be improved and what the (quality) standard of the dataset is.
    • let network consensus execute & manage future collaboration on the dataset.
  2. Leverage P2P content sharing technology, e.g. IPFS, to solve the dataset distribution problem. The data publisher then doesn't have to serve the whole world's traffic for the dataset, and the content will be hosted by everyone who gets involved.

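
To make the "Merkle tree" part of the second idea concrete, here is a minimal sketch of how a dataset could be split into chunks and identified by a single root hash, so any peer can serve chunks and any consumer can detect tampering. This is only an illustration under my own assumptions: the chunk size, the `0x00`/`0x01` prefixes and the "duplicate the last node" rule are simplifications, and it is not the actual IPFS or Dat on-disk format.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def chunk(data: bytes, size: int) -> list:
    """Split a dataset into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def merkle_root(chunks: list) -> str:
    """Return the hex Merkle root of the given chunks.

    Leaves and inner nodes are hashed with different prefixes so that a
    leaf can never be confused with an inner node.
    """
    level = [sha256(b"\x00" + c) for c in chunks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Simplification: duplicate the last node on odd-sized levels.
            level.append(level[-1])
        level = [
            sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


# Hypothetical tiny dataset, chunked into 8-byte pieces for illustration.
dataset = b"id,name\n1,alpha\n2,beta\n3,gamma\n"
root = merkle_root(chunk(dataset, 8))

# Changing any chunk changes the root, so the root works as a dataset ID.
tampered_root = merkle_root(chunk(dataset.replace(b"beta", b"BETA"), 8))
print(root != tampered_root)
```

The point is that the publisher only needs to announce the root; the chunks themselves can come from whoever hosts them.
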
maxious commented 5 years ago

"Dat is a protocol for sharing data between computers. Dat’s strengths are that data is hosted and distributed by many computers on the network, that it can work offline or with poor connectivity, that the original uploader can add or modify data while keeping a full history and that it can handle large amounts of data." https://datprotocol.github.io/how-dat-works/ https://dat.foundation/

There is an opportunity to have a data catalog here: GitHub exists for code, so where is DatHub? Even if you have the network, you still need to discover data, and you also need to know about forks of the data so you can use them or merge them back into the original dataset. Pull Requests?
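
As a rough sketch of what "knowing about forks" could mean for a catalog, the snippet below keeps one record per dataset version with a pointer to the version it was derived from. Everything here is hypothetical (the `Record` and `Catalog` names, the `content_id` strings, the in-memory storage); it is not an existing Dat or Magda API, just one way the fork relationship could be modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    content_id: str                # hypothetical content address of this version
    publisher: str
    parent: Optional[str] = None   # content_id this version was forked from


class Catalog:
    """In-memory catalog that tracks fork relationships between datasets."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def register(self, record: Record) -> None:
        if record.parent is not None and record.parent not in self._records:
            raise KeyError(f"unknown parent dataset: {record.parent}")
        self._records[record.content_id] = record

    def forks_of(self, content_id: str) -> List[str]:
        """Direct forks of a dataset, i.e. records that name it as parent."""
        return sorted(
            r.content_id for r in self._records.values() if r.parent == content_id
        )

    def lineage(self, content_id: str) -> List[str]:
        """Walk parent pointers from a fork back to the original dataset."""
        chain = []
        current: Optional[str] = content_id
        while current is not None:
            chain.append(current)
            current = self._records[current].parent
        return chain


catalog = Catalog()
catalog.register(Record("original", publisher="agency"))
catalog.register(Record("fork-b", publisher="alice", parent="original"))
catalog.register(Record("fork-c", publisher="bob", parent="original"))
catalog.register(Record("fork-d", publisher="carol", parent="fork-b"))

print(catalog.forks_of("original"))
print(catalog.lineage("fork-d"))
```

A "pull request" would then be a proposal to add a new version whose parent is the original, with the content of a fork.
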

t83714 commented 5 years ago

Thanks, @maxious. Dat is a very good one~ Yeah~ As you said, I am thinking about a DatHub like GitHub, but going even further: the management/governance of a particular dataset should be decentralized as well & not rely on the data publisher (having been a data consumer previously, it seems not every publisher is interested in managing their dataset).

Currently, the open data publishing process for a publisher is more like a "make data available" process. After a dataset is published, the data publisher is still responsible for:

A P2P data sharing protocol only solves the first issue. But we do need a data catalog for the second issue (data discovery, management & governance).

I think we probably need a truly decentralized data catalog to free the publisher from many obligations and also enable the general public to get involved at the same time.

To achieve that, I think we probably can:

Only some rough ideas here~ 😄

The other problem is that most current blockchain / public ledger systems use a Proof-of-Work type consensus protocol. That may imply a financial cost (around 1 USD per dataset, according to here) for any data to be stored on the public ledger.

We might want to try a different type of consensus protocol (e.g. a mix of Proof-of-Authority with other types; no clear idea on this yet) to offer a coin-free (no compulsory cost) network.
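
To illustrate the Proof-of-Authority direction, here is a toy sketch where a fixed set of known authorities take turns sealing catalog entries, so no mining and no coin is involved. All names (`AUTHORITIES`, `seal`, `verify_chain`) are made up for this example. Important assumption: HMAC with per-authority secrets is used only as a stand-in for signatures so the sketch runs with the standard library; a real network would use public-key signatures so that anyone can verify a block without holding the secrets.

```python
import hashlib
import hmac
import json

# Hypothetical authorities and their secrets (stand-in for signing keys).
AUTHORITIES = {"org-a": b"secret-a", "org-b": b"secret-b", "org-c": b"secret-c"}
ORDER = sorted(AUTHORITIES)
GENESIS = "0" * 64


def expected_signer(height: int) -> str:
    """Round-robin schedule: authorities take turns sealing blocks."""
    return ORDER[height % len(ORDER)]


def block_digest(height: int, prev_hash: str, payload: str) -> str:
    body = json.dumps(
        {"height": height, "prev": prev_hash, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode()).hexdigest()


def sign(signer: str, digest: str) -> str:
    return hmac.new(AUTHORITIES[signer], digest.encode(), hashlib.sha256).hexdigest()


def seal(height: int, prev_hash: str, payload: str, signer: str) -> dict:
    digest = block_digest(height, prev_hash, payload)
    return {"height": height, "prev": prev_hash, "payload": payload,
            "signer": signer, "hash": digest, "sig": sign(signer, digest)}


def verify_chain(chain: list) -> bool:
    """Check hash links, the signer schedule, and every seal."""
    prev = GENESIS
    for height, block in enumerate(chain):
        if block["height"] != height or block["prev"] != prev:
            return False
        if block["signer"] != expected_signer(height):
            return False
        digest = block_digest(height, prev, block["payload"])
        if block["hash"] != digest:
            return False
        if not hmac.compare_digest(block["sig"], sign(block["signer"], digest)):
            return False
        prev = digest
    return True


chain = []
prev = GENESIS
for height, payload in enumerate(
    ["register dataset A", "update dataset A", "register dataset B"]
):
    block = seal(height, prev, payload, expected_signer(height))
    chain.append(block)
    prev = block["hash"]

print(verify_chain(chain))
```

Since sealing a block only costs one hash and one signature, there is no compulsory fee per dataset; the trade-off is that the network has to agree on who the authorities are.
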