oceanprotocol / market

🧜‍♀️ THE Data Market
https://market.oceanprotocol.com
Apache License 2.0

Data Locker using Estuary #1483

Closed calebtuttle closed 2 years ago

calebtuttle commented 2 years ago

Is your feature request related to a problem? Please describe. The Ocean frontend does not currently provide an interface where users can easily upload their data.

Describe the solution you'd like: Add a Data Locker page (as described in this proposal) that provides a way for users to easily upload data to IPFS + Filecoin and then publish the data on Ocean.

Implementation. The Data Locker currently consists of a page on the frontend where whitelisted users can upload their data to IPFS and Filecoin through a drag-and-drop interface, and where they can view their uploaded data. Uploads to IPFS and Filecoin are done using Estuary.

Flow. After a user drops data in the frontend and clicks “Submit,” the user must sign a message with their web3 wallet, and the data and signature are sent to a proxy server. The proxy server runs validation checks, including a check that the address that signed the message is on a whitelist maintained by the proxy server.
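The checks described above could be sketched roughly as follows. This is a minimal illustration and not the actual proxy server code: `UploadRequest`, `validate_upload`, and the `recover_signer` hook are hypothetical names, and signature recovery is injected as a callable so the sketch stays independent of any particular web3 library.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

# Hypothetical hook: takes (message, signature) and returns the signer's address.
# A real server would back this with a web3 signature-recovery routine.
RecoverSigner = Callable[[str, str], str]


@dataclass(frozen=True)
class UploadRequest:
    message: str                # the message the user signed with their wallet
    signature: str              # the signature produced by the wallet
    filenames: Tuple[str, ...]  # names of the files being uploaded


def validate_upload(
    request: UploadRequest,
    whitelist: Iterable[str],
    recover_signer: RecoverSigner,
) -> str:
    """Return the signer's address if the request passes the checks, else raise."""
    if not request.filenames:
        raise ValueError("no files attached")

    signer = recover_signer(request.message, request.signature)

    # Compare case-insensitively: the same address may arrive checksummed or lowercase.
    allowed = {address.lower() for address in whitelist}
    if signer.lower() not in allowed:
        raise PermissionError(f"address {signer} is not whitelisted")
    return signer
```

Only when `validate_upload` returns without raising would the server go on to forward the data to the storage backend.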

If the checks pass, the server submits the data to Estuary, which then pins the data to IPFS and establishes deals with Filecoin storage miners. On the frontend, the uploaded file(s) are displayed to the user. An additional modification was made to the Publish form: Under every FileInput, there is a dropdown listing the files a user has uploaded, and the user can select the filename to have the field populated with the file's CID.
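The dropdown behaviour described above amounts to a per-user lookup from filename to CID. A minimal sketch, assuming a hypothetical in-memory `UploadedFiles` registry (the real feature may store this differently) and a made-up CID string in the usage note:

```python
from typing import Dict, List


class UploadedFiles:
    """Tracks, per wallet address, the files a user has uploaded and their CIDs."""

    def __init__(self) -> None:
        self._by_owner: Dict[str, Dict[str, str]] = {}

    def record(self, owner: str, filename: str, cid: str) -> None:
        """Remember the CID returned by the storage backend for an uploaded file."""
        self._by_owner.setdefault(owner.lower(), {})[filename] = cid

    def dropdown_options(self, owner: str) -> List[str]:
        """Filenames to list under a file input for this user, sorted by name."""
        return sorted(self._by_owner.get(owner.lower(), {}))

    def cid_for(self, owner: str, filename: str) -> str:
        """CID used to populate the field when the user picks a filename."""
        return self._by_owner[owner.lower()][filename]
```

For example, after `files.record("0xAbC", "data.csv", "example-cid")`, selecting `data.csv` in the dropdown would fill the field with the value of `files.cid_for("0xabc", "data.csv")`.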

Describe alternatives you've considered: One alternative we considered is building (or requiring users to use) a generic frontend for storing data on IPFS + Filecoin, but this does not provide an easy way for the user to go from uploading data to publishing it on Ocean.

Additional context: I have been working on this feature for OpSci Commons, and it is still a work in progress. At the Ocean core tech meeting yesterday (June 1, 2022), a desire for a pull request for this feature was expressed. I'm hoping this issue starts a conversation about which features of a Data Locker would benefit Ocean and about the best way to implement these features.

The Data Locker feature, as described above, is implemented within my fork of Ocean. The feature includes a proxy/access control server, which can be found here.

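Open questions/issues:

  1. Should the proxy server be folded into Ocean Provider? If so, should it be rewritten in Python, wrapped by Provider, or something else?
  2. How should directories be handled? Uploaded directories are currently stored as (and would be published as) CAR files, but there might be a better approach.
  3. What is the best way to whitelist users so they can upload? Is a whitelist the best approach? We are exploring account verification with https://app.holonym.id/ (see https://docs.holonym.id/) to gate upload access by academic/institutional oauth credentials.
  4. Is Estuary the best way to get files onto IPFS + Filecoin? Would certain modifications to Estuary be helpful for this feature?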

Below is what the Data Locker looks like on my local Ocean market build.

[Image: Ocean Data Locker]

mihaisc commented 2 years ago

So we need to whitelist people who can upload? Basically, somebody from the market team needs to manually add user wallet addresses, or am I misunderstanding?

MantisClone commented 2 years ago

Related: https://github.com/oceanprotocol/market/issues/446

calebtuttle commented 2 years ago

> So we need to whitelist people who can upload? Basically, somebody from the market team needs to manually add user wallet addresses, or am I misunderstanding?

The current implementation requires that users are whitelisted manually. There are a couple of reasons we made this decision for OpSci Commons, but it might be that Ocean needs a different solution.

hebbianloop commented 2 years ago

We whitelist users for upload privileges because archival on Filecoin + pinning on IPFS requires quite a bit of overhead. Estuary solves a lot of the low-level challenges of preprocessing, deal making, communicating with the Filecoin network, and running a gateway. However, it is expensive to run an Estuary node (2-3k per month for a standard build).

We are currently developing our solution using an invite-only Estuary API key, so we must roll this out slowly over time. An improvement over this implementation could include a public Estuary node for Ocean Market (this would require sponsorship from OceanDAO or other stakeholders) or streaming payments to data providers running nodes, to incentivize others to provide the service.

To summarize, deal making is complex and Estuary has solved this. However, the node they run is there to help devs build, and it cannot be opened up to the general public. Anyone can run an Estuary node, and this can be part of the Ocean stack, but the overhead needs to be considered.

trentmc commented 2 years ago

Agree, for Ocean Market, whitelisting is a no-go. Ocean Market needs to be permissionless. To reconcile this with near-term Estuary constraints, perhaps it could be in an invite-based "beta"?

Also, the spec should be that the data publisher is paying for storage. Not OPF, Opscientia, Estuary, Filecoin, or Protocol Labs. Payment should "just happen" as part of the publish tx. Is that the case?

trentmc commented 2 years ago

Also: I'd nudged for a PR. This is still at the GitHub issue level. But that's ok for now, given that there are still at least a couple of issues to solve wrt macro specs.

trentmc commented 2 years ago

> The Data Locker feature, as described above, is implemented within my fork of Ocean. The feature includes a proxy/access control server, which can be found here. .. Should the proxy server be folded into Ocean Provider? If so, should it be rewritten in Python, wrapped by Provider, or something else?

The main thing is that it behaves like Provider, i.e. it implements and exposes the Provider API. There are a few ways it could be built:

  1. A PR merged into Provider repo, written in Py
  2. A fork of Provider, written in Py
  3. From-scratch code written in any language that implements the Provider API

(1) is best for maintainability. (2) is next-best. (3) can work.

trentmc commented 2 years ago

A's to the other Q's:

> How should directories be handled? Uploaded directories are currently stored as (and would be published as) CAR files, but there might be a better approach.

Start with what you have. Focus on getting a PR merged, meeting the specs as discussed above.

> What is the best way to whitelist users so they can upload? Is a whitelist the best approach? We are exploring account verification with https://app.holonym.id/ (see https://docs.holonym.id/) to gate upload access by academic/institutional oauth credentials.

The best way is to not whitelist users.

> Is Estuary the best way to get files onto IPFS + Filecoin? Would certain modifications to Estuary be helpful for this feature?

Start with what you have. Focus on getting a PR merged, meeting the specs as discussed above.

kremalicious commented 2 years ago

Creating a file management ability right in the app seems out of scope. The market app is a demonstrator of Ocean core capabilities and it should stay like that. We support HTTP(S) and IPFS URLs, and nothing is stopping users from uploading data to IPFS and using the result in the File form field (need to double-check if we support this in the v4 market, but Provider supports it). Nothing proposed here seems in any way simpler to use or implement than a direct IPFS API node integration or using https://web3.storage. The proposed solution even requires an editorial flow, which in itself would need to be clearly defined, as the foundation is not going to whitelist people based on goodwill or random criteria. But as Trent said, permissionless would be the preferred way.

Everything that goes beyond a button on the File form field to upload to something external and give back a URL to the app (this can also be with drag & drop and such) is, to me, just too much functionality, as we are simply not a file management or storage solution. So I would suggest building the spec proposed here as a standalone app, which could even be its own product with its own revenue stream.

And as Trent said, on top of that, this would need deep Provider integration, which would then make Estuary a core Ocean Protocol feature, whereas Ocean should rather be technology-agnostic when it comes to file storage.
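To illustrate what "a button that uploads to something external and gives back a URL" could look like while staying storage-agnostic, here is a minimal sketch. All names (`FileUploader`, `InMemoryUploader`, `fill_file_field`) are hypothetical, and the in-memory backend is only a stand-in: the SHA-256 digest it uses as an identifier is a placeholder, not a real IPFS CID.

```python
import hashlib
from typing import Dict, Protocol


class FileUploader(Protocol):
    """Anything that can store bytes externally and hand back a URL."""

    def upload(self, filename: str, data: bytes) -> str: ...


class InMemoryUploader:
    """Stand-in backend for illustration; a real one might target IPFS or HTTP storage."""

    def __init__(self) -> None:
        self.stored: Dict[str, bytes] = {}

    def upload(self, filename: str, data: bytes) -> str:
        # Placeholder identifier derived from the content; NOT a real IPFS CID.
        digest = hashlib.sha256(data).hexdigest()
        self.stored[digest] = data
        return f"memory://{digest}/{filename}"


def fill_file_field(uploader: FileUploader, filename: str, data: bytes) -> str:
    """What the upload button would do: upload, then return the URL for the field."""
    return uploader.upload(filename, data)
```

Because the app only depends on the `upload` method, swapping the storage backend would not require changes to the form itself.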

kremalicious commented 2 years ago

As mentioned above, closing as out of scope.