keyko-io / filecoin-large-clients-onboarding


[DataCap Application] #285

Closed fabriziogianni7 closed 2 years ago

fabriziogianni7 commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

Our project began with our initial ambition to build [Powergate](https://github.com/textileio/powergate) and integrate it into our hosted platform, [the Hub](https://blog.textile.io/the-textile-hub-joins-filecoin-mainnet/), plus help other users integrate it for using Filecoin. We found that it was difficult to make a high number of deals on the network at competitive rates with minimal error rates. To solve those challenges and enable higher throughput deal-making on the network, we [announced our work on a deal auction layer for Filecoin](https://blog.textile.io/introducing-storage-auctions-filecoin/).

Auctions are an exciting primitive for the network because they invert much of the deal-making flow. First, clients submit deal proposals as auctions. Next, storage providers bid in real-time for the right to store that deal. Finally, winners are selected according to a simple and open algorithm, and the deals are made. This simple three-step flow dramatically reduces the complexity of deal-making, reduces errors, and increases throughput on the network. Additionally, it's done with minimal impact on storage providers, making it easy for them to tune their bidding to match their infrastructure capabilities.
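The three-step flow can be sketched roughly as follows. The type names, fields, and the `cheapest` selection rule here are illustrative assumptions for the sketch, not the auctioneer's actual API:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Bid:
    provider: str  # storage provider ID, e.g. "f0xxxx" (hypothetical)
    price: int     # bid price, in illustrative units

@dataclass
class Auction:
    payload_cid: str   # content the client wants stored
    replication: int   # number of replicas requested
    bids: List[Bid] = field(default_factory=list)

def run_auction(auction: Auction,
                pick_winners: Callable[[List[Bid], int], List[Bid]]) -> List[Bid]:
    """Step 1: the client has submitted `auction` as a deal proposal.
    Step 2: providers have appended bids in real time.
    Step 3: an open algorithm selects one winner per replica."""
    return pick_winners(auction.bids, auction.replication)

def cheapest(bids: List[Bid], n: int) -> List[Bid]:
    """A trivially open selection rule: take the n lowest-priced bids."""
    return sorted(bids, key=lambda b: b.price)[:n]
```

Because the selection rule is passed in as a function, any open algorithm (such as the reputation-aware one described later in this application) can slot into step 3 without changing the auction flow itself.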

Since launch, our deal auction system has stored 120TiB on the network across 32 storage providers. We've [opened the system metrics](https://textileio.retool.com/embedded/public/fbf59411-760a-4a1a-b5b8-43f42061685d) for all to review in real-time. 

We've made our deals using datacap from a series of smaller applications you can find in [GitHub history](https://github.com/filecoin-project/filecoin-plus-client-onboarding/issues?q=is%3Aissue+is%3Aclosed+author%3Aandrewxhill).

Our vision is to continue pushing our ability to add throughput and stability to the network through deal auctioning. While auctions are currently a prototype, we believe we can turn them into a decentralized building block on the Filecoin network.

What is the primary source of funding for this project?

Textile has received both VC investment and foundation grants in the past to help build this and other projects. 

What other projects/ecosystem stakeholders is this project associated with?

Auctions can be used in whole or in part by other projects. Right now, three clients are at varying stages of onboarding, including collaborations with web3.storage, Opscientia, and the eth.storage bridge. We've recently released an [auction client](https://github.com/textileio/go-auctions-client) that allows clients to sign their own deals and use their own DataCap. Most, if not all, of the deals made by the listed clients will migrate to using their own DataCap over time. In the interim, and whenever we onboard new auctions users, we aim to have DataCap available for rapid onboarding while they learn how to use the system.

Use-case details

Describe the data being stored onto Filecoin

The data stored through auctions varies, consisting primarily of NFT assets and public and research datasets.

Where was the data in this dataset sourced from?

We don't source the data directly; the auction clients source it through their own collaborations and APIs. So far, the largest user of auctions has been the web3.storage team. Others are in the early phases of onboarding to the system.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

You can use our public metrics dashboard to explore all data being stored. 

* https://textileio.retool.com/embedded/public/fbf59411-760a-4a1a-b5b8-43f42061685d
* Use the provider search at the bottom to view all storage records with any provider. e.g. https://textileio.retool.com/embedded/public/46e74cd2-c47c-42ac-b542-189925795c41#provider=f020378

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

We will only onboard clients to the auctioneer (using our DataCap) that have public datasets. In cases where private data are being stored, we will require them to use their own wallet and therefore apply for their own DataCap.

What is the expected retrieval frequency for this data?

It varies. Some projects, such as Opscientia, are actively working to understand the data architecture requirements on Filecoin and to begin experimenting with retrieval.

For how long do you plan to keep this dataset stored on Filecoin?

It varies. Again, this is something that each client of the auctions system will address in their own plans. We will work with them to enable future storage renewals and deal monitoring.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

We don't favor one geography over another. 

https://textileio.retool.com/embedded/public/fbf59411-760a-4a1a-b5b8-43f42061685d

How will you be distributing your data to storage providers? Is there an offline data transfer process?

There is an offline data transfer process, but only because it allows data transfer to be "pull"-based from the storage provider's perspective. This means that when they win an auction, they can fetch the data on demand. As the online deal flow becomes more flexible, we'll migrate to that API.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

Provider selection:

Providers are selected by a simple algorithm per replica. For example, if you store a file with replication = 3:

1. Select all cheapest bids that match deal requirements (e.g. fast-retrieval).
2. For the first replica, choose the provider from the pool (step 1) with the highest reputation. Reputation is a time-decay function over a week of failures, so the provider with the fewest recent deal failures wins.
3. For the second replica, choose the provider that has won the fewest auctions in a rolling one-week window. This ensures newly active or unlucky providers make it into the winning pool.
4. For the third replica, choose a provider at random.

Any provider can join the bidding pool by running [bidbot](https://github.com/textileio/bidbot).
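The per-replica selection above can be sketched roughly as follows. The data shapes, the exponential decay used for reputation, and the function names are assumptions for illustration only, not the production algorithm:

```python
import math
import random
import time

WEEK = 7 * 24 * 3600  # rolling window, in seconds

def reputation(failure_times, now=None):
    """Time-decayed failure score: recent failures weigh more.
    `failure_times` is a list of unix timestamps of deal failures.
    Returns a value where higher means better reputation.
    (Exponential decay is an assumed shape, not the production function.)"""
    now = now if now is not None else time.time()
    penalty = sum(math.exp(-(now - t) / WEEK)
                  for t in failure_times if now - t < WEEK)
    return -penalty  # fewer / older failures -> higher score

def select_winners(bids, failures_by_provider, wins_by_provider, replication=3):
    """bids: list of (provider, price, fast_retrieval) tuples."""
    # 1. Keep only the cheapest bids that satisfy deal requirements
    #    (here, fast-retrieval as an example requirement).
    eligible = [b for b in bids if b[2]]
    if not eligible:
        return []
    floor = min(b[1] for b in eligible)
    pool = [b[0] for b in eligible if b[1] == floor]

    winners = []
    # 2. Replica 1: the provider with the highest reputation
    #    (i.e. the fewest recent deal failures).
    best = max(pool, key=lambda p: reputation(failures_by_provider.get(p, [])))
    winners.append(best)
    pool.remove(best)
    # 3. Replica 2: the provider with the fewest wins in the rolling week,
    #    so newly active or unlucky providers make it into the winning pool.
    if pool and replication >= 2:
        fresh = min(pool, key=lambda p: wins_by_provider.get(p, 0))
        winners.append(fresh)
        pool.remove(fresh)
    # 4. Replica 3: a random choice from the remainder.
    if pool and replication >= 3:
        winners.append(random.choice(pool))
    return winners
```

Each replica uses a different criterion on purpose: the first optimizes for reliability, the second for fairness to less-established providers, and the third for unpredictability, which together spread deals across the provider pool.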

Data security and retrieval:
* All data is stored in replica (we recommend 5 to clients).
* By default, all deals include fast-retrieval.
* We will explore adding retrieval reputation to future selection algorithms.

How will you be distributing deals across storage providers?

See above: bidbot plus the winner algorithm.
I believe we are among the most decentralized clients (in terms of provider choice) on the network today. See the first pie chart here https://textileio.retool.com/embedded/public/fbf59411-760a-4a1a-b5b8-43f42061685d for a sense of our distribution across connected providers.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes.
large-request[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

fabriziogianni7 commented 2 years ago

Multisig Notary requested

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

45TiB

large-request[bot] commented 2 years ago

Multisig created and sent to RKH t01019

large-request[bot] commented 2 years ago

DataCap Allocation requested

Multisig Notary address

t01019

Client address

f144zep4gitj73rrujd3jw6iprljicx6vl4wbeavi

DataCap allocation requested

22.5TiB

fabriziogianni7 commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecmmdxlpdhz4vostea5im7dcsd4hq5mqls75nz6st4inxw4zg237y

Address

f144zep4gitj73rrujd3jw6iprljicx6vl4wbeavi

Datacap Allocated

22TiB

Signer Address

t1fmqtnifrcnv4753hoyhjalgsv5klimrxmk7ekoq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecmmdxlpdhz4vostea5im7dcsd4hq5mqls75nz6st4inxw4zg237y

fabriziogianni7 commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceb5y7pejecbiejnzfeuzvqt7v7z6fhnrz3k3ggjbqdm7kfdjgu7ig

Address

f144zep4gitj73rrujd3jw6iprljicx6vl4wbeavi

Datacap Allocated

22TiB

Signer Address

t1fmqtnifrcnv4753hoyhjalgsv5klimrxmk7ekoq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb5y7pejecbiejnzfeuzvqt7v7z6fhnrz3k3ggjbqdm7kfdjgu7ig

fabriziogianni7 commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecuoybukfxm4mkwgvwdrezwvljckci3y6ewftnynsy6klokx7okey

Address

f144zep4gitj73rrujd3jw6iprljicx6vl4wbeavi

Datacap Allocated

22TiB

Signer Address

t1fmqtnifrcnv4753hoyhjalgsv5klimrxmk7ekoq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecuoybukfxm4mkwgvwdrezwvljckci3y6ewftnynsy6klokx7okey

fabriziogianni7 commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceaxpjlebtuysz55xgrlgokbqjfetrzcmcfwl47hydnxevszoi5pdg

Address

f144zep4gitj73rrujd3jw6iprljicx6vl4wbeavi

Datacap Allocated

22TiB

Signer Address

t1fmqtnifrcnv4753hoyhjalgsv5klimrxmk7ekoq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceaxpjlebtuysz55xgrlgokbqjfetrzcmcfwl47hydnxevszoi5pdg

github-actions[bot] commented 2 years ago

This application has not seen any responses in the last 20 days, so for now it is being closed. Please feel free to re-open if this is relevant, or start a new application for DataCap anytime. Thank you!