keyko-io / filecoin-large-clients-onboarding


[DataCap Application] #212

Closed fabriziogianni7 closed 2 years ago

fabriziogianni7 commented 3 years ago

Large Dataset Notary Application

To apply for a DataCap allocation for your dataset, please fill out the following information.

Core Information

Please respond to the questions below in paragraph form, replacing the text saying "Please answer here". Include as much detail as you can in your answer!

Project details

Share a brief history of your project and organization.

The Internet Archive, a 501(c)(3) non-profit, is building a digital library of Internet sites and other cultural artifacts in digital form. Like a physical library, we provide free access to researchers, historians, scholars, the print disabled, and the general public. Our mission is to provide Universal Access to All Knowledge. See more at https://archive.org/about/

This project aims to explore the role of decentralized storage in this long-term mission.

What is the primary source of funding for this project?

We are funded through donations, grants, and by providing web archiving and book digitization services for our partners. 

What other projects/ecosystem stakeholders is this project associated with?

The dataset was compiled in collaboration with The Library of Congress, California Digital Library, University of North Texas Libraries, Internet Archive, George Washington University Libraries, Stanford University Libraries, and the U.S. Government Publishing Office.

Use-case details

Describe the data being stored onto Filecoin

The End-of-Term Web Archive captures and saves U.S. Government websites at the end of presidential administrations. This dataset represents a comprehensive crawl of the .gov domain between September 2016 and January 20, 2017, at the end of the Obama Administration and just before the beginning of the Trump Administration.

Where was the data in this dataset sourced from?

Federal Government websites (.gov) in the Legislative, Executive, or Judicial branches of government, and related social media accounts. Also in scope are Federal Government websites on other domains, such as .mil, .edu, and .com.

Can you share a sample of what is in the dataset? A link to a file, an image, a table, etc., are good examples of this.

The dataset contains WARC files containing crawl data (and associated metadata) of the aforementioned sites. Their contents, when opened with a compatible viewer, are similar to https://web.archive.org/web/20170126033350/http:/globalchange.epa.gov/

The raw files look like this: https://archive.org/download/LOC-QUARTERLY-006-20161225070227072-13019-13025-wbgrp-crawl202

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes, data is archived in the public interest. Archive is currently available at http://eotarchive.cdlib.org/search?f1-administration=2016

What is the expected retrieval frequency for this data?

This effort is intended primarily as an archival and exploratory use case. Data may be accessed by researchers, by periodic integrity checks, and by interactive-use prototypes (similar to Estuary).

For how long do you plan to keep this dataset stored on Filecoin? Will this be a permanent archival or a one-time storage deal?

The dataset is intended for long-term archival storage, depending on the outcomes of this trial.

DataCap allocation plan

In which geographies do you plan on making storage deals?

We're looking for a wide geographic distribution to model global resiliency. Miners in NA and EU geos will initially be considered.

What is your expected data onboarding rate? How many deals can you make in a day, in a week? How much DataCap do you plan on using per day, per week?

We have extensive interconnects to high-bandwidth networks and robust processing capacity. Once we get through the testing phase, we expect to be able to onboard 50-100 TiB/week.

How will you be distributing your data to miners? Is there an offline data transfer process?

Offline data transfer over the internet, using standard HTTP or a purpose-made protocol like Tachyon.

How do you plan on choosing the miners with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

Miners that are in the right geographies and have high reputation scores on public indices like filrep.io. The initial set of storage providers for testing will likely be from the MinerX Fellowship.

How will you be distributing data and DataCap across miners storing data?

We will likely be structuring our files into 32GiB chunks that will be evenly distributed in deals with the selected set of storage providers.
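
To make the chunking plan concrete, here is a minimal sketch of how one week of onboarding at the upper rate (100 TiB) could be split into 32 GiB chunks and spread evenly over a set of storage providers. The provider IDs and the round-robin assignment are assumptions for illustration only; the application does not specify how the even distribution will be implemented.

```python
from collections import Counter
from itertools import cycle

GIB = 1024 ** 3
TIB = 1024 ** 4
CHUNK_SIZE = 32 * GIB  # chunk size stated in the application


def chunk_count(total_bytes: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunks needed to hold total_bytes (the last one may be partial)."""
    return -(-total_bytes // chunk_size)  # ceiling division


def assign_round_robin(n_chunks: int, providers: list[str]) -> Counter:
    """Hand out chunks to providers in rotation; return chunks per provider."""
    if not providers:
        raise ValueError("at least one provider is required")
    counts: Counter = Counter()
    for _, provider in zip(range(n_chunks), cycle(providers)):
        counts[provider] += 1
    return counts


# Hypothetical provider IDs, not taken from the application.
providers = ["provider-a", "provider-b", "provider-c"]
n_chunks = chunk_count(100 * TIB)  # one week at the upper onboarding rate
plan = assign_round_robin(n_chunks, providers)
```

With round-robin assignment, no provider ends up with more than one chunk above any other, which is one simple reading of "evenly distributed".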

large-request[bot] commented 3 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

fabriziogianni7 commented 3 years ago

Multisig Notary requested

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

100TiB
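
As a rough sanity check on these figures, the sketch below estimates how long it would take to use the full 5 PiB at the onboarding rates given in the application (50-100 TiB/week). It assumes a constant weekly rate and binary units (1 PiB = 1024 TiB); the function name is illustrative.

```python
TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def weeks_to_onboard(total_pib: float, tib_per_week: float) -> float:
    """Weeks needed to consume total_pib of DataCap at a constant weekly rate."""
    if tib_per_week <= 0:
        raise ValueError("weekly rate must be positive")
    return total_pib * TIB_PER_PIB / tib_per_week


weeks_at_upper_rate = weeks_to_onboard(5, 100)  # 5 PiB at 100 TiB/week
weeks_at_lower_rate = weeks_to_onboard(5, 50)   # 5 PiB at 50 TiB/week
print(f"{weeks_at_upper_rate:.1f} to {weeks_at_lower_rate:.1f} weeks")
```

Under these assumptions the request corresponds to roughly one to two years of sustained onboarding.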

large-request[bot] commented 3 years ago

Multisig created and sent to RKH t01021

large-request[bot] commented 3 years ago

DataCap Allocation requested

Multisig Notary address

t01021

Client address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

DataCap allocation requested

50TiB

fabriziogianni7 commented 3 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecrxvj7kyobhclc6q6vvcuspocczp6vciqiqldsf7icqawnyaydt4

Address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

Datacap Allocated

50TiB

Signer Address

t1fmqtnifrcnv4753hoyhjalgsv5klimrxmk7ekoq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecrxvj7kyobhclc6q6vvcuspocczp6vciqiqldsf7icqawnyaydt4

github-actions[bot] commented 2 years ago

This application has not seen any responses in the last 20 days, so for now it is being closed. Please feel free to re-open if this is relevant, or start a new application for DataCap anytime. Thank you!