tech-greedy / EasyOnboard-Allocator

[DataCap Application] <DR 9> #3

Open zhaohongwei201109 opened 1 day ago

zhaohongwei201109 commented 1 day ago

DataCap Applicant

Zhao Hongwei

Data Owner Name

LAMOST

Data Owner Country/Region

China

Data Owner Industry

Environment

Website

http://www.lamost.org/dr9/

Social Media Handle

http://www.lamost.org/dr9/

Social Media Type

Other

What is your role related to the dataset

Data Preparer

Total amount of DataCap being requested

20PiB

Expected size of single dataset (one copy)

2.5PiB

Number of replicas to store

8

Weekly allocation of DataCap requested

1000TiB

On-chain address for first allocation

f1tycb3dtgeqiopp2bxit6kgnm3v7zlsqjo7sjq4q

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Share a brief history of your project and organization

The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) is a Chinese national scientific research facility operated by the National Astronomical Observatories, Chinese Academy of Sciences. It is a special reflecting Schmidt telescope with 4000 fibers in a 20 deg² field of view. By July 2019, LAMOST had completed its pilot survey, which was launched in October 2011 and ended in June 2012, and the first seven years of its regular survey, which began in September 2012 [1-7]. This data release publishes a total of 10,431,197 low-resolution spectra, all of which satisfy the selection criteria also used by the LAMOST LRS General Catalog. The data products of this release are available from the website http://www.lamost.org/dr9/.

The Guoshoujing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope, LAMOST) is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

LAMOST data includes three major types:
Type (I):
Raw Data: All original data as well as original provenance information (for example, the observing log files, calibration files, software versions used, etc.), and the batch-reduced two-dimensional spectra.
Type (II):
1D Spectral Data: One-dimensional spectra of observed objects, reduced through standardized reduction pipelines. Some provenance information is included with the 1D spectra, including the input catalog information, selection criteria, and observing information (exposure time, observation quality, seeing, weather conditions, and so on).
Type (III):
Catalog Data: Objective physical quantities with errors, derived from the spectral data and input catalog. The catalog includes the coordinates, magnitudes, radial velocities, effective temperature, surface gravity, elemental abundances, warning flags, and so on.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

If you are a data preparer. What is your location (Country/Region)

None

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

No response

If you are not preparing the data, who will prepare the data? (Provide name and business)

No response

Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

No response

Please share a sample of the data

http://www.lamost.org/dr9/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, Africa, North America, South America, Europe, Australia (continent), Antarctica

How will you be distributing your data to storage providers

HTTP or FTP server, Shipping hard drives

How did you find your storage providers

Slack, Filmine, Big Data Exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you used

No response

Please list the provider IDs and location of the storage providers you will be working with.

f03229932 
f03229933
f03151449
f1315096 
f03055005 
f03055018 
f03055029
f01518369
f01889668
f03151456
f03179572
f03214937
f03178144
f03178077
f01106668 
f0870558 

How do you plan to make deals to your storage providers

Boost client, Lotus client, Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

LucasLucas1987 commented 1 day ago

1. Have you prepared enough tokens for the sector pledge?
2. Are you a data preparer? What is your previous experience as a data preparer? List previous applications and client IDs.
3. How will the data be prepared? Please include tooling used and technical details.
4. If you are not preparing the data, who will prepare the data? (Name and business)
5. Has this dataset been stored on Filecoin before? If so, why are you choosing to store it again?
6. Best practice for storing large datasets is, ideally, to store the data in 3 or more regions, with 4 or more storage provider operators or owners. You should list the miner ID, business entity, and location of the SPs you will cooperate with.

LucasLucas1987 commented 1 day ago

Why are you applying for 20 PiB of Datacap? Please provide your credentials.

zhaohongwei201109 commented 1 day ago

I am currently in communication with the SPs on the application list, who are located in regions such as the United States, Hong Kong, Shenzhen, Singapore, Japan, and South Korea. They have all prepared sufficient Filecoin collateral. As a data preparer, I mainly use the official tools Boost, Lotus, and Singularity for data preparation. DR4 and DR6 have already been stored multiple times, but to my knowledge this would be the first time DR9 is stored.

zhaohongwei201109 commented 1 day ago

According to publicly available data, the latest DR 9 dataset contains (10,809,336 + 8,640,738) spectra. Based on a compressed size of 150 MB per spectrum and 8 backups, the DR 9 dataset can apply for 21.74 PiB of DataCap.

The calculation is as follows: (10,809,336 + 8,640,738) × 150 MB / 32 GB × 8 backups = 21.74 PiB of DataCap.

I am applying for 20 PiB of DataCap with 8 backups, which means each copy of the dataset is 2.5 PiB.
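
The arithmetic above can be checked with a short Python sketch. It rests on two assumptions that the thread does not state: that 150 MB is the compressed size per spectrum, and that the units are binary (MiB, TiB, PiB). Under that reading, spectra × 150 MiB × 8 copies reproduces the 21.74 PiB figure on its own; the 32 GB term presumably refers to the sector size and would only matter when counting sectors. The function name is illustrative.

```python
# Binary unit sizes in bytes (assumption: the thread's MB/PiB are binary units).
MIB = 1024**2
TIB = 1024**4
PIB = 1024**5


def datacap_pib(n_spectra: int, mib_per_spectrum: int, replicas: int) -> float:
    """Total storage in PiB for `replicas` copies of `n_spectra` spectra."""
    total_bytes = n_spectra * mib_per_spectrum * MIB * replicas
    return total_bytes / PIB


# Spectrum counts quoted in the comment above.
n_spectra = 10_809_336 + 8_640_738

per_copy = datacap_pib(n_spectra, 150, replicas=1)
all_copies = datacap_pib(n_spectra, 150, replicas=8)

# Weeks needed to draw down the requested 20 PiB at 1000 TiB per week.
weeks = (20 * PIB) / (1000 * TIB)

print(f"one copy:   {per_copy:.2f} PiB")
print(f"8 copies:   {all_copies:.2f} PiB")
print(f"allocation: {weeks:.2f} weeks at 1000 TiB/week")
```

With these inputs, one copy comes to roughly 2.72 PiB, so the requested 2.5 PiB per copy (20 PiB in total) sits slightly below the estimate.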

LucasLucas1987 commented 1 day ago

Overall, it looks good. We are willing to support 512 TiB in the first round according to the latest rules. Looking forward to your performance!