ipfs-inactive / archives

[ARCHIVED] Repo to coordinate archival efforts with IPFS
https://awesome.ipfs.io/datasets

Project Gutenberg #29

Open · davidar opened 9 years ago

davidar commented 9 years ago

The first thing I mirrored to IPFS was a small subset of Project Gutenberg, so I'm definitely interested in getting the whole thing into IPFS, as both @rht (#14) and @simonv3 (https://github.com/simonv3/ipfs-gutenberg) have suggested.

Making an issue to coordinate this.

rht commented 9 years ago

This is just an rsync away, really. Currently running it on pollux.

davidar commented 9 years ago

@rht is there enough free disk space on Pollux?

rht commented 9 years ago

(didn't check)

rht commented 9 years ago

https://www.gutenberg.org/wiki/Gutenberg:Mirroring_How-To says the full collection is at least 650 GB (and may have doubled since that was written). Pollux has 13 GB left.

But anyway, the mirroring is a one-liner.
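
Something like this, roughly (untested; the rsync module is the one listed in the Mirroring How-To, and the paths are just examples):

```sh
# Mirror the whole collection, then add it to IPFS and print the root hash.
# aleph.gutenberg.org::gutenberg is one of the modules from the Mirroring
# How-To; /data/gutenberg is just an example destination.
rsync -av --del aleph.gutenberg.org::gutenberg /data/gutenberg \
  && ipfs add -r -q /data/gutenberg | tail -n1
```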

simonv3 commented 9 years ago

@rht Yeah, what makes this difficult is the amount of disk space - I don't think many people have that amount of space lying around for this.

Some people have suggested sharding the collection, with the people hosting each shard keeping it in sync independently. There's also been talk about this tool: https://github.com/ipfs/notes/issues/58

simonv3 commented 9 years ago

We could also each pitch in some amount for an Amazon instance (or some other host) with that much storage, and split the cost?

Or I could see if I can get my Raspberry Pi working and attach a 1 TB drive to it.

rht commented 9 years ago

Hmm, rsync can't seek into the remote archive, so at least the first 'download -> hash' pass needs the full ~1 TB of storage to hold the collection.

Either

  1. https://aws.amazon.com/s3/reduced-redundancy/ ~$24/month.
  2. http://www.amazon.com/Green-1TB-Desktop-Hard-Drive/dp/B006GDVREI ~$50 (can be repurposed for other archival projects once the PG hash has been sharded).

For now, a partial backup can be done by running ipfs object get on each of the node links that make up the root hash.
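
Roughly (untested; the root hash below is a placeholder, not the real PG hash):

```sh
# Fetch each top-level link under the root separately, so a node with
# limited disk can grab only a subset of the collection.
root=QmExampleRootHash   # placeholder for the actual PG root hash
ipfs object links "$root" | while read -r hash size name; do
    ipfs object get "$hash" > "$name.json"   # or `ipfs get` for the data itself
done
```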

rht commented 9 years ago

(and both storage options come from Amazon)

rht commented 9 years ago

ipfs check-redundancy $hash would be useful.
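
A rough sketch of what it could do with existing commands (ipfs refs plus ipfs dht findprovs; using the provider count as the redundancy measure is my assumption, and the hash is a placeholder):

```sh
# Hypothetical check-redundancy: for every block under $hash, count how
# many peers advertise it, and list the least-replicated blocks first.
hash=QmExampleRootHash   # placeholder
ipfs refs -r -u "$hash" | while read -r block; do
    echo "$(ipfs dht findprovs -n 20 "$block" | wc -l) $block"
done | sort -n
```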

davidar commented 9 years ago

@jbenet @lgierth SEND MORE DISKS...

Also see ipfs/infrastructure#89

davidar commented 9 years ago

> ipfs check-redundancy $hash would be useful.

@rht Yeah, what I really want is a "click to pin" button on the archive homepage: people select how much storage they want to donate, and the tool randomly selects an appropriate subset of the least-redundant blocks and pins them to their local daemon.

CC: @whyrusleeping

Edit: see ipfs/notes#54

whyrusleeping commented 9 years ago

That would be cool. We could have our service enumerate the providers for each block under a given archive root, then assign the blocks with the fewest providers to the next person who requests some.
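
A minimal sketch of the client side, reusing the provider counting from the script above (the root hash and N are made up; a real service would do the counting centrally and hand out assignments):

```sh
# Hypothetical "click to pin": pin the N least-provided blocks locally.
root=QmExampleRootHash   # placeholder
N=100                    # however many blocks this donor's storage allows
ipfs refs -r -u "$root" | while read -r block; do
    echo "$(ipfs dht findprovs -n 20 "$block" | wc -l) $block"
done | sort -n | head -n "$N" | awk '{print $2}' | xargs -n 1 ipfs pin add
```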

rht commented 9 years ago

Redundancy targets should also be normalized against the demand curve for the blocks (frequently requested blocks warrant more providers than rarely requested ones).

jbenet commented 9 years ago

We can get more storage nodes, if necessary