ipfs-inactive / archives

[ARCHIVED] Repo to coordinate archival efforts with IPFS
https://awesome.ipfs.io/datasets

modarchive.org #73

Open ghost opened 7 years ago

ghost commented 7 years ago

Hello there! modarchive.org is a 20-year-old website that contains, AFAICT, most of the chiptune music on earth in its original formats. They have a redistribution-friendly policy and a torrent tracker to facilitate that. They release annual updates as separate torrents; the newest as of this writing is the 2015 one.

On a whim, I've decided to grab those torrents and put everything into a single IPFS directory. The torrents are structured two levels of zip files deep; I decided to extract them all and add the files directly, which adds about 50% to the total size. (I'm aware that by doing this there's a fair bit of historic value being thrown away in the form of file creation dates, but on the other hand a few of them were also blatantly faked, so YMMV.)
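For anyone wanting to repeat the extraction step, here is a minimal sketch in Python of unpacking zips-within-zips into one flat directory. It is not the script used for this archive; the function name `extract_nested` and the flatten-to-basename behaviour are assumptions made for illustration, and files that share a base name simply overwrite each other.

```python
import io
import zipfile
from pathlib import Path


def extract_nested(archive, dest: Path) -> int:
    """Write every non-zip member into `dest`, descending into nested zips.

    `archive` may be a filesystem path or a seekable file-like object.
    Returns the number of files written. Hypothetical helper, not part
    of any IPFS or Mod Archive tooling.
    """
    dest.mkdir(parents=True, exist_ok=True)
    written = 0
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            data = zf.read(info)
            if info.filename.lower().endswith(".zip"):
                # Nested archive: recurse on the in-memory bytes.
                written += extract_nested(io.BytesIO(data), dest)
            else:
                # Flatten: keep only the base name (later duplicates overwrite).
                (dest / Path(info.filename).name).write_bytes(data)
                written += 1
    return written
```

Reading each member fully into memory keeps the sketch short; for very large inner archives, extracting to a temporary file first would be gentler on RAM. As noted above, the original timestamps are lost either way, since `write_bytes` creates fresh files.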

Here's a plain text index (output of ipfs ls --resolve-type=false $dir | gzip -9, a 6.5MB .txt.gz), which took about 5 minutes to create on its own.

The archive itself is at QmY6G7aYbBYYpJ7LdoerGWQDKd8RA9RNF89mGFXF79L4di which I won't link directly, as it's 61GiB of data in 142465 files in a single directory, liable to give the public gateway aneurysms if search engines start crawling it.

On a side note, this dataset might make for a good stress test: the ipfs add command took about eight hours on fairly good hardware, and peeking inside the directory with the wrong command results in my ipfs daemon hanging for ages and/or OOMing.

sgulgas commented 7 years ago

:+1: Cool stuff, always loved the Mod Archive.

Out of curiosity (I'm having a hard time getting the index and the IPFS hash working for some reason): does this also include the instrument zip packs that were on the site? I think I have at least one, if not all.

ghost commented 7 years ago

Nope, this is just a flat list of the modules. I guess I could try putting the instruments into ipfs; their torrents seem a lot more in demand than the module ones.

Weird that the index link doesn't work for you; ipfs dht findprovs QmcYo2... gives me plenty of peers. The archive hash not having providers doesn't surprise me, given its sheer size. Try ipfs swarm connect QmQZic978SS1i35BckSLzzKotXdRTac4LNcwQ1SxTc6HZe and it should show up. (I'm already seeding the torrents from that machine, so a bit more load shouldn't hurt it.)

ghost commented 7 years ago

Alright, after a bit of poking I've got a version broken up into subfolders by file extension. Not hyperlinking it again, for reasons mentioned above, but this one seems way more reliable:

/ipfs/QmUCVqYNznwxFp4kyCxzyQ9U12uuFJDsLgFMAr5vUih87L

...even with the entire thing on a local disk:

# original:
~ $ time ipfs ls QmY6G7aYbBYYpJ7LdoerGWQDKd8RA9RNF89mGFXF79L4di > /dev/null
real    4m23.944s
user    0m0.843s
sys     0m0.031s

# xm folder (largest by filesize; 37GB in 43123 files):
~ $ time ipfs ls QmTgKbJoo1KLSVc35MgRQEePA75sJB4c6gHbL8QHe6mQnx > /dev/null
real    0m7.543s
user    0m0.260s
sys     0m0.008s
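
For reference, a rough sketch of how a flat directory could be split into per-extension subfolders before re-adding it. This is an illustration in Python under my own assumptions, not the exact procedure used for the hash above: the function name `group_by_extension`, the lowercasing of extensions, and the `noext` folder for files without a suffix are all hypothetical choices.

```python
import shutil
from pathlib import Path


def group_by_extension(src: Path, dest: Path) -> dict:
    """Move each regular file in `src` into `dest/<extension>/`.

    Extensions are lowercased so that .XM and .xm share a folder; files
    without a suffix go to `noext` (a hypothetical name). Returns a
    mapping of folder name to the number of files moved there.
    """
    counts = {}
    # Snapshot the listing first so moving files does not disturb iteration.
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue
        ext = path.suffix.lower().lstrip(".") or "noext"
        folder = dest / ext
        folder.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(folder / path.name))
        counts[ext] = counts.get(ext, 0) + 1
    return counts
```

The resulting tree can then be added recursively as usual; each subfolder becomes its own, much smaller directory object, which is presumably why listing a single folder is so much faster in the timings above.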