mdeff / fma

FMA: A Dataset For Music Analysis
https://arxiv.org/abs/1612.01840
MIT License

host on cloud computing provider #26

Closed: danmackinlay closed this issue 4 years ago.

danmackinlay commented 6 years ago

A suggestion: I notice there are a few open issues about outdated data versions, so I presume the hosting of this data is inconvenient to update. If so, it might be worth hosting the data somewhere else.

According to its FAQ, Microsoft Research Open Data will host data sets of up to 250 GB. Amazon and probably Google offer similar schemes.

danmackinlay commented 6 years ago

Amazon's AWS also hosts data sets and has a formal submission procedure for new data sets.

dvolgyes commented 6 years ago

Or maybe on https://zenodo.org/ ? It is a Swiss (CERN-based) repository for scientific data sets: it assigns DOIs, you can link existing publications to it, and it has no hard space limit. (The default limit is 50 GB, but you can contact them by email and they will lift it for a given upload.)

dvolgyes commented 6 years ago

And Zenodo has a simple, usable API.

mdeff commented 4 years ago

Thanks for the suggestions! AWS and Microsoft are potential providers. I like Zenodo, but when I contacted them in May 2017 about hosting the FMA they answered: "Unfortunately the data sizes you mentioned are above of what we can accept." Another option is torrents (#32), though I don't know how convenient they are in general, or how to ensure that at least one peer is always up.

The current hosting is not inconvenient to update, but I think we should strive to update as infrequently as possible. The problem is that published results are only comparable when they are computed on the same data, so every update makes comparisons harder.
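One way to keep results tied to a specific data version is to verify each downloaded archive against a recorded checksum and report that checksum alongside the results. The sketch below assumes a SHA-1 checksum is available for the archive version you used. The helper names `file_sha1` and `check_archive` are hypothetical and not part of the FMA code.

```python
import hashlib


def file_sha1(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file.

    The file is read in fixed-size chunks so multi-gigabyte archives
    never have to fit in memory at once.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_archive(path, expected_sha1):
    """Raise ValueError if the file at `path` does not match `expected_sha1`.

    Returns the computed digest so it can be logged next to experiment results.
    """
    actual = file_sha1(path)
    if actual != expected_sha1.strip().lower():
        raise ValueError(f"{path}: expected SHA-1 {expected_sha1}, got {actual}")
    return actual
```

For example, `check_archive("fma_metadata.zip", expected)`, where `expected` is the checksum recorded for the version you downloaded, fails loudly if the local file belongs to a different data version.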

I've documented the known issues in the README and in meta-issue #41. Hope that helps for the time being.