infolab-csail / wikithingsdb

A DB of Synonyms, Paraphrases, and Hypernyms for all Wiki Things (Articles)

Store data in Swift, not in version control #2

Closed by alvaromorales 9 years ago

alvaromorales commented 9 years ago

We can use OpenStack's Swift object storage to store data files that don't belong in version control.

HTML ontology classes: https://ceph.csail.mit.edu/swift/v1/infolab/ontology-classes.html
Infobox counts: https://ceph.csail.mit.edu/swift/v1/infolab/infoboxes.tsv

These URLs should probably live in a config variable, and the files should be downloaded to a local directory listed in .gitignore (to avoid downloading them over and over).

For now these URLs are public, but we can generate temporary URLs with authentication if needed.
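A rough sketch of how such a temporary URL could be signed, following the scheme described in the OpenStack Swift TempURL documentation (HMAC-SHA1 over the method, expiry time, and object path). The function name, its parameters, and the example host, path, and key are hypothetical, and whether the Ceph gateway behind these URLs accepts exactly this path form is an assumption that would need checking.

```python
import hmac
import time
from hashlib import sha1


def make_temp_url(base_url, path, key, ttl_seconds, method="GET", now=None):
    """Build a Swift-style temporary URL (hypothetical helper).

    base_url: scheme and host, e.g. "https://example.org" (placeholder)
    path:     object path as the server sees it (assumed form)
    key:      the secret TempURL key set on the account or container
    """
    now = time.time() if now is None else now
    expires = int(now + ttl_seconds)
    # Swift signs the string "<METHOD>\n<expires>\n<path>".
    hmac_body = "{}\n{}\n{}".format(method, expires, path)
    sig = hmac.new(key.encode("utf-8"), hmac_body.encode("utf-8"), sha1).hexdigest()
    return "{}{}?temp_url_sig={}&temp_url_expires={}".format(
        base_url, path, sig, expires
    )
```

The key never leaves the machine that signs the URL, so whoever receives the link gets time-limited read access without any OpenStack credentials.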

The files should also have a date associated with them (e.g. infoboxes-2015-08-15.tsv).
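A minimal sketch of what that could look like, assuming Python 3 and only the standard library. DATA_URLS plays the role of the config variable; DATA_DIR, dated_name, and fetch are hypothetical names, and the date is applied only to the local copy here because the URLs above are not dated yet.

```python
import os
import urllib.request

# Public URLs from this issue, kept in one place as the "config variable".
DATA_URLS = {
    "ontology-classes.html": "https://ceph.csail.mit.edu/swift/v1/infolab/ontology-classes.html",
    "infoboxes.tsv": "https://ceph.csail.mit.edu/swift/v1/infolab/infoboxes.tsv",
}

# Hypothetical cache directory; it would need an entry in .gitignore.
DATA_DIR = "data"


def dated_name(filename, date):
    """Insert a date before the extension: infoboxes.tsv -> infoboxes-2015-08-15.tsv."""
    stem, ext = os.path.splitext(filename)
    return "{}-{}{}".format(stem, date, ext)


def fetch(filename, date, data_dir=DATA_DIR, opener=urllib.request.urlopen):
    """Return the local path of a data file, downloading it only if it is missing."""
    path = os.path.join(data_dir, dated_name(filename, date))
    if os.path.exists(path):
        return path  # already cached, so skip the download
    os.makedirs(data_dir, exist_ok=True)
    with opener(DATA_URLS[filename]) as response:
        content = response.read()
    with open(path, "wb") as out:
        out.write(content)
    return path
```

Calling fetch("infoboxes.tsv", "2015-08-15") twice would hit the network only the first time; the opener parameter exists only so the download can be swapped out in tests.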

michaelsilver commented 9 years ago

Cool! I found this documentation; it looks like I can set up Python to download these files automatically during my module's installation. I'll probably need to use secret ENV variables like you did for Elasticstart.

alvaromorales commented 9 years ago

Just use the public URLs instead of the Swift client; no need to have full OpenStack access (security reasons).

michaelsilver commented 9 years ago

Ok, thanks. How do you recommend I upload new files when needed?

alvaromorales commented 9 years ago

We can get you an OpenStack account, or you can just send me the file and I'll upload it. We can think about automating this later on.

Also, do we really need to upload the DBpedia ontology HTML? Can we just use requests to access the URL and download it?
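Something along these lines might be enough, as a sketch rather than a settled design. download_html and its session parameter are hypothetical; the only requests calls used are get, raise_for_status, and the text attribute. The ontology page URL is left to the caller because it is not given in this thread.

```python
def download_html(url, session=None, timeout=30):
    """Fetch a page over HTTP and return its body as text (hypothetical helper).

    session can be a requests.Session or any object with a compatible get();
    when omitted, the top-level requests module is used.
    """
    if session is None:
        import requests  # third-party; imported lazily so tests can inject a stub
        session = requests
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
    return response.text
```

The trade-off is that every fresh install then depends on the upstream page being reachable and unchanged, whereas a dated copy in Swift stays fixed.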