openstate / open-cultuur-data

The back- and front-end code that powers the Open Cultuur Data API
http://opencultuurdata.nl/

pipeline for static files #12

Closed Gijs-Koot closed 10 years ago

Gijs-Koot commented 10 years ago

If the records are received from a static file, it is unclear how

def get_original_object_urls(self):

on an Item should be implemented.

bartdegoede commented 10 years ago

My suggestion would be to hardcode the URL to the static file. I'm not sure whether static data that isn't available anywhere online should be incorporated in the API, although excluding it bars the platform from being used as a primary source of data, rather than just as a meta-index.

ajslaghu commented 10 years ago

Yes, it should be online. Open Culture Data can host a copy, with a reference to the original location of disclosure. We could even register multiple sites for the data in the harvester: ['statis.ocd.nl', ['dropbox.com/employeeX', 'etc']]

justinvw commented 10 years ago

Currently I can think of two situations where we would have to deal with 'static files':

  1. An institution publishes its collection as one or more static files (a CSV, database dump, compressed archive, etc.) that are publicly accessible (for example, via HTTP).
  2. In order to receive a digitally processable copy of a collection, a user has to file a (written) request with an institution. The institution meets the request by providing the user that filed the request with one or more static files (for example, by e-mail or on a DVD).

In case of situation one, I think we should follow @bartdegoede's suggestion: just include the URL to the static file (with the appropriate mime type prefix).

In case of situation two, I think @ajslaghu's suggestion to let OCD host a publicly accessible copy makes sense. When a new datasource is submitted for review, the submitter will have to discuss with the reviewer how the data is transferred to the OCD servers. This process could of course also be automated, if it turns out that many institutions provide their data in such a way. Let's hope that this is not the case :smile:.
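To make both situations concrete, here is a minimal sketch of how an item backed by a static file might implement get_original_object_urls. It is an illustration under assumptions, not the project's actual code: the class names, the static_file_url and static_file_mime_type attributes, the example.org URLs, and the choice to return a dict keyed by MIME type are all hypothetical.

```python
class StaticFileItem(object):
    """Hypothetical item whose records all come from one static file."""

    # Assumed attributes, hardcoded per data source.
    static_file_url = None
    static_file_mime_type = None

    def get_original_object_urls(self):
        # Every record points back to the same file, so the mapping
        # contains a single entry: MIME type -> URL of the static file.
        return {self.static_file_mime_type: self.static_file_url}


class InstitutionHostedItem(StaticFileItem):
    # Situation one: the institution publishes the file itself.
    # (Hypothetical URL.)
    static_file_url = 'http://example.org/collection/export.csv'
    static_file_mime_type = 'text/csv'


class OCDHostedItem(StaticFileItem):
    # Situation two: OCD hosts a publicly accessible copy of a file
    # that was received by e-mail or on a DVD. (Hypothetical URL.)
    static_file_url = 'http://static.example.org/some-museum/export.zip'
    static_file_mime_type = 'application/zip'
```

In both cases the item code is identical; only the hardcoded URL differs, which is why the two situations can share one implementation.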

@Gijs-Koot, do you already have a concrete example of a dataset where only static file(s) are provided?

mbrinkerink commented 10 years ago

+1 on @justinvw's suggested solutions, for both scenarios.

justinvw commented 10 years ago

We now use http://static.opencultuurdata.nl for hosting the files discussed in this issue. For an example, have a look at the Centraal Museum Utrecht dataset.