ajgabz opened 7 years ago
@ajgabz, thanks for providing that file.
In case anyone wants it, here's a *nix one-liner that will get that header:
head -n1 MetObjects.csv | csvtool -u TAB col 1- - | tr '\t' '\n'
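For anyone without csvtool, Python's standard `csv` module can do roughly the same thing. This is a minimal sketch. The function names `parse_header` and `print_header` are made up for illustration, and it assumes `MetObjects.csv` is in the working directory and is UTF-8 encoded.

```python
import csv


def parse_header(lines):
    """Return the column names from the first record of CSV text.

    `lines` is any iterable of strings (e.g. an open file). The csv module
    handles quoted names that contain commas, which is what the csvtool
    step in the one-liner above takes care of.
    """
    return next(csv.reader(lines))


def print_header(path="MetObjects.csv"):
    """Print each column name of the CSV at `path` on its own line."""
    # "utf-8-sig" drops a leading byte-order mark if there is one;
    # the file's exact encoding is an assumption here.
    with open(path, newline="", encoding="utf-8-sig") as f:
        for name in parse_header(f):
            print(name)
```

Calling `print_header()` should print one field name per line, like the one-liner.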
I'd recommend publishing the schema as a Table Schema, which can then be embedded in a Data Package.
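A minimal sketch of what that might look like, assuming every column is typed as a plain `string` as a first pass. The `fields`/`name`/`type` keys follow the Table Schema spec and `resources`/`path`/`schema` follow the Data Package spec; the package name `met-objects` and the helper names are hypothetical.

```python
import json


def table_schema(field_names, default_type="string"):
    """Build a minimal Table Schema descriptor from a list of column names.

    Every column gets `default_type`; choosing real types (integer, boolean,
    date, ...) per column would need someone who knows the data.
    """
    return {"fields": [{"name": n, "type": default_type} for n in field_names]}


def data_package(field_names, csv_path="MetObjects.csv", name="met-objects"):
    """Embed that Table Schema in a minimal Data Package descriptor.

    The package/resource name "met-objects" is a hypothetical placeholder.
    """
    return {
        "name": name,
        "resources": [
            {"name": name, "path": csv_path, "schema": table_schema(field_names)}
        ],
    }


def to_json(descriptor):
    """Serialize a descriptor, e.g. for writing a datapackage.json file."""
    return json.dumps(descriptor, indent=2)
```

Feeding it the header names and writing `to_json(data_package(names))` to `datapackage.json` would give a descriptor that could be published alongside the CSV.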
@danfowler @abetusk @ajgabz I took a first run at this: https://github.com/avitalp/metmuseum-oa-explore
Does the Met's internal database store identifiers from controlled vocabularies such as the Getty ULAN, TGN, and AAT? If so, it would be beneficial to include them in the CSV output so the data can be mapped to RDF more efficiently and accurately.
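If such identifiers were in the CSV, the RDF mapping would be close to string concatenation, since Getty publishes ULAN records under URIs of the form `http://vocab.getty.edu/ulan/<id>`. A minimal sketch, assuming a hypothetical `Artist ULAN ID` column (the current CSV has none), `|`-separated multiple values, and a placeholder object URI base `https://example.org/met/object/`; `dcterms:creator` is just one reasonable choice of predicate.

```python
from urllib.parse import quote

ULAN_BASE = "http://vocab.getty.edu/ulan/"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
# Placeholder base URI for objects; NOT an official Met namespace.
OBJECT_BASE = "https://example.org/met/object/"


def creator_triples(rows, id_col="Object ID", ulan_col="Artist ULAN ID"):
    """Yield N-Triples lines linking each object to its creators' ULAN URIs.

    `ulan_col` is a hypothetical column: the CSV would need to carry the
    ULAN IDs for this to work. Multiple IDs are assumed to be '|'-separated;
    rows with no object ID, and non-numeric IDs, are skipped.
    """
    for row in rows:
        obj = (row.get(id_col) or "").strip()
        if not obj:
            continue
        subject = f"<{OBJECT_BASE}{quote(obj)}>"
        for ulan_id in (row.get(ulan_col) or "").split("|"):
            ulan_id = ulan_id.strip()
            if ulan_id.isdigit():
                yield f"{subject} <{DCT_CREATOR}> <{ULAN_BASE}{ulan_id}> ."
```

The same pattern would apply to TGN (`http://vocab.getty.edu/tgn/`) and AAT (`http://vocab.getty.edu/aat/`) identifiers, with predicates suited to places and object types.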
How much use would people get out of putting a SQL file like @avitalp's behind a RESTful API, so you could make GET requests to it and the like?
I mean, they do seem to have an API (example: http://www.metmuseum.org/api/Collection/additionalImages?crdId=437853), but it doesn't return useful information such as the title or artist.
It'd be a fun project for me; I just want to know whether people would find it useful.
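A minimal standard-library sketch of that idea, assuming the SQL dump has been loaded into a SQLite file (`met.sqlite`) with a table called `objects` keyed by an integer `object_id`; all three names are hypothetical. It answers `GET /objects/<id>` with the row as JSON.

```python
import json
import re
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical layout: a SQLite table named "objects" keyed by an integer
# "object_id" column. Neither name comes from the Met or from @avitalp's file.
OBJECT_PATH = re.compile(r"^/objects/(\d+)$")


def get_object(conn, object_id):
    """Return one row of the `objects` table as a dict, or None if absent."""
    conn.row_factory = sqlite3.Row
    row = conn.execute(
        "SELECT * FROM objects WHERE object_id = ?", (object_id,)
    ).fetchone()
    return dict(row) if row is not None else None


def make_handler(db_path):
    """Build a handler class that answers GET /objects/<id> with JSON."""

    class ObjectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            match = OBJECT_PATH.match(self.path)
            if match is None:
                self._send_json(404, {"error": "unknown path"})
                return
            # HTTPServer handles one request at a time, so a short-lived
            # connection per request keeps this simple.
            conn = sqlite3.connect(db_path)
            try:
                obj = get_object(conn, int(match.group(1)))
            finally:
                conn.close()
            if obj is None:
                self._send_json(404, {"error": "no such object"})
            else:
                self._send_json(200, obj)

        def _send_json(self, status, payload):
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return ObjectHandler


def serve(db_path="met.sqlite", port=8000):
    """Serve the API on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), make_handler(db_path)).serve_forever()
```

After calling `serve()`, a request such as `curl http://127.0.0.1:8000/objects/<id>` would return that row as JSON, or a 404. A public deployment would likely want a proper web framework, paging, and search endpoints; this only shows the basic shape.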
@danfowler instead of Table Schema, isn't it better to use the CSVW standard?
@VladimirAlexiev Either or both would be cool! Also, I think it would be pretty easy to translate from one to the other. I come from the project that developed the Table Schema specification, so I'd like to try out the dataset with the tools we have that support it.
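To illustrate how mechanical the translation can be, here is a rough sketch that converts a Table Schema descriptor (a plain dict) into CSVW metadata. The type mapping is an approximation, not an official crosswalk, and the name sanitizing is an assumption based on CSVW restricting column `name` values to URI-template variable characters; the original header is kept in `titles`.

```python
import re

CSVW_CONTEXT = "http://www.w3.org/ns/csvw"

# Rough, unofficial mapping from Table Schema types to CSVW datatype names;
# anything unlisted falls back to "string".
TYPE_MAP = {
    "string": "string",
    "integer": "integer",
    "number": "number",
    "boolean": "boolean",
    "date": "date",
    "datetime": "datetime",
    "year": "gYear",
}


def csvw_name(title, index):
    """Make a CSVW-safe column name from a header title.

    Characters other than letters, digits and "_" become "_", leading
    underscores (reserved in CSVW) are stripped, and an empty result
    falls back to colN.
    """
    name = re.sub(r"[^A-Za-z0-9_]", "_", title).lstrip("_")
    return name or f"col{index + 1}"


def table_schema_to_csvw(schema, url="MetObjects.csv"):
    """Translate a Table Schema dict into minimal CSVW metadata."""
    columns, seen = [], set()
    for i, field in enumerate(schema.get("fields", [])):
        name = csvw_name(field["name"], i)
        while name in seen:  # keep names unique after sanitizing
            name += "_"
        seen.add(name)
        columns.append({
            "name": name,
            "titles": field["name"],
            "datatype": TYPE_MAP.get(field.get("type", "string"), "string"),
        })
    return {"@context": CSVW_CONTEXT, "url": url, "tableSchema": {"columns": columns}}
```

Going the other way would mostly mean reading `titles` back into `name` and inverting the type map.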
I'd also like to see this schema maintained and kept up to date, along with a SQL version (like AvitalP's).
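For a SQL version, one low-effort route is SQLite through Python's standard library. This is a sketch under stated assumptions, not necessarily how AvitalP's file was built: the table name `objects` is hypothetical, and every column is stored as TEXT because the real types are not documented.

```python
import csv
import sqlite3


def quote_ident(name):
    """Quote an arbitrary header (spaces, commas, quotes) as a SQLite identifier."""
    return '"' + name.replace('"', '""') + '"'


def load_csv(conn, lines, table="objects"):
    """Load CSV text into a new table where every column is TEXT.

    `lines` is any iterable of strings, e.g. an open file. Every data row
    must have as many fields as the header. Returns the number of rows.
    """
    reader = csv.reader(lines)
    header = next(reader)
    columns = ", ".join(f"{quote_ident(h)} TEXT" for h in header)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    cur = conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", reader
    )
    conn.commit()
    return cur.rowcount
```

For example, open `MetObjects.csv` with `newline=""` and pass the file object as `lines`; the resulting database could then be exported as a plain SQL script with the `sqlite3` shell's `.dump` command.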
Thanks everyone for the great discussion here. The data in the CSV (and everywhere else) is mostly from our TMS database. I personally would love to make a replica of this database public at some point, if we could get approval.
According to the README file, this big dataset comes from the Met's own internal database. What (presumably relational) schema does that database use?
I strongly feel that making the schema public would make this massive dataset easier and more efficient to navigate than one big, heterogeneous 43-column table.
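As a rough illustration of what a relational layout buys over one wide table, here is a sketch that splits artist information out of flat rows into separate SQLite tables. The column names `Object ID`, `Title`, and `Artist Display Name`, and the use of `|` between multiple artists, are assumptions about the CSV; the table layout is hypothetical and is not the Met's actual TMS schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE objects (object_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE object_artists (
    object_id TEXT REFERENCES objects(object_id),
    artist_id INTEGER REFERENCES artists(artist_id),
    PRIMARY KEY (object_id, artist_id)
);
"""


def normalize_artists(conn, rows, id_col="Object ID", title_col="Title",
                      artist_col="Artist Display Name"):
    """Split flat rows (dicts) into objects, artists and a link table.

    Artists are identified by display name only, so two different people
    with the same name would be merged; real identifiers from the source
    database would avoid that.
    """
    conn.executescript(SCHEMA)
    for row in rows:
        obj = row[id_col]
        conn.execute("INSERT OR IGNORE INTO objects VALUES (?, ?)",
                     (obj, row.get(title_col)))
        # Multiple artists are assumed to be '|'-separated in one cell.
        for name in (row.get(artist_col) or "").split("|"):
            name = name.strip()
            if not name:
                continue
            conn.execute("INSERT OR IGNORE INTO artists (name) VALUES (?)", (name,))
            (artist_id,) = conn.execute(
                "SELECT artist_id FROM artists WHERE name = ?", (name,)
            ).fetchone()
            conn.execute("INSERT OR IGNORE INTO object_artists VALUES (?, ?)",
                         (obj, artist_id))
    conn.commit()
```

With this layout, "all objects by one artist" becomes a join across three small tables rather than a text search over a wide column.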
And to help other users with this massive table, I've attached a plain-text list of the field names, one per line: field_names.txt