slidewinder / open_access_figures

Massive scale slide generation from Open Access figures

datacite API #1

Open blahah opened 8 years ago

blahah commented 8 years ago

The datacite API has metadata for a large number of figure-like datasets. For example, a search for files with MIME type image/png returns 97,991 records:

http://search.datacite.org/api?q=format:image/png

However, the records don't link directly to the data files. They provide metadata including the DOI, which can be used to resolve a landing page for each record, and in general these link directly to the data files.

blahah commented 8 years ago

Some success by combining curl and bo:

get the redirect url:

$ curl --silent -I http://dx.doi.org/10.5880/GFZ.LIS.2015.001 | grep 'Location' | cut -d' ' -f2
http://pmd.gfz-potsdam.de/panmetaworks/showshort.php?id=escidoc:1160009
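
A slightly more defensive variant of the redirect lookup, as a rough sketch. HTTP header names are case-insensitive and header lines end in CR LF, so `grep 'Location'` can miss a lowercase `location:` header and the `cut` output keeps a trailing carriage return. The helper name `extract_location` is made up for illustration; it only parses header text on stdin and does no networking itself.

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and print
# the value of the first Location header, without the trailing CR.
extract_location() {
  awk 'tolower($1) == "location:" { sub(/\r$/, "", $2); print $2; exit }'
}

# Usage (needs network access):
#   curl --silent -I http://dx.doi.org/10.5880/GFZ.LIS.2015.001 | extract_location
#
# curl can also report the redirect target itself via its write-out variable:
#   curl --silent -I -o /dev/null -w '%{redirect_url}\n' http://dx.doi.org/10.5880/GFZ.LIS.2015.001
```

Only the first hop is reported. If the resolver redirects more than once, the command would need to be repeated on the result.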

find the pngs:

$ curl --silent -L http://dx.doi.org/10.5880/GFZ.LIS.2015.001 | bin/bo -a href 'a[href$="png"]'
schematic-overrview-on-permafrost-landscapes_snow.PNG
schematic-overrview-on-permafrost-landscapes_fluxes.png
schematic-overrview-on-permafrost-landscapes.png

should allow us to construct the final URLs, e.g.:

http://pmd.gfz-potsdam.de/panmetaworks/schematic-overrview-on-permafrost-landscapes_fluxes.png
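
The construction step could look roughly like this: take the landing-page URL, drop its query string and last path segment, and append the relative href. `join_url` is a hypothetical name, and the sketch assumes the href is a plain relative file name (no leading `/`, no `../`) and that the landing URL has at least one path segment. Whether the resulting URL actually serves the file is up to the repository.

```shell
# Hypothetical helper: join a plain relative href onto the directory of
# the landing-page URL.
#   $1 = landing-page URL, $2 = relative href
join_url() {
  base=${1%%[?#]*}   # drop query string and fragment
  base=${base%/*}    # drop the last path segment (e.g. showshort.php)
  printf '%s/%s\n' "$base" "$2"
}

join_url 'http://pmd.gfz-potsdam.de/panmetaworks/showshort.php?id=escidoc:1160009' \
         'schematic-overrview-on-permafrost-landscapes_fluxes.png'
```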

blahah commented 8 years ago

But actually, it doesn't! Those URLs don't work. There are other URLs on the page that link to the actual PNGs, but they don't have a .png extension. grrr....
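
One thing that might be worth trying, purely as an untested idea: instead of filtering links by extension, request the headers for every link on the landing page and keep the ones whose Content-Type is image/png. The helper name `is_png_response` is made up, and this assumes the server answers HEAD requests with the same Content-Type it would send for a GET, which not every server does.

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and succeed
# (exit status 0) if the last Content-Type header seen is image/png.
# With `curl -I -L` the headers of every redirect hop are printed, so the
# last Content-Type belongs to the final response.
is_png_response() {
  awk 'tolower($1) == "content-type:" { t = tolower($2) }
       END {
         sub(/\r$/, "", t)   # strip the trailing CR
         sub(/;.*$/, "", t)  # strip parameters such as "; charset=..."
         exit (t == "image/png") ? 0 : 1
       }'
}

# Usage (needs network access; $url is a candidate link from the page):
#   curl --silent -I -L "$url" | is_png_response && echo "$url"
```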

blahah commented 8 years ago

So it looks like this is off the table... which is a huge shame.