Open gbilder opened 10 years ago
Hi Geoff,
thanks for these pointers.
The whitelist was meant to be a temporary workaround to avoid uploading content that comes with a statement of being open but actually is not open. See https://outreach.wikimedia.org/wiki/GLAM/Newsletter/November_2012/Contents/Open_Access_report#Metadata_at_PubMed_Central for some examples.
We definitely plan to switch to harvesting licensing info from CrossRef now that it is becoming increasingly available from publishers, and we'd appreciate pointers as to what the current state of availability is (cf. https://github.com/wpoa/OA-signalling/issues/12#issuecomment-57958023 ).
As for treating different DOI prefixes by the same publisher differently, we are aware of this background, but the XML quality that we have from one publisher is not uniform across all their DOI prefixes, nor is the availability of audio or video content, which was the initial focus of the whitelist.
Using DOI prefixes for the whitelist probably undercounts resources.
For example, note that the example whitelist you provide includes only a few of the eight possible prefixes that appear on Hindawi content:
http://api.crossref.org/members?query=hindawi
Or, more accurately:
http://api.crossref.org/members/98
The issue here is that a publisher may have DOIs with several prefixes. This typically occurs when one publisher acquires one or more titles from another publisher, or when a title moves from a subscription-based publisher to an open-access publisher.
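To see which of a member's prefixes a prefix-based whitelist would miss, something like the following might work. It is only a sketch: it assumes the member record exposes its prefixes as a list under a `prefixes` key in the `message` payload, and the helper names `fetch_message` and `missing_prefixes` are made up for illustration.

```python
import json
import urllib.request

API = "http://api.crossref.org"  # base URL as used in the links above


def fetch_message(path):
    """Fetch a Crossref API resource and return its "message" payload."""
    with urllib.request.urlopen(f"{API}/{path}") as resp:
        return json.load(resp)["message"]


def missing_prefixes(member_message, whitelisted_prefixes):
    """Return the member's DOI prefixes that the whitelist does not cover."""
    # Assumption: the member record lists its prefixes under "prefixes".
    member_prefixes = member_message.get("prefixes", [])
    return sorted(set(member_prefixes) - set(whitelisted_prefixes))
```

For Hindawi this could be called as `missing_prefixes(fetch_message("members/98"), ["10.1155"])`; whatever comes back is content that a whitelist containing only `10.1155` would skip.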
You should probably whitelist at the member identifier level (e.g. 98, above). The member identifier is returned in a DOI query. For example:
http://api.crossref.org/works/10.1155/2014/604157
(Of course, in the above example the response also includes the actual license, CC-BY.)
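A member-level check could be sketched roughly as below. This is an illustration under assumptions, not a description of the existing importer: it assumes the works record carries the member under a `member` key (either a bare ID such as `"98"` or a URL ending in the ID) and licenses as a `license` list whose entries have a `URL` field. `WHITELISTED_MEMBERS` and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical whitelist keyed by Crossref member ID rather than DOI prefix.
WHITELISTED_MEMBERS = {"98"}


def fetch_work(doi):
    """Fetch the Crossref metadata record ("message") for a DOI."""
    url = "http://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["message"]


def member_id(work):
    """Return the member ID of a works record as a string, or None if absent."""
    member = work.get("member")
    if member is None:
        return None
    # Assumption: the value is either a bare ID or a URL whose last
    # path segment is the ID; keep only that last segment.
    return str(member).rstrip("/").rsplit("/", 1)[-1]


def license_urls(work):
    """Return the license URLs listed in a works record (possibly empty)."""
    return [entry["URL"] for entry in work.get("license", []) if "URL" in entry]


def is_whitelisted(work, members=WHITELISTED_MEMBERS):
    """True if the record's member ID is on the member-level whitelist."""
    return member_id(work) in members
```

With this, `is_whitelisted(fetch_work("10.1155/2014/604157"))` would cover every prefix the member owns, and `license_urls(...)` on the same record gives the license information to check instead of relying on the whitelist alone.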