thoth-pub / thoth

Metadata management and dissemination system for Open Access books
https://thoth.pub
Apache License 2.0

Dissemination: Handling of DOIs from registration agencies other than Crossref #588

Open tosteiner opened 3 months ago

tosteiner commented 3 months ago

Describe the bug: Thoth's auto-dissemination workflow currently assumes that all DOIs are eligible for submission to, and update via, Crossref.

As we've learned recently, some publishers have existing DOIs that were previously registered with DataCite (in this case: punctum's older books). When the auto-dissem workflow sends these on to Crossref for updating, Crossref throws an error because it doesn't recognise the DOI prefix.

For now, a pragmatic fix proposed by @rhigman would be to list all books with that specific DOI prefix in a special EXCLUDE rule to tell the Crossref auto-dissem workflow to omit those records.
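As a rough illustration of such an exclusion rule, here is a minimal sketch. The names `EXCLUDED_PREFIXES`, `doi_prefix` and `eligible_for_crossref` are hypothetical and not taken from thoth-dissemination, and `10.12345` is a placeholder rather than the actual DataCite-registered prefix.

```python
from urllib.parse import urlparse

# Hypothetical placeholder: replace with the real DataCite-registered prefix(es).
EXCLUDED_PREFIXES = {"10.12345"}


def doi_prefix(doi: str) -> str:
    """Return the prefix (e.g. "10.12345") of a bare DOI or a doi.org URL."""
    # Strip a resolver such as https://doi.org/ if one is present.
    path = urlparse(doi).path if "://" in doi else doi
    return path.lstrip("/").split("/", 1)[0].lower()


def eligible_for_crossref(doi: str, excluded=EXCLUDED_PREFIXES) -> bool:
    """True unless the DOI's prefix is on the exclusion list."""
    return doi_prefix(doi) not in excluded
```

The workflow would then skip any work for which `eligible_for_crossref(doi)` is false before building the Crossref submission.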

In the future, we might have more publishers with legacy books registered with DataCite, so it would be worth considering a more systematic approach, e.g. including DOI prefixes in the publisher-level metadata, which would then enable us to tailor the dissemination workflow to a particular set of DOI prefixes.
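One possible shape for the publisher-level idea is sketched below. The `Publisher` record, its `doi_prefixes` field and the agency labels are assumptions made for illustration only; they do not reflect Thoth's actual schema, and the prefixes in the usage example are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Publisher:
    """Hypothetical publisher record carrying its DOI prefixes."""
    name: str
    # Maps DOI prefix -> registration agency, e.g. "crossref" or "datacite".
    doi_prefixes: dict = field(default_factory=dict)


def registration_agency(publisher: Publisher, doi: str) -> Optional[str]:
    """Return the agency listed for this DOI's prefix, or None if unlisted."""
    bare = doi.split("doi.org/", 1)[-1]  # drop a resolver URL if present
    prefix = bare.split("/", 1)[0]
    return publisher.doi_prefixes.get(prefix)


def should_send_to_crossref(publisher: Publisher, doi: str) -> bool:
    """Only DOIs whose prefix is explicitly marked as Crossref are sent."""
    return registration_agency(publisher, doi) == "crossref"
```

For example, with `Publisher("Example Press", {"10.11111": "crossref", "10.22222": "datacite"})`, a DOI under `10.22222` would be left out of the Crossref workflow. Treating unlisted prefixes as ineligible is a design choice in this sketch; the opposite default would also be defensible.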

tosteiner commented 3 months ago

(also tagging @hannahhillen :) )

rhigman commented 3 months ago

To be decided whether this requires fixing/mitigating at the thoth end or the thoth-dissemination end (or both). The thoth-dissemination code could be made more defensive, given that very few checks are made by Crossref at the time of submission (the error is only raised later in a report via email).

For example, there could be an initial check of the DOI prefix against the Crossref Get Prefix Publisher endpoint (https://doi.crossref.org/getPrefixPublisher/?prefix=[prefix]) - although note that the docs mark this as "legacy".

ja573 commented 3 months ago

https://api.crossref.org/swagger-ui/index.html#/Prefixes/get_prefixes_prefix
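A pre-submission check against that REST endpoint could look roughly like the sketch below. It assumes that `GET https://api.crossref.org/prefixes/{prefix}` returns 200 for a prefix known to Crossref and 404 otherwise; the function names are hypothetical, and the `fetch` parameter exists only so the HTTP call can be swapped out in tests.

```python
import urllib.error
import urllib.request
from urllib.parse import quote

CROSSREF_PREFIX_URL = "https://api.crossref.org/prefixes/{prefix}"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; we only need the status code.
        return err.code


def crossref_knows_prefix(prefix: str, fetch=fetch_status) -> bool:
    """True on 200, False on 404 (assumed to mean 'not a Crossref prefix')."""
    status = fetch(CROSSREF_PREFIX_URL.format(prefix=quote(prefix, safe="")))
    if status == 200:
        return True
    if status == 404:
        return False
    # Anything else (rate limiting, outage) should not be read as a verdict.
    raise RuntimeError(f"Unexpected status {status} when checking prefix {prefix}")
```

Raising on unexpected statuses keeps a temporary Crossref outage from being mistaken for "prefix not registered".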

tosteiner commented 2 months ago

Side note: this also raises the more fundamental question of exploring DataCite membership (and a potential sponsorship structure similar to Crossref's), but this does not seem urgent at this stage.