Open michamos opened 6 years ago
Besides, we would need to have a list of mappings similar to the one in inspirehep/inspire#351 to automatically fix incorrect INSPIRE IDs.
I think we'd want to know about malformed IDs. Missing ones would probably generate too much work at first.
Context
When an author has not been assigned an INSPIRE ID yet, the collaborations put all kinds of placeholders in the field corresponding to the ID, like `None` or `???`, or leave it empty.
Current Behavior
Because of this, after extracting author information from the authors XML file, the record might be invalid, or some authors might be lacking an ID without us noticing.
Expected Behavior
The invalid authors are ignored, and an RT ticket is created with information about the record and the authors having invalid or missing IDs.
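A minimal sketch of how the check could look, independent of the actual extraction pipeline. The ID pattern (`INSPIRE-` followed by 8 digits), the `inspire_id` and `full_name` keys, and all function names are assumptions made for illustration, and creating the RT ticket itself is left out; only the ticket text is built.

```python
import re

# Assumed ID format: "INSPIRE-" followed by 8 digits (not taken from this issue).
INSPIRE_ID_RE = re.compile(r"INSPIRE-\d{8}")
# Placeholders mentioned in this issue; they are treated like an empty field.
PLACEHOLDERS = {"", "none", "???"}


def classify_inspire_id(raw_id):
    """Return 'valid', 'missing' or 'malformed' for a raw ID value."""
    value = (raw_id or "").strip()
    if value.lower() in PLACEHOLDERS:
        return "missing"
    if INSPIRE_ID_RE.fullmatch(value):
        return "valid"
    return "malformed"


def split_authors(authors):
    """Group author dicts (hypothetical shape) by the status of their ID."""
    groups = {"valid": [], "missing": [], "malformed": []}
    for author in authors:
        groups[classify_inspire_id(author.get("inspire_id"))].append(author)
    return groups


def ticket_body(record_id, groups):
    """Build the ticket text, or return None if there is nothing to report."""
    lines = []
    for status in ("malformed", "missing"):
        for author in groups[status]:
            raw = author.get("inspire_id")
            lines.append(f"{status}: {author['full_name']} (ID field: {raw!r})")
    if not lines:
        return None
    header = f"Record {record_id} has authors with invalid or missing INSPIRE IDs:"
    return "\n".join([header] + lines)
```

Keeping malformed and missing IDs in separate groups makes it easy to report only the malformed ones at first, if missing IDs turn out to generate too many tickets.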
Note
It might make sense to rewrite the authors XML extraction using `parsel` (the library powering `scrapy` XML parsing) and the `SignatureBuilder` instead of bolting this behavior on top of the current XSLT+dojson pipeline.
cc @hoc3426 @annetteholtkamp