Question
Can I raise a flag, create a ticket, or send an email if something looks fishy while a spider is processing an XML record?
I.e. chances are everything is OK, but a cataloger should take a closer look.
I don't want the spider to crash.
Problem
<author><keyname>:</keyname></author>
comes in at least 2 varieties:
everything before it is a list of collaborations (if there are no affiliations)
only the 'author' directly before it is a collaboration (if there are affiliations)
I can fix the spider to deal with both cases.
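To make the two varieties concrete, here is a rough sketch of how I read them. It works on plain dicts rather than on the spider's real data structures, and the function name `split_collaborations` and the keys `keyname` and `affiliations` are made up for illustration. It also assumes that "there are affiliations" means that at least one author in the record has one.

```python
def split_collaborations(authors, marker=":"):
    """Split a parsed author list into (collaboration names, real authors).

    `authors` is a list of dicts with a 'keyname' and an optional
    'affiliations' list (hypothetical structure, for illustration only).
    """
    keynames = [(a.get("keyname") or "").strip() for a in authors]
    if marker not in keynames:
        # No ":" pseudo-author: nothing to split.
        return [], list(authors)

    pos = keynames.index(marker)
    has_affiliations = any(a.get("affiliations") for a in authors)

    if has_affiliations:
        # Variety 2: only the entry directly before ":" is a collaboration.
        start = max(pos - 1, 0)
    else:
        # Variety 1: everything before ":" is a list of collaborations.
        start = 0

    collaborations = [a.get("keyname") for a in authors[start:pos]]
    # Drop the collaborations and the ":" marker itself from the author list.
    real_authors = authors[:start] + authors[pos + 1:]
    return collaborations, real_authors
```

Any record that fits neither pattern would still be split silently by this, which is exactly why I want a warning on top of it.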
But I have no idea whether there are (will be) other cases.
And it's impossible to spot the name of a collaboration amongst several hundred authors if it is misidentified as an author.
Therefore I would like to get a warning for records that contain an author with the name ":".
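Something along these lines is what I have in mind: a minimal sketch that only uses the Python standard library, ignores XML namespaces, and logs a warning instead of raising. The function names and the logger name are made up; inside a Scrapy spider the same message could presumably go through `self.logger.warning(...)`, and a logging handler (e.g. `logging.handlers.SMTPHandler`) could turn the warnings into emails.

```python
import logging
import xml.etree.ElementTree as ET

# Hypothetical logger name; a Scrapy spider could use self.logger instead.
logger = logging.getLogger("spider.suspicious_authors")


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def find_suspicious_authors(xml_text, marker=":"):
    """Return the positions of <author> elements whose <keyname> is `marker`."""
    root = ET.fromstring(xml_text)
    positions = []
    index = 0
    for element in root.iter():
        if local_name(element.tag) != "author":
            continue
        for child in element:
            is_keyname = local_name(child.tag) == "keyname"
            if is_keyname and (child.text or "").strip() == marker:
                positions.append(index)
                break
        index += 1
    return positions


def warn_if_suspicious(xml_text, record_id):
    """Log a warning (never raise) if the record has a ':' author."""
    positions = find_suspicious_authors(xml_text)
    if positions:
        logger.warning(
            "Record %s: author keyname ':' at position(s) %s, "
            "please have a cataloger check the collaborations",
            record_id,
            positions,
        )
    return bool(positions)
```

The spider would carry on processing the record as usual; the warning only marks it for a closer look.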
Any other good ideas are welcome.
I'll even take bad ideas.
Example 1
http://export.arxiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:1808.04927&metadataPrefix=arXiv
Example 2
http://export.arxiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:1607.01177&metadataPrefix=arXiv