Closed MJedr closed 2 years ago
Note that this doesn't completely solve inspirehep/inspirehep#2241 because there might be a duplicate when the URL gets added as a DOI to the reference in case there's already a DOI in the text (see example in the issue).
But how is that possible at this level? We add all the extracted references only to the reference field: https://github.com/inspirehep/refextract/blob/8cdb6f1d37b140f3b9bd05b06b52aabaf1463e0c/refextract/references/record.py#L162-L164. And in inspire-schemas, the builder doesn't add duplicate DOIs: https://github.com/inspirehep/inspire-schemas/blob/ce8a2a6dc4d9a360aae5fe9a6fd5d8e0209fac48/inspire_schemas/builders/references.py#L290
That line in the builder is broken. It checks whether the unnormalized value is already present among the DOIs, but the stored DOIs are normalized, so the check can miss a duplicate. We should normalize the value before checking.
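To illustrate the difference, here is a minimal sketch, not the actual inspire-schemas code. `normalize_doi`, `add_doi_unnormalized_check`, and `add_doi` are hypothetical names, and the normalizer is a simplified stand-in that only strips a `doi:` or `doi.org` URL prefix.

```python
import re

# Hypothetical, simplified normalizer: strips a "doi:" or doi.org URL prefix.
# The real builder may rely on a different normalization routine.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(value):
    """Return the bare DOI without a scheme or resolver prefix."""
    return _DOI_PREFIX.sub("", value.strip())


def add_doi_unnormalized_check(dois, value):
    """Problematic pattern: membership is tested on the raw value."""
    if value not in dois:
        dois.append(normalize_doi(value))


def add_doi(dois, value):
    """Suggested pattern: normalize first, then test membership."""
    normalized = normalize_doi(value)
    if normalized not in dois:
        dois.append(normalized)


# A DOI already extracted from the text, then the same DOI arriving as a URL.
broken = ["10.1000/xyz"]
add_doi_unnormalized_check(broken, "https://doi.org/10.1000/xyz")

fixed = ["10.1000/xyz"]
add_doi(fixed, "https://doi.org/10.1000/xyz")
```

With the first function the list ends up holding the same DOI twice, because the raw URL never matches the stored normalized value. With the second, the list is left unchanged.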
OK then, I'll make the fix in the builder too.