Open dbietila opened 4 years ago
Some reports of records with DOIs are in this Box folder: https://uchicago.box.com/s/0ndf9r5699x9kc6x3t9gyq74ots6qlw8
What kind of index should this be? It seems like it should be an exact string match on the DOI itself. No text tokenizing or anything, just an exact string match on something like:
10.1007/978-981-10-6026-7
Does that seem correct?
That makes sense to me. Matt and Keith are working on an urgent SFX issue, but I'll ask them to review this when they are available.
I made DOI numbers searchable in VuFind. Test code is on antares. You can search by DOI number or by the whole URL.
Create an index of DOIs in VuFind. DOIs occur in some 856 fields, and in 024 fields.
We are interested in indexing the 024|a of any 024 field where 024|2 is equal to ‘doi’. We also need to index 856|u fields that contain valid DOIs.
Some records may contain multiple 856|u’s with valid DOIs (ex: 8883838). We should index each DOI in this case.
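The field selection above could be sketched roughly as follows. This is only an illustration, not the VuFind indexing code: the `(tag, subfields)` record shape and the name `doi_candidates` are hypothetical, and comparing 024|2 to "doi" case-insensitively is an assumption. It returns raw values only; checking that an 856|u actually contains a valid DOI is left to a later step.

```python
def doi_candidates(fields):
    """Collect raw values that may hold DOIs from one record.

    `fields` is a hypothetical in-memory form of a MARC record:
    a list of (tag, subfields) pairs, where subfields is a list of
    (code, value) pairs.
    """
    candidates = []
    for tag, subfields in fields:
        if tag == "024":
            # Keep 024|a only when 024|2 identifies the number as a DOI
            # (case-insensitive comparison is an assumption).
            is_doi = any(
                code == "2" and value.strip().lower() == "doi"
                for code, value in subfields
            )
            if is_doi:
                candidates.extend(v for c, v in subfields if c == "a")
        elif tag == "856":
            # Keep every 856|u so records with several DOI links
            # contribute each of them.
            candidates.extend(v for c, v in subfields if c == "u")
    return candidates
```

Because every 856|u is kept, a record with several DOI links yields one candidate per link.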
An example of the standard syntax for DOIs can be found in Bib # 2352930, where the value is http://dx.doi.org/10.1787/16812328. Even in this case, we can strip the string http://dx.doi.org/. Only the portion starting with “10.*” is needed to retrieve the material.
DOIs in the 856|u can occur in a variety of non-standard syntaxes. Bib # 11761529 has an 856|u with the value http://link.springer.com/10.1007/978-981-10-6026-7 . In this case, 10.1007/978-981-10-6026-7 is the meaningful DOI value.
Bib # 9130371 has an 856|u of http://onlinelibrary.wiley.com/book/10.1029/GM093 . This can be trimmed to 10.1029/GM093.
There are regular expressions for filtering valid DOIs available here: https://www.crossref.org/blog/dois-and-matching-regular-expressions/
We should use a Solr analyzer to similarly trim search terms that are directed to this index.
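As a rough sketch of the trimming described above, the snippet below pulls the “10.” portion out of a URL or bare DOI. The pattern is modeled on the Crossref post but left unanchored so it can match inside a longer string; the name `extract_doi` is hypothetical, and this is plain Python for illustration, not a Solr analyzer.

```python
import re

# Modeled on the Crossref "modern DOI" pattern: "10.", a 4-9 digit
# registrant code, a slash, then the suffix. Unanchored (no ^ or $)
# so it can find a DOI embedded in a URL.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_doi(value):
    """Return the first DOI-like substring of `value`, or None."""
    match = DOI_PATTERN.search(value.strip())
    return match.group(0) if match else None


# The three examples from this issue:
for url in (
    "http://dx.doi.org/10.1787/16812328",
    "http://link.springer.com/10.1007/978-981-10-6026-7",
    "http://onlinelibrary.wiley.com/book/10.1029/GM093",
):
    print(extract_doi(url))
```

Applying the same trimming to both the indexed values and the incoming search terms would let a search on the whole URL and a search on the bare DOI reach the same exact-match entry. A pattern this permissive will also keep trailing punctuation or path segments that follow the DOI, so real data may need extra cleanup.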