internetarchive / fatcat

Perpetual Access To The Scholarly Record
https://guide.fatcat.wiki

match old-style arxiv identifiers in references #84

Open bnewbold opened 3 years ago

bnewbold commented 3 years ago

Via regex? Example strings:

B.A. Dobrescu, hep-ph/9510424.
K.R. Dienes, C. Kolda and J. March-Russell, hep-ph/9610479.
S.P. Martin, hep-ph/9608224.

Example work: https://fatcat.wiki/release/jswbtoqu3vbjhj5nx3t3f4tdqi/refs-out
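A minimal sketch of the regex approach, not taken from the fatcat codebase. It assumes old-style (pre-2007) arXiv identifiers have the shape `archive/YYMMNNN`, with an optional two-letter subject class (as in `math.GT/0309136`) and an optional version suffix. The names `OLD_ARXIV_RE` and `find_old_arxiv_ids` are hypothetical, and the pattern does not check the archive name against the list of real arXiv archives, so false positives are possible.

```python
import re
from typing import List

# Assumed shape of an old-style arXiv identifier:
#   archive        lowercase letters, optionally one hyphenated part (hep-ph)
#   .SC            optional two-letter uppercase subject class (math.GT)
#   /YYMMNNN       slash followed by exactly seven digits
#   vN             optional version suffix, not included in the captured id
# The lookbehind/lookahead keep the pattern from matching inside URLs,
# longer words, or longer digit runs.
OLD_ARXIV_RE = re.compile(
    r"(?<![\w/.-])([a-z]+(?:-[a-z]+)?(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?(?![\w/])"
)


def find_old_arxiv_ids(raw_ref: str) -> List[str]:
    """Return candidate old-style arXiv ids found in a raw reference string."""
    return [m.group(1) for m in OLD_ARXIV_RE.finditer(raw_ref)]


if __name__ == "__main__":
    examples = [
        "B.A. Dobrescu, hep-ph/9510424.",
        "K.R. Dienes, C. Kolda and J. March-Russell, hep-ph/9610479.",
        "S.P. Martin, hep-ph/9608224.",
    ]
    for ref in examples:
        print(ref, "->", find_old_arxiv_ids(ref))
```

Checking matches against a whitelist of known archive names would cut false positives, at the cost of maintaining that list.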

bnewbold commented 2 years ago

Newer GROBID (0.7.0) still doesn't detect these as arxiv identifiers, though in some cases it does identify them as generic identifiers.