heuristicus / paper-utils

Utilities for document similarity and reference extraction for research papers
MIT License

Break up references into smaller sub-parts #7

Open heuristicus opened 6 years ago

heuristicus commented 6 years ago

Currently we just get the text of the reference without extracting authors, title, year and so on. It would be nice to have this information for various purposes, not least for checking title matches more accurately.
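For illustration only, here is a minimal sketch of what a broken-up reference might look like and how it would allow title-only matching. The `Reference` class, `normalise_title` and `titles_match` are hypothetical names, not existing code in paper-utils. The normalisation (lower-casing, stripping accents and punctuation, collapsing whitespace) is just one plausible choice.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reference:
    """Hypothetical structured form of one extracted reference."""
    raw: str                                          # reference text as extracted today
    authors: List[str] = field(default_factory=list)  # filled in once parsing exists
    title: Optional[str] = None
    year: Optional[int] = None


def normalise_title(title: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def titles_match(a: Reference, b: Reference) -> bool:
    """Compare only the title fields, so authors and venues cannot cause mismatches."""
    if a.title is None or b.title is None:
        return False  # no title extracted, so there is nothing reliable to compare
    return normalise_title(a.title) == normalise_title(b.title)


if __name__ == "__main__":
    r1 = Reference(raw="...", title="Attention Is All You Need.")
    r2 = Reference(raw="...", title="attention  is all you need")
    print(titles_match(r1, r2))
```

Matching on an isolated title field avoids comparing whole reference strings, where differences in author formatting or venue abbreviations would otherwise get in the way.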

https://stackoverflow.com/questions/32775063/regex-parsing-citation-issue has a regex which attempts to do this, but only for the reference format of a single paper. Extending it to the variety of citation styles that we have is likely to be difficult, and the result would be very brittle.
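To show why a single regex is brittle, here is a hypothetical pattern that handles only one roughly APA-like style (`Surname, I., & Surname, I. J. (2015). Title. Venue...`). This is not the regex from the Stack Overflow answer. `APA_LIKE`, `AUTHOR_SPLIT` and `parse_apa_like` are illustrative names, and the example references are made up.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for ONE style only, roughly APA:
#   Surname, I., & Surname, I. J. (2015). Title of the paper. Venue, 1(2), 3-4.
APA_LIKE = re.compile(
    r"(?P<authors>.+?)\s*"             # everything before the parenthesised year
    r"\((?P<year>\d{4})[a-z]?\)\.\s*"  # "(2015)." or "(2015a)."
    r"(?P<title>[^.]+)\.\s*"           # title up to the first full stop (breaks on "?" or "." in titles)
    r"(?P<rest>.*)"                    # venue, volume, pages: left unparsed
)

# Split authors after an initial's full stop ("J., ") or at a bare " & ".
AUTHOR_SPLIT = re.compile(r"(?<=\.),\s*(?:&\s*)?|\s+&\s+")


def parse_apa_like(reference: str) -> Optional[Dict[str, object]]:
    """Return authors/year/title/rest for APA-like references, or None otherwise."""
    match = APA_LIKE.match(reference.strip())
    if match is None:
        return None  # not in the one style this pattern knows about
    authors = [a.strip() for a in AUTHOR_SPLIT.split(match.group("authors")) if a.strip()]
    return {
        "authors": authors,
        "year": int(match.group("year")),
        "title": match.group("title").strip(),
        "rest": match.group("rest").strip(),
    }


if __name__ == "__main__":
    apa = ("Smith, J., & Doe, A. B. (2015). A study of citation parsing. "
           "Journal of Examples, 3(2), 10-20.")
    ieee = ('J. Smith and A. B. Doe, "A study of citation parsing," '
            "Journal of Examples, vol. 3, no. 2, pp. 10-20, 2015.")
    print(parse_apa_like(apa))
    print(parse_apa_like(ieee))  # same paper, different style: no match
```

The same paper written in an IEEE-like style is not recognised at all. Supporting it would need a second pattern, and every further style (or a title containing a full stop) adds another special case.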