pnb / paper-convert-scripts

Reference checking for difficult cases #7

Open pnb opened 2 years ago

pnb commented 2 years ago

Per #6, reference checking has some tricky cases, including ranges in equations, among others. The anystyle package used for reference checking may merit improvement as well, since it produces some false positives and other errors, such as mistaking journal volumes for page numbers (in which case a field is still missing, but the wrong one is reported).

This might require some custom parsing and context checking -- e.g., whether the current brackets are in a MathJax tag and should be ignored.
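A minimal sketch of that kind of context check, assuming the text has math delimited by `\( ... \)`, `\[ ... \]`, or `$$ ... $$` (the delimiters actually used in the converted papers may differ). The names `MATH_SPAN`, `BRACKET_GROUP`, and `bracket_groups_outside_math` are hypothetical and are not taken from the repository:

```python
import re

# Assumed math delimiters: \( ... \), \[ ... \], and $$ ... $$.
MATH_SPAN = re.compile(r"\\\(.*?\\\)|\\\[.*?\\\]|\$\$.*?\$\$", re.DOTALL)
# A bracketed group with no nested brackets, e.g. "[3]" or "[4, 5]".
BRACKET_GROUP = re.compile(r"\[([^\[\]]+)\]")


def bracket_groups_outside_math(text):
    """Return the contents of [...] groups that are not inside a math span."""
    # Blank out each math span so brackets inside it are never matched.
    without_math = MATH_SPAN.sub(" ", text)
    return [m.group(1) for m in BRACKET_GROUP.finditer(without_math)]
```

With this approach, a range such as `[0, 2]` inside an equation is dropped before any citation matching happens, while `[3]` in running text is still found.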

pnb commented 2 years ago

I thought for a while that this might be specific to DOCX files, but apparently LaTeX papers can have manually added \bibitem references, including mismatched references (e.g., ps9itMnZ4n), so it should be handled for LaTeX as well.
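For the LaTeX case, one possible check is to compare the keys used in `\cite`-style commands against the keys defined by `\bibitem`. This is only a sketch under simple assumptions (keys contain no braces, and citation commands look like `\cite`, `\citep`, `\citet`, and so on); `mismatched_references` is a hypothetical helper, not something from the repository:

```python
import re

# \cite, \citep, \citet*, ... with optional [..] arguments, then {key1, key2}.
CITE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
# \bibitem{key} or \bibitem[label]{key}.
BIBITEM = re.compile(r"\\bibitem(?:\[[^\]]*\])?\{([^}]*)\}")


def mismatched_references(tex):
    """Return (cited but not defined, defined but never cited) key lists."""
    cited = {
        key.strip()
        for m in CITE.finditer(tex)
        for key in m.group(1).split(",")
        if key.strip()
    }
    defined = {m.group(1).strip() for m in BIBITEM.finditer(tex)}
    return sorted(cited - defined), sorted(defined - cited)
```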

pnb commented 1 year ago

Might try custom training anystyle on the EDM style in particular.

pnb commented 1 year ago

LFfQRMVQeF is one example with [0, 2]

pnb commented 1 year ago

Fixed the in-text cite parts in e484154, which handles basically all the problems with false positives related to numbers in math ranges and other oddities. There are certainly more cases that will crop up, but this works for all accepted papers from last year now at least.

Still does not handle false positives in the anystyle part though.

pnb commented 1 year ago

One more type of in-text false positive in xSIDvwju3D, with formatting like [6, 25–27].
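A rough sketch of how a group like this could be expanded into individual reference numbers, assuming it is a genuine citation list mixing single numbers and ranges. It also assumes reference numbers start at 1, so a group such as `[0, 2]` is rejected as not citation-like; a math range like `[1, 2]` would still be ambiguous and need a context check. `expand_citation_group` is a hypothetical name, not the code in e484154:

```python
import re

# Characters accepted as a range separator: hyphen, en dash, em dash.
RANGE_DASHES = "-\u2013\u2014"
PART = re.compile(rf"\s*(\d+)\s*(?:[{RANGE_DASHES}]\s*(\d+))?\s*")


def expand_citation_group(group):
    """Expand '6, 25–27' to [6, 25, 26, 27]; return None if not citation-like."""
    numbers = []
    for part in group.split(","):
        m = PART.fullmatch(part)
        if m is None:
            return None  # Something other than a number or a number range.
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        # Assumption: references are numbered from 1 and ranges ascend.
        if start < 1 or end < start:
            return None
        numbers.extend(range(start, end + 1))
    return numbers
```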

pnb commented 9 months ago

Now partially implemented, but only for JEDM papers.