neuroquery / pubget

Collecting papers from PubMed Central and extracting text, metadata and stereotactic coordinates.
https://neuroquery.github.io/pubget/
MIT License

add ability to keep words that are linked? #32

Open koudyk opened 1 year ago

koudyk commented 1 year ago

Would it make things very slow if we kept words that have links, so that we get the de-linked text instead of "???"?

This would be useful for matching a) meta-analyzed papers cited in tables (where the words are sometimes linked to the references) to b) the references.

Often, the relevant column containing the meta-analyzed papers contains a lot of "???". From looking at the original papers, it looks like the ???'s are where there are links to citations.
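
A minimal sketch of what "de-linked text" could look like, assuming the table cells are JATS-style XML in which citations appear as `xref` elements with `ref-type="bibr"`. The sample cell, the `rid` values, and the helper names `delinked_text` and `cited_ids` are made up for illustration; this is not pubget's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical table cell: two citations, each wrapped in a link to the
# reference list. The content and ids are invented for this example.
CELL = (
    "<td>"
    '<xref ref-type="bibr" rid="B12">Smith et al., 2010</xref>; '
    '<xref ref-type="bibr" rid="B15">Jones et al., 2012</xref>'
    "</td>"
)


def delinked_text(element):
    """Return all text under `element`, keeping the words inside links."""
    return "".join(element.itertext())


def cited_ids(element):
    """Return the `rid` of every bibliography link under `element`."""
    return [
        xref.get("rid")
        for xref in element.iter("xref")
        if xref.get("ref-type") == "bibr"
    ]


cell = ET.fromstring(CELL)
print(delinked_text(cell))
print(cited_ids(cell))
```

Keeping the `rid` values alongside the text would make it possible to match a table row to an entry in the reference list directly, rather than by fuzzy-matching author names.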

jeromedockes commented 1 year ago

I'll have a look! These "???" seem to be inserted by docbook when it encounters broken cross-references, but it may be something else, too. Did you find them only in tables, or in the text as well?

koudyk commented 1 year ago

I think that in the text, the linked text is just removed. For example, a citation ends up looking like this: "[ ]". I think I've only seen the "???" in tables.