paperscape / paperscape-mapclient

Browser client for the Paperscape map
http://paperscape.org
MIT License

extend to biorxiv #5

Open yannickspill opened 6 years ago

yannickspill commented 6 years ago

How hard would it be to extend this work to data on bioRxiv, the archive for biology? And, even better, how hard would it be to connect the bioRxiv papers to the arXiv ones?

rknegjens commented 6 years ago

The hardest part would be building the citation network. We did this for the arXiv by parsing the source files (mostly TeX) for their references and then achieving a high match rate between this reference info and the arXiv papers being referred to. In the many cases where no arXiv or DOI identifiers are available, the next best thing is the journal information, which requires specialized regular expressions to parse well. High energy physics has the best representation in Paperscape partly because we were both working in the field at the time and were most familiar with its journals (but also because arXiv usage and referencing are generally better in this field).
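To make the matching order above concrete, here is a minimal sketch in Python. It is not Paperscape's actual parser: the function name `extract_identifier`, the patterns, and the short journal list are all assumptions chosen for illustration. It tries an arXiv identifier first, then a DOI, and only then falls back to journal, volume and page.

```python
import re

# Illustrative patterns only (assumptions, not Paperscape's real regexes).
NEW_ARXIV = re.compile(r"arXiv:\s*(\d{4}\.\d{4,5})", re.IGNORECASE)
OLD_ARXIV = re.compile(r"\b([a-z-]+(?:\.[A-Z]{2})?/\d{7})\b")
DOI = re.compile(r"\b(10\.\d{4,9}/\S+)")
# A tiny, hypothetical journal list; a real parser would need many more
# journals and abbreviation variants.
JOURNAL = re.compile(
    r"(Phys\.\s*Rev\.\s*(?:Lett\.|[A-E])|Nucl\.\s*Phys\.\s*[AB]|JHEP)"
    r"\s*(\d+)(?:\s*,\s*|\s+)(?:\(\d{4}\)\s*)?(\d+)"
)


def extract_identifier(ref):
    """Return (kind, value) for the best identifier found in a reference
    string, or None if nothing matches."""
    m = NEW_ARXIV.search(ref) or OLD_ARXIV.search(ref)
    if m:
        return ("arxiv", m.group(1))
    m = DOI.search(ref)
    if m:
        # Drop sentence punctuation that often trails a DOI.
        return ("doi", m.group(1).rstrip(".,;"))
    m = JOURNAL.search(ref)
    if m:
        # Remove whitespace so "Phys. Rev. D" and "Phys.Rev.D" compare equal.
        journal = re.sub(r"\s+", "", m.group(1))
        return ("journal", (journal, m.group(2), m.group(3)))
    return None


if __name__ == "__main__":
    for ref in [
        "J. Doe, arXiv:1234.5678v2",
        "A. Author, hep-th/9901001",
        "B. Author, doi:10.1103/PhysRevD.50.1173.",
        "S. Weinberg, Phys. Rev. Lett. 19, 1264 (1967)",
    ]:
        print(extract_identifier(ref))
```

The journal fallback is the fragile part: each field has its own journals and citation styles, so a bioRxiv version would probably need its own pattern list.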

Generating a Paperscape-like map from another citation network is relatively straightforward,

sfrosenb commented 5 years ago

Hey, I would help with the extension to bioRxiv if you are interested. I am a CS PhD student, but I work with computational models of ecological systems, so I might be able to help for the same reason that high energy physics has the best representation in Paperscape.

dpgeorge commented 5 years ago

@sfrosenb if you want to help with biorxiv that would be great. But note that the maintainers of this project (@rknegjens and myself) are mostly busy with other things now so won't have much time to help out here.

As mentioned above, the main thing to do is to extract the citation network from biorxiv. Do they provide such a thing already? Do they provide source code or downloadable forms of their paper database?