Handle literature mining from preprints and patents

We want to incorporate the new stream of findings that EPMC's text-mining pipeline extracts from preprints and patents, on top of the existing literature references.

Background

EPMC's new pipeline sends us the results of running their entity recognition algorithm on preprints and on patents filed with the relevant patent offices, such as the European Patent Office.

These are the evidence counts per literature type that we will incorporate in release 23.02:

+----------+-------+
|  lit_type|  count|
+----------+-------+
|Literature|5131169|
| Preprints|  97951|
|   Patents|  94120|
+----------+-------+
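To put the table in perspective, the short sketch below computes each literature type's share of the total and how much the new types grow the evidence relative to the existing literature. The counts are copied from the table; the `summarise` helper and its `baseline` parameter are hypothetical names used only for illustration.

```python
# Evidence counts per literature type, copied from the 23.02 table above.
counts = {
    "Literature": 5_131_169,
    "Preprints": 97_951,
    "Patents": 94_120,
}


def summarise(counts: dict[str, int], baseline: str = "Literature") -> dict[str, float]:
    """Hypothetical helper: return each type's share of all evidence, plus the
    relative growth that the non-baseline types add on top of the baseline."""
    total = sum(counts.values())
    summary = {name: n / total for name, n in counts.items()}
    new_evidence = total - counts[baseline]
    summary["growth_over_baseline"] = new_evidence / counts[baseline]
    return summary


if __name__ == "__main__":
    print(f"total evidence: {sum(counts.values()):,}")
    for name, value in summarise(counts).items():
        print(f"{name}: {value:.2%}")
```

Run as a script, this reports 5,323,240 evidence strings in total, with preprints at 1.84% and patents at 1.77% of the total. Together they add about 3.74% on top of the existing literature evidence.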

Note: these additions affect not only the EPMC evidence but also the bibliography widget, where the IDs of these preprints and patents will now be queryable.

Tasks

[ ] Agree on how to treat this information: we may want to distinguish patents from other literature references and turn them into a separate data source.
[ ] Decide how to score the evidence from each literature type.
[ ] Make the technical changes needed to implement these decisions.
[ ] Understand the coverage of the patents being submitted to EPMC.

Handle literature mining from preprints and patents #2879