sorgerlab / indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.
http://indra.bio
BSD 2-Clause "Simplified" License

Implement retrieving more than 10k PMIDs and all metadata #1424

Closed bgyori closed 9 months ago

bgyori commented 10 months ago

This PR adds a simple wrapper around PubMed's EDirect CLI (https://www.ncbi.nlm.nih.gov/books/NBK179288/) to retrieve PMIDs, so that we can easily get all PMIDs for queries that return more than 10k results. (This turns out to be very convoluted, though not entirely impossible, to solve with the REST API.)
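For illustration, a wrapper of this kind could look roughly like the sketch below. It is not the code in this PR: the function names `get_pmids_edirect` and `parse_uid_output` are hypothetical, and it assumes the EDirect tools `esearch` and `efetch` are installed and on the `PATH`. It pipes the `esearch` result into `efetch -format uid`, which prints one PMID per line.

```python
import shutil
import subprocess


def parse_uid_output(text):
    """Return the PMIDs from one-per-line output, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def get_pmids_edirect(query):
    """Hypothetical wrapper: return all PMIDs matching a PubMed query.

    Assumes the EDirect command line tools are installed locally.
    """
    if shutil.which("esearch") is None or shutil.which("efetch") is None:
        raise RuntimeError("EDirect (esearch/efetch) was not found on PATH")
    # Step 1: run the search; esearch writes a small XML handle to stdout.
    search = subprocess.run(
        ["esearch", "-db", "pubmed", "-query", query],
        capture_output=True, text=True, check=True,
    )
    # Step 2: feed that handle to efetch and ask for bare UIDs.
    fetch = subprocess.run(
        ["efetch", "-format", "uid"],
        input=search.stdout, capture_output=True, text=True, check=True,
    )
    return parse_uid_output(fetch.stdout)
```

Keeping the parsing in its own function means it can be checked without network access or an EDirect installation.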

It also adds a wrapper around a function for getting metadata, so that a single function call can handle more PMIDs than the limit of 200 per metadata request.
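The batching idea can be sketched as follows. This is only an assumption about the general shape, not the code in this PR: `get_metadata_in_batches` and `fetch_batch` are hypothetical names, and `fetch_batch` stands in for whatever function performs one metadata request for up to 200 PMIDs and returns a dict keyed by PMID.

```python
def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def get_metadata_in_batches(pmids, fetch_batch, batch_size=200):
    """Hypothetical wrapper: fetch metadata for any number of PMIDs.

    `fetch_batch` is an assumed callable that takes a list of at most
    `batch_size` PMIDs and returns a dict mapping PMID to metadata.
    """
    results = {}
    for chunk in batched(list(pmids), batch_size):
        # One request per chunk keeps each call under the per-request limit.
        results.update(fetch_batch(chunk))
    return results
```

Passing the per-request function in as an argument keeps the sketch independent of any particular client and makes the chunking easy to test with a stub.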