neuroquery / pubget

Collecting papers from PubMed Central and extracting text, metadata and stereotactic coordinates.
https://neuroquery.github.io/pubget/
MIT License

pubget.download_pmcids broken #20

Closed: A-Telfer closed this issue 1 year ago.

A-Telfer commented 1 year ago

The pubget.download_pmcids function appears to be broken.

It doesn't raise an error, but every result is "The following PMCID is not available".

For example:

```python
import pubget
import pandas as pd

# Note: these are actually PubMed IDs (PMIDs), not PMCIDs (see below)
pmcids = [19233148, 24567909, 18550622]
article_sets, ret = pubget.download_pmcids(pmcids, 'temp')
pd.read_xml(article_sets / 'articleset_00000.xml')  # All results are "PMCID is not available"
```

I've also tried using an API key. The query downloader works fine.
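To tell whether an article set holds real articles or only these error messages, one option is to scan the XML for the message text. The sketch below uses only the standard library. It assumes the message appears as the text of some element (the exact tag is not relied upon), and `count_not_available` and `check_articleset` are hypothetical helper names, not part of pubget.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed message text for unavailable PMCIDs, taken from the report above.
NOT_AVAILABLE = "The following PMCID is not available"


def count_not_available(root: ET.Element) -> int:
    """Count elements (at any depth) whose text starts with the error message."""
    return sum(
        1
        for elem in root.iter()
        if elem.text and elem.text.strip().startswith(NOT_AVAILABLE)
    )


def check_articleset(path: Path) -> int:
    """Parse one articleset file and print how many entries are error messages."""
    root = ET.parse(path).getroot()
    n_errors = count_not_available(root)
    print(f"{path.name}: {n_errors} 'not available' message(s)")
    return n_errors
```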

A-Telfer commented 1 year ago

This was an error on my part: PubMed IDs (PMIDs) are not the same as PMCIDs. Using the PMCIDs worked.

(The PubMed page for an article also shows its PMCID.)
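For more than a handful of articles, PMIDs can be mapped to PMCIDs in bulk with NCBI's PMC ID Converter API instead of looking each one up on PubMed. This is a minimal sketch, assuming the JSON response contains a `records` list with `pmid` and `pmcid` fields, and that pubget expects the numeric part of each PMCID, as in the example above. The endpoint URL is the one NCBI documented at the time of writing and may have moved; `fetch_idconv` and `pmcids_from_response` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

# PMC ID Converter endpoint (assumption: check NCBI's docs for the current URL).
IDCONV_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"


def fetch_idconv(pmids: list[int], email: str, tool: str = "pmid-to-pmcid-sketch") -> dict:
    """Query the ID Converter for a batch of PMIDs and return the parsed JSON."""
    params = urllib.parse.urlencode({
        "ids": ",".join(str(p) for p in pmids),
        "idtype": "pmid",
        "format": "json",
        "tool": tool,
        "email": email,
    })
    with urllib.request.urlopen(f"{IDCONV_URL}?{params}") as response:
        return json.load(response)


def pmcids_from_response(response: dict) -> dict[int, int]:
    """Map each PMID to the numeric part of its PMCID, skipping PMIDs without one."""
    mapping = {}
    for record in response.get("records", []):
        pmid = record.get("pmid")
        pmcid = record.get("pmcid")
        if pmid and pmcid:
            # "PMC1234567" -> 1234567; int() also accepts a numeric pmid.
            mapping[int(pmid)] = int(pmcid.removeprefix("PMC"))
    return mapping
```

With network access, `pmcids_from_response(fetch_idconv([19233148, 24567909, 18550622], email=...))` gives a `{pmid: pmcid}` dict whose values could then be passed to `pubget.download_pmcids`. PMIDs with no PMC record are dropped, so comparing the dict's length with the input list is a quick way to spot them.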

jeromedockes commented 1 year ago

Great, I'm glad you fixed it! After the next release of pubget (or with the development version now), you will get slightly more useful output, because pubget now first filters the list of PMCIDs to keep only those in the PMC Open Access subset. The relevant part of the log would then look like this:

```
INFO    2022-12-22T14:47:22-0300    pubget._entrez  Posting 3 PMCIDs to Entrez.
INFO    2022-12-22T14:47:23-0300    pubget._entrez  Search returned 0 results
INFO    2022-12-22T14:47:23-0300    pubget._entrez  0 / 3 articles are in PMC Open Access.
```

Instead of an XML file full of error messages, the articlesets directory would then contain no XML files at all.
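With this behavior, an empty articlesets directory is itself the signal that nothing was downloaded. A small sketch of checking for it, assuming `articlesets_dir` is the first value returned by `pubget.download_pmcids` (the `article_sets` variable in the report); `articleset_files` and `report_download` are hypothetical helper names.

```python
from pathlib import Path


def articleset_files(articlesets_dir: Path) -> list[Path]:
    """Return the XML files in the articlesets directory, sorted by name."""
    return sorted(articlesets_dir.glob("*.xml"))


def report_download(articlesets_dir: Path) -> bool:
    """Print a short summary; return False when no article sets were written."""
    files = articleset_files(articlesets_dir)
    if not files:
        print(
            f"No article sets in {articlesets_dir}: none of the requested "
            "PMCIDs appear to be in the PMC Open Access subset."
        )
        return False
    print(f"{len(files)} article set file(s) in {articlesets_dir}")
    return True
```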