neuroquery / pubget

Collecting papers from PubMed Central and extracting text, metadata and stereotactic coordinates.
https://neuroquery.github.io/pubget/
MIT License

Search results mismatch #27

Closed csiyer closed 1 year ago

csiyer commented 1 year ago

Hi there! I just started using pubget for a project and I'm really enjoying this nice tool that you've created--thank you.

In testing the output I'm getting, I noticed that pubget's output doesn't match what I get when I go to PMC Advanced Search and use the exact same query. I'm wondering if there is some reason why this would be the case, or maybe I'm making a mistake on my end.

Example (should be reproducible...)
Query: (("2013"[Publication Date] : "3000"[Publication Date]) AND (stop-signal[Abstract] OR stop signal[Abstract]))
Command: pubget run . -q '(("2013"[Publication Date] : "3000"[Publication Date]) AND (stop-signal[Abstract] OR stop signal[Abstract]))'
Search returns 487 articles (from n_articles in info.json)
PMC search with the same query returns 769 articles

I've also tried putting the query in a txt file and using pubget run . --query-file query.txt
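For anyone who wants to check the count programmatically, here is a minimal sketch that reads `n_articles` from an `info.json` file. The field name comes from the report above; the function name `read_n_articles` and the path passed to it are hypothetical, since the location of `info.json` inside the output directory is not stated in this thread.

```python
import json
from pathlib import Path


def read_n_articles(info_path):
    """Return the n_articles value stored in an info.json file.

    `info_path` is whatever path info.json has in your output directory;
    its exact location is an assumption left to the caller.
    """
    info = json.loads(Path(info_path).read_text(encoding="utf-8"))
    return int(info["n_articles"])
```

Calling `read_n_articles("path/to/info.json")` returns the stored count as an integer, which can then be compared with the number shown by the PMC web search.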

If there is some way to resolve the mismatch, I'd love to know. If not, then at least some sense of what contributes to an article being included vs. excluded would be helpful. Thanks!

jeromedockes commented 1 year ago

Hi! Thanks for using pubget and for reporting this! The mismatch arises because pubget only considers "Open Access" articles within PubMed Central. Indeed, some articles are in PMC but their publisher forbids downloading the full text. This is mentioned in the documentation, but that information is buried in a rather long paragraph; I'll put it in a more visible place.

If I search PMC with your query, I see 771 articles. If I then click "Open access" in the "article attributes" list on the left, this gets filtered down to 489 articles.
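The same comparison can be scripted against the NCBI E-utilities `esearch` endpoint. The sketch below only builds the request URL and parses the count from a JSON response, so it runs without network access. It rests on two assumptions: that appending `open access[filter]` to the query matches what the "Open access" checkbox does, and that this is comparable to the restriction pubget applies. The helper names are hypothetical and are not part of pubget.

```python
import json
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def add_open_access_filter(query):
    """Restrict a PMC query to open-access articles.

    Assumption: 'open access[filter]' mirrors the "Open access" checkbox.
    """
    return f"({query}) AND open access[filter]"


def esearch_count_url(query):
    """Build an esearch URL for the PMC database that returns JSON."""
    params = {"db": "pmc", "term": query, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_count(response_text):
    """Extract the total hit count from an esearch JSON response body."""
    return int(json.loads(response_text)["esearchresult"]["count"])
```

Fetching `esearch_count_url(query)` and `esearch_count_url(add_open_access_filter(query))` with any HTTP client and passing each response body to `parse_count` should give the two numbers to compare. Counts change over time as PMC is updated, so they will not necessarily match the figures quoted in this thread.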

csiyer commented 1 year ago

Ah thanks so much—I overlooked that. Appreciate it!

jeromedockes commented 1 year ago

I added a note in the documentation about this. Thanks again!