petermr / pygetpapers

a Python version of getpapers
Apache License 2.0

Unable to download Huge corpus of papers #31

Open UddeshyaPandey opened 2 years ago

UddeshyaPandey commented 2 years ago

Describe the bug I was downloading XML and CSV files for all papers published in 2021 matching the query "Transcription factors". The limit was set to 100k papers and the query returned 99k hits. Ideally the download should start, perhaps with a warning, but instead it fails with TypeError: 'NoneType' object is not subscriptable

To Reproduce Steps to reproduce the behaviour:

  1. In your Windows command prompt, type pygetpapers -q "Transcription factors" -x -c -o TF_database_2021 -k 100000 --startdate 2021-01-01 --enddate 2021-12-31
  2. press 'Enter'
  3. Scroll down to the end
  4. See an error like
    TypeError: 'NoneType' object is not subscriptable
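For readers unfamiliar with this message: Python raises it whenever code indexes into a value that turned out to be None, for example when a lookup or request returned nothing. The sketch below is a generic illustration only, not pygetpapers code; the function name first_result and the "results" key are hypothetical, and the thread does not say where in pygetpapers the None actually comes from.

```python
def first_result(response):
    """Return the first result, or None when there is no response to index.

    `response` and the "results" key are hypothetical, for illustration only.
    """
    if response is None:
        # Guard: without this check, response["results"] raises TypeError.
        return None
    return response["results"][0]


# Indexing a None value reproduces the same class of error as in the report.
response = None
try:
    response["results"]
except TypeError as err:
    message = str(err)

print(message)
```

A guard like the one in first_result turns the crash into a value the caller can check, which is one common way to report "no data came back" more clearly.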

Expected behaviour

Ideally, it should start the download of all the available XML and CSV files related to the query

Additional context It usually works for a small corpus of roughly 100 to 1000 papers. For example, pygetpapers ran the above query smoothly for the year 2022 with the limit set to 1000 papers; the actual hits were only 458, and it downloaded a corpus of 458 papers with CSV and XML files. But for a large corpus, usually more than 1k papers, it shows the above error message.

ayush4921 commented 2 years ago

Can you check the same command in version 1.1.5?

petermr commented 2 years ago

Thanks both. I suggest that 100K is too large a chunk; maybe try 10K.

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
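
One way to act on the smaller-chunk suggestion is to split the 2021 date range into monthly windows and run one command per window with a 10K limit. The sketch below only builds the argument lists, using the flags that appear in the original command (-q, -x, -c, -o, -k, --startdate, --enddate). The monthly split, the helper names and the per-month output directory names are assumptions, not something proposed in this thread, and it is not confirmed here that smaller chunks avoid the TypeError. A single month could also still exceed 10K hits.

```python
import calendar
from datetime import date


def monthly_windows(year):
    """Return (first_day, last_day) date pairs for each month of `year`."""
    windows = []
    for month in range(1, 13):
        last_day = calendar.monthrange(year, month)[1]
        windows.append((date(year, month, 1), date(year, month, last_day)))
    return windows


def build_commands(query, year, limit, output_prefix):
    """Build one pygetpapers argument list per month (hypothetical helper)."""
    commands = []
    for start, end in monthly_windows(year):
        commands.append([
            "pygetpapers",
            "-q", query,
            "-x", "-c",
            # Assumed naming scheme: one output directory per month.
            "-o", f"{output_prefix}_{start:%Y_%m}",
            "-k", str(limit),
            "--startdate", start.isoformat(),
            "--enddate", end.isoformat(),
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_commands("Transcription factors", 2021, 10000, "TF_database"):
        print(cmd)
```

Each list can be passed as-is to subprocess.run(cmd, check=True), which avoids shell quoting problems with the query string on Windows.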