petermr / pygetpapers

a Python version of getpapers
Apache License 2.0

ResponseWrapper error (from EPMC) #48

(Open) petermr opened this issue 5 months ago

petermr commented 5 months ago

THIS IS A GOOD EXAMPLE OF HOW TO REPORT A BUG

Describe the bug

A pygetpapers query in the notebook pygetpapers_literature_search.ipynb from Renu Kumari (https://colab.research.google.com/drive/1-vM3BKV7NjvFXAdLGuqyNMh4VhPq6uMa?usp=sharing) fails with KeyError: 'responseWrapper'.

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://colab.research.google.com/drive/1-vM3BKV7NjvFXAdLGuqyNMh4VhPq6uMa?usp=sharing
  2. Run the first two cells (install and help).
  3. Run cell 3:
    !pygetpapers -n -q "carbon emission"
  4. This produces:
    Traceback (most recent call last):
      File "/usr/local/bin/pygetpapers", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.10/dist-packages/pygetpapers/pygetpapers.py", line 537, in main
        callpygetpapers.create_argparser()
      File "/usr/local/lib/python3.10/dist-packages/pygetpapers/pygetpapers.py", line 530, in create_argparser
        self.runs_pygetpapers_for_given_args(self.query_namespace)
      File "/usr/local/lib/python3.10/dist-packages/pygetpapers/pygetpapers.py", line 312, in runs_pygetpapers_for_given_args
        api_handler.check_query_logic_and_run()
      File "/usr/local/lib/python3.10/dist-packages/pygetpapers/pygetpapers.py", line 183, in check_query_logic_and_run
        self.api.noexecute(self.query_namespace)
      File "/usr/local/lib/python3.10/dist-packages/pygetpapers/repository/europe_pmc.py", line 266, in noexecute
        totalhits = result[RESPONSE_WRAPPER][HITCOUNT]
    KeyError: 'responseWrapper'

Expected behavior

I expected a message reporting that 10 hits had been found.


Additional context

This might be an error at EPMC (Europe PMC). I will retry and also check their mailing list.

petermr commented 5 months ago

I ran the query on the EPMC website immediately afterwards and got the expected output.


(Excerpt from the EPMC results page. Filters showed Preprints (1,150) and Books & documents (1); by date, 2024 (924), 2023 (4,194), 2022 (3,963). Results included [China's wetland soil organic carbon pool: New estimation on pool size, change, and trajectory](https://europepmc.org/article/AGR/IND608163566) by Ren Y, Mao D, Wang Z, Yu Z, Xu X, Huang Y, Xi Y, Luo L, Jia M, Song K, Li X.)

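To tell a pygetpapers problem apart from an EPMC outage, the same query can be sent straight to the Europe PMC REST search endpoint. The sketch below assumes the public endpoint `https://www.ebi.ac.uk/europepmc/webservices/rest/search` with `format=json`, whose JSON response carries a top-level `hitCount`. The helper names are hypothetical and are not part of pygetpapers.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Mapping, Optional

# Europe PMC REST search endpoint (assumed; public API, no key required).
EPMC_SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, page_size: int = 10) -> str:
    """Build a search URL that asks Europe PMC for a JSON response."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{EPMC_SEARCH_URL}?{urllib.parse.urlencode(params)}"


def extract_hit_count(payload: Mapping[str, Any]) -> Optional[int]:
    """Return the top-level hitCount of a JSON search response, or None if absent."""
    count = payload.get("hitCount")
    return int(count) if count is not None else None


def fetch_hit_count(query: str, timeout: float = 30.0) -> Optional[int]:
    """Query Europe PMC directly (network access required) and return the hit count."""
    with urllib.request.urlopen(build_search_url(query), timeout=timeout) as resp:
        return extract_hit_count(json.load(resp))
```

Calling `fetch_hit_count("carbon emission")` while the notebook is failing would show whether EPMC itself is returning a usable response at that moment.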
petermr commented 5 months ago

I reran the query on the local command line without problems:

 pm286macbook-2:~ pm286$ pygetpapers -q "carbon emissions" -k 10 -o junk -p
/opt/anaconda3/lib/python3.8/site-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.2.1) or chardet (5.2.0) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
INFO: Total Hits are 84203
10it [00:00, 41859.32it/s]
  0%|                                                    | 0/10 [00:00<?, ?it/s]INFO: Wrote the pdf file for PMC10907782
 10%|████▍                                       | 1/10 [00:01<00:10,  1.19s/it]INFO: Wrote the pdf file for PMC10956835
 20%|████████▊                                   | 2/10 [00:02<00:09,  1.17s/it]INFO: Wrote the pdf file for PMC10928123
INFO: Wrote the pdf file for PMC10928123
 30%|█████████████▏                              | 3/10 [00:26<01:20, 11.45s/it]INFO: Wrote the pdf file for PMC10925036
 40%|█████████████████▌                          | 4/10 [00:28<00:46,  7.77s/it]INFO: Wrote the pdf file for PMC10867074
INFO: Wrote the pdf file for PMC10867074
 50%|██████████████████████                      | 5/10 [00:32<00:32,  6.52s/it]INFO: Wrote the pdf file for PMC10655991
 60%|██████████████████████████▍                 | 6/10 [00:33<00:18,  4.74s/it]INFO: Wrote the pdf file for PMC10907649
 70%|██████████████████████████████▊             | 7/10 [00:34<00:10,  3.57s/it]INFO: Wrote the pdf file for PMC10623158
 80%|███████████████████████████████████▏        | 8/10 [00:38<00:06,  3.49s/it]INFO: Wrote the pdf file for PMC10883542
 90%|███████████████████████████████████████▌    | 9/10 [00:39<00:02,  2.94s/it]INFO: Wrote the pdf file for PMC10902235
100%|███████████████████████████████████████████| 10/10 [00:41<00:00,  4.12s/it]
(base) pm286macbook-2:~ pm286$ tree junk
junk
├── PMC10623158
│   ├── eupmc_result.json
│   └── fulltext.pdf
├── PMC10655991
│   ├── eupmc_result.json
│   └── fulltext.pdf
├── PMC10867074
│   ├── eupmc_result.json
│   └── fulltext.pdf
├── PMC10883542
│   ├── eupmc_result.json
│   └── fulltext.pdf
├── PMC10902235
│   ├── eupmc_result.json
│   └── fulltext.pdf
├── PMC10907649
│   ├── eupmc_result.json
│   └── fulltext.pdf
├── PMC10907782
│   ├── eupmc_result.json
│   └── fulltext.pdf
├── PMC10925036
│   ├── eupmc_result.json
│   └── fulltext.pdf
├── PMC10928123
│   ├── eupmc_result.json
│   └── fulltext.pdf
├── PMC10956835
│   ├── eupmc_result.json
│   └── fulltext.pdf
└── eupmc_results.json
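The `tree junk` listing shows the layout produced here: one `PMC…` folder per paper containing `eupmc_result.json` and `fulltext.pdf`, plus a top-level `eupmc_results.json`. A small check like the following can confirm that a run completed for every paper. The helpers are hypothetical and assume exactly that layout.

```python
from pathlib import Path
from typing import List

# Per-paper files seen in the `tree junk` listing (assumed to be the full set).
EXPECTED_FILES = ("eupmc_result.json", "fulltext.pdf")


def find_incomplete_papers(output_dir: str) -> List[str]:
    """Return names of PMC* subdirectories that are missing any expected file."""
    incomplete = []
    for paper_dir in sorted(Path(output_dir).glob("PMC*")):
        if not paper_dir.is_dir():
            continue
        if not all((paper_dir / name).is_file() for name in EXPECTED_FILES):
            incomplete.append(paper_dir.name)
    return incomplete


def has_summary(output_dir: str) -> bool:
    """Check for the top-level eupmc_results.json summary file."""
    return (Path(output_dir) / "eupmc_results.json").is_file()
```

For example, `find_incomplete_papers("junk")` returning an empty list together with `has_summary("junk")` being true would match the listing above.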
petermr commented 5 months ago

I reran the Colab notebook and it worked satisfactorily, so this may be a transient problem.

### **Step 4: Downloading PDFs only for 10 papers matching the query.**

*   **-k** for the number of papers to retrieve
*   **-q** for the query term
*   **-o** for the output directory
*   **-p** to download PDFs

!pygetpapers -q "carbon emission" -k 10 -o "carbon_emi" -p
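The notebook cell can also be driven from a Python script. This sketch only assembles the flags listed in the cell (`-q`, `-k`, `-o`, `-p`) and runs the `pygetpapers` executable via `subprocess`; the helper names are hypothetical, and it assumes `pygetpapers` is on `PATH`.

```python
import shlex
import subprocess
from typing import List


def build_pygetpapers_command(query: str, limit: int, output_dir: str, pdf: bool = True) -> List[str]:
    """Assemble the argument list for the cell above.

    -q query term, -k number of papers, -o output directory, -p download PDFs.
    """
    cmd = ["pygetpapers", "-q", query, "-k", str(limit), "-o", output_dir]
    if pdf:
        cmd.append("-p")
    return cmd


def run_pygetpapers(cmd: List[str]) -> int:
    """Run the command (requires pygetpapers on PATH) and return its exit code."""
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

Passing a list rather than a single string avoids shell quoting problems with multi-word queries such as "carbon emission".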


petermr commented 5 months ago

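The error occurs because the RESPONSE_WRAPPER key ('responseWrapper') is missing from the parsed EPMC response, so result[RESPONSE_WRAPPER] raises a KeyError. If result.get(RESPONSE_WRAPPER) were used instead, it would return None, which could be tested explicitly rather than letting the exception propagate.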
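A minimal sketch of the `.get()` approach, assuming the parsed EPMC result is a nested dict accessed as in the traceback line `totalhits = result[RESPONSE_WRAPPER][HITCOUNT]`. `get_total_hits` is a hypothetical helper, not the actual pygetpapers code, and the value `"hitCount"` for `HITCOUNT` is an assumption.

```python
import logging
from typing import Any, Mapping, Optional

# "responseWrapper" comes from the KeyError above; the HITCOUNT value is assumed.
RESPONSE_WRAPPER = "responseWrapper"
HITCOUNT = "hitCount"


def get_total_hits(result: Mapping[str, Any]) -> Optional[int]:
    """Return the hit count, or None (after logging an error) if the response is malformed."""
    wrapper = result.get(RESPONSE_WRAPPER)
    if wrapper is None:
        logging.error(
            "EPMC response has no %r key (possibly a transient EPMC error); keys: %s",
            RESPONSE_WRAPPER,
            list(result),
        )
        return None
    hits = wrapper.get(HITCOUNT)
    if hits is None:
        logging.error("EPMC response has no %r inside %r", HITCOUNT, RESPONSE_WRAPPER)
        return None
    return int(hits)
```

A caller such as `noexecute` could then print a clear message and exit cleanly when `None` comes back, instead of crashing with a traceback.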

I hope this is a transient error from EPMC.
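If the failure is transient, one option is to retry the hit-count request a few times with exponential backoff before giving up. This is a sketch, not current pygetpapers behaviour: `retry_on_none` is a hypothetical helper that wraps any fetch function returning `None` on a malformed response.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_on_none(
    fetch: Callable[[], Optional[T]],
    attempts: int = 3,
    initial_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fetch` until it returns a non-None value, doubling the wait between tries.

    Returns None if every attempt fails. `sleep` is injectable for testing.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        value = fetch()
        if value is not None:
            return value
        if attempt < attempts:  # no pointless wait after the final attempt
            sleep(delay)
            delay *= 2
    return None
```

With the defaults, a persistently failing fetch is tried three times with waits of 2 s and 4 s in between, which is short enough for an interactive notebook.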