lukasschwab / arxiv.py

Python wrapper for the arXiv API
MIT License

arxiv.Search returns empty result #119

Closed lilycyf closed 1 year ago

lilycyf commented 1 year ago

Description

Running the following code returns only an empty result, and I'm wondering why:

import arxiv

search = arxiv.Search(
  query = "quantum",
  max_results = 10,
  sort_by = arxiv.SortCriterion.SubmittedDate
)

for result in search.results():
  print(result.title)

However, I'm able to get a result with the following code:

search = arxiv.Search(id_list=["1605.08386v1"])
paper = next(search.results())
print(paper.title)

Versions

lilycyf commented 1 year ago

arxiv.py version: 1.4.7 works for me

lukasschwab commented 1 year ago

@lilycyf thanks for the bug report. I can reproduce. Looking into it!

lukasschwab commented 1 year ago

@lilycyf my first guess here is actually that the underlying arXiv API is misbehaving.

Yesterday the test suite passed (locally and in CI) for the commit that'd become 1.4.8. Today the test suite fails for the same commit:

~/Pr/arxiv.py(3d013ab) » make test
pytest
===================================== test session starts ======================================
platform darwin -- Python 3.10.4, pytest-7.3.1, pluggy-1.0.0 -- /Users/lukas/.pyenv/versions/3.10.4/bin/python3.10
cachedir: .pytest_cache
rootdir: /Users/lukas/Programming/arxiv.py
configfile: setup.cfg
collected 24 items

tests/test_api_bugs.py::TestClient::test_missing_title PASSED                            [  4%]
tests/test_client.py::TestClient::test_invalid_format_id PASSED                          [  8%]
tests/test_client.py::TestClient::test_invalid_id PASSED                                 [ 12%]
tests/test_client.py::TestClient::test_max_results FAILED                                [ 16%]
tests/test_client.py::TestClient::test_no_duplicates PASSED                              [ 20%]
tests/test_client.py::TestClient::test_nonexistent_id_in_list PASSED                     [ 25%]
tests/test_client.py::TestClient::test_offset PASSED                                     [ 29%]
tests/test_client.py::TestClient::test_query_page_count FAILED                           [ 33%]
tests/test_client.py::TestClient::test_retry PASSED                                      [ 37%]
tests/test_client.py::TestClient::test_search_results_offset PASSED                      [ 41%]
tests/test_client.py::TestClient::test_sleep_between_errors PASSED                       [ 45%]
tests/test_client.py::TestClient::test_sleep_elapsed PASSED                              [ 50%]
tests/test_client.py::TestClient::test_sleep_multiple_requests PASSED                    [ 54%]
tests/test_client.py::TestClient::test_sleep_standard PASSED                             [ 58%]
tests/test_client.py::TestClient::test_sleep_zero_delay PASSED                           [ 62%]
tests/test_download.py::TestDownload::test_download_from_query PASSED                    [ 66%]
tests/test_download.py::TestDownload::test_download_tarfile_from_query PASSED            [ 70%]
tests/test_download.py::TestDownload::test_download_with_custom_slugify_from_query PASSED [ 75%]
tests/test_result.py::TestResult::test_eq PASSED                                         [ 79%]
tests/test_result.py::TestResult::test_from_feed_entry FAILED                            [ 83%]
tests/test_result.py::TestResult::test_get_short_id PASSED                               [ 87%]
tests/test_result.py::TestResult::test_legacy_ids PASSED                                 [ 91%]
tests/test_result.py::TestResult::test_result_shape FAILED                               [ 95%]
tests/test_result.py::TestResult::test_to_datetime PASSED                                [100%]

The test suite fails in the same way if I run it for tagged version 1.4.7:

~/Pr/arxiv.py(3d013ab) » git checkout 1.4.7                                                 2 ↵
Previous HEAD position was 3d013ab Simplify `pdoc` build, eliminate nav badges (#115)
HEAD is now at 1df844f Indicate Python version in trove classifiers (#112)
~/Pr/arxiv.py(1df844f) » make test
pytest
===================================== test session starts ======================================
platform darwin -- Python 3.10.4, pytest-7.3.1, pluggy-1.0.0 -- /Users/lukas/.pyenv/versions/3.10.4/bin/python3.10
cachedir: .pytest_cache
rootdir: /Users/lukas/Programming/arxiv.py
configfile: setup.cfg
collected 24 items

tests/test_api_bugs.py::TestClient::test_missing_title PASSED                            [  4%]
tests/test_client.py::TestClient::test_invalid_format_id PASSED                          [  8%]
tests/test_client.py::TestClient::test_invalid_id PASSED                                 [ 12%]
tests/test_client.py::TestClient::test_max_results FAILED                                [ 16%]
tests/test_client.py::TestClient::test_no_duplicates PASSED                              [ 20%]
tests/test_client.py::TestClient::test_nonexistent_id_in_list PASSED                     [ 25%]
tests/test_client.py::TestClient::test_offset PASSED                                     [ 29%]
tests/test_client.py::TestClient::test_query_page_count FAILED                           [ 33%]
tests/test_client.py::TestClient::test_retry PASSED                                      [ 37%]
tests/test_client.py::TestClient::test_search_results_offset PASSED                      [ 41%]
tests/test_client.py::TestClient::test_sleep_between_errors PASSED                       [ 45%]
tests/test_client.py::TestClient::test_sleep_elapsed PASSED                              [ 50%]
tests/test_client.py::TestClient::test_sleep_multiple_requests PASSED                    [ 54%]
tests/test_client.py::TestClient::test_sleep_standard PASSED                             [ 58%]
tests/test_client.py::TestClient::test_sleep_zero_delay PASSED                           [ 62%]
tests/test_download.py::TestDownload::test_download_from_query PASSED                    [ 66%]
tests/test_download.py::TestDownload::test_download_tarfile_from_query PASSED            [ 70%]
tests/test_download.py::TestDownload::test_download_with_custom_slugify_from_query PASSED [ 75%]
tests/test_result.py::TestResult::test_eq PASSED                                         [ 79%]
tests/test_result.py::TestResult::test_from_feed_entry FAILED                            [ 83%]
tests/test_result.py::TestResult::test_get_short_id PASSED                               [ 87%]
tests/test_result.py::TestResult::test_legacy_ids PASSED                                 [ 91%]
tests/test_result.py::TestResult::test_result_shape FAILED                               [ 95%]
tests/test_result.py::TestResult::test_to_datetime PASSED                                [100%]

> arxiv.py version: 1.4.7 works for me

It does not work for me. Mind sharing some more details on how you tested this? Thanks!

lukasschwab commented 1 year ago

@lilycyf while I was investigating, the test suite started passing again. The example code in your initial issue also works. Version 1.4.8 should work for you just as well as 1.4.7.

Thanks again for reporting — seems this was a brief issue on arXiv's side.
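One way to tell an upstream outage from a wrapper bug is to inspect the raw Atom feed the API returns and count its entries. The following is a minimal sketch, assuming the response uses the standard Atom namespace; `count_entries` and the two sample feeds are hypothetical names written for illustration and are not part of arxiv.py:

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace, in ElementTree's "{uri}tag" notation.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def count_entries(feed_xml: str) -> int:
    """Return the number of <entry> elements in an Atom feed document."""
    root = ET.fromstring(feed_xml)
    return len(root.findall(f"{ATOM_NS}entry"))


# Hand-written feeds standing in for real API responses (assumptions,
# not captured output from the arXiv API).
EMPTY_FEED = '<feed xmlns="http://www.w3.org/2005/Atom"><title>q</title></feed>'
ONE_ENTRY_FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<entry><title>A paper</title></entry>"
    "</feed>"
)
```

If the raw response for a query that should match something contains zero entries, the empty result most likely originates in the API rather than in the wrapper.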

Animadversio commented 10 months ago

To add my observation: the same search code sometimes works reliably, then yields empty results for a while (5-10 minutes), and then recovers. This has happened 3-4 times tonight on my side.

It seems the arXiv API server breaks down or stops answering requests for some periods.
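A caller that wants to ride out short empty-result windows like these could wrap the search in a retry loop. This is only a sketch: `fetch_until_nonempty` and its parameters are hypothetical and not part of arxiv.py, and the default delay is an arbitrary assumption.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def fetch_until_nonempty(
    fetch: Callable[[], List[T]],
    attempts: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Call fetch() up to `attempts` times; return the first non-empty list.

    If every attempt comes back empty, the last (empty) list is returned.
    """
    results: List[T] = []
    for attempt in range(attempts):
        results = fetch()
        if results:
            return results
        # Wait before retrying, but not after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return results
```

With the code from the original report, a call might look like `fetch_until_nonempty(lambda: list(search.results()))`. Since the gaps described here last several minutes, a few short retries may not be enough. A query that legitimately matches nothing will also be retried, so the attempt count should stay small.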

AllenWrong commented 10 months ago

I also have this issue. While debugging my code, I found that execution never enters the function 'self._result' from https://github.com/lukasschwab/arxiv.py/blob/8ef0759f870b563a2ef9fff3cf5b13ef437dd737/arxiv/__init__.py#L575. Is this a problem?

lukasschwab commented 10 months ago

@AllenWrong can you share a code snippet that reproduces the issue you're encountering? You shouldn't have to call self._results directly.

The integration tests are stable. These issues are most likely caused by temporary instability in the arXiv API service itself.

AllenWrong commented 10 months ago

@lukasschwab I solved this by changing query_url_format = "https://export.arxiv.org/api/query?{}" to query_url_format = "http://export.arxiv.org/api/query?{}". It is surprising.
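To compare the two schemes outside the library, the same query can be built against both URL formats and each URL fetched by hand. This is a rough sketch: `build_query_url` and `QUERY_URL_FORMATS` are hypothetical helpers, and the parameter names (`search_query`, `start`, `max_results`, `sortBy`, `sortOrder`) are assumed to match the arXiv API's query interface.

```python
from urllib.parse import urlencode

# The two URL formats quoted in the comment above.
QUERY_URL_FORMATS = {
    "https": "https://export.arxiv.org/api/query?{}",
    "http": "http://export.arxiv.org/api/query?{}",
}


def build_query_url(scheme: str, search_query: str, max_results: int = 10) -> str:
    """Build an API query URL for the given scheme ("http" or "https")."""
    params = {
        "search_query": search_query,
        "start": 0,
        "max_results": max_results,
        # Assumed equivalents of arxiv.SortCriterion.SubmittedDate, newest first.
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return QUERY_URL_FORMATS[scheme].format(urlencode(params))
```

Fetching `build_query_url("http", "quantum")` and `build_query_url("https", "quantum")` side by side (for example with `curl`) shows whether only one scheme returns an empty feed.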

lukasschwab commented 10 months ago

@AllenWrong yes, I observed HTTP/HTTPS behavior differences the last time this came up: https://github.com/lukasschwab/arxiv.py/issues/129

I think the arXiv folks might need to restart a server. I'll see if I can drop them a line.