lukasschwab / arxiv.py

Python wrapper for the arXiv API
MIT License

Invalid entries in multi-member ID lists cause entry repetition #82

Open lukasschwab opened 3 years ago

lukasschwab commented 3 years ago

Description


If id_list consists of a single nonexistent (but validly formatted) ID, arXiv returns an empty feed, which the client interprets as "no results."

If id_list contains both existent and nonexistent valid IDs (["0000.0000", "1707.08567"]), the feed is non-empty (it contains a single item) but reports feed.feed.opensearch_totalresults == 2. The client takes this to be a partial page and requests a page with offset 1, which lists paper 1707.08567 again. This is an API bug.

Notably, this behavior differs depending on the nonexistent ID. Nonexistent ID 1507.58567 yields an entry with missing fields (covered in #80, fixed by #82), whereas 1407.58567 yields no entries at all (covered here).

Example: https://export.arxiv.org/api/query?id_list=1407.58567,1707.08567
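
To make the failure mode concrete, here is a minimal offline sketch of the paging behavior described above. It is not the library's actual client code: fake_fetch, naive_results, and deduped_results are hypothetical names, and fake_fetch simply assumes the API responds as reported in this issue (one entry per page, a reported total of 2). The last function shows one possible client-side guard, which tracks IDs already yielded.

```python
from typing import Callable, Iterator, List, Tuple

# (entry IDs on this page, total result count reported by the feed)
Page = Tuple[List[str], int]


def fake_fetch(start: int) -> Page:
    """Hypothetical stand-in for one API page request (no network).

    Assumes the behavior reported in this issue: the feed claims 2 total
    results, but every page within that range lists the same single paper.
    """
    if start < 2:
        return ["1707.08567"], 2
    return [], 2


def naive_results(fetch: Callable[[int], Page]) -> Iterator[str]:
    """Page through results, trusting the reported total."""
    offset = 0
    total = None
    while total is None or offset < total:
        entries, total = fetch(offset)
        if not entries:
            break  # An empty page means there is nothing more to read.
        yield from entries
        offset += len(entries)


def deduped_results(fetch: Callable[[int], Page]) -> Iterator[str]:
    """Same paging loop, but skip any ID that was already yielded."""
    seen = set()
    for entry_id in naive_results(fetch):
        if entry_id not in seen:
            seen.add(entry_id)
            yield entry_id


if __name__ == "__main__":
    print(list(naive_results(fake_fetch)))
    print(list(deduped_results(fake_fetch)))
```

Under these assumptions, the naive loop yields 1707.08567 twice (once at offset 0, once at offset 1), while the deduplicating variant yields it once. Deduplication only hides the symptom; the reported total is still wrong.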

Steps to reproduce


def test_invalid_id(self):
    results = list(arxiv.Search(id_list=["0000.0000"]).results())
    self.assertEqual(len(results), 0)
    results = list(arxiv.Search(id_list=["0000.0000", "1707.08567"]).results())
    print(len(results))
    self.assertEqual(len(results), 1)  # Fails: 1707.08567 appears twice.

Expected behavior


Results should not be duplicated.

Searching for ["0000.0000", "1707.08567"] should yield a single result.

Versions