Closed zlatko-minev closed 2 years ago
Hi Zlatko, thanks for taking the time to open an issue! If I understand correctly, the issue here is underdocumented usage.
In the existing code, `(Client)._format_url(...)` assumes the query string is unencoded. The example expression `au:del_maestro+AND+ti:checkerboard` is already partially URL-encoded (plusses for spaces), so it gets double-encoded. The encoded `:`s aren't the issue; the problem is encoding `+` → `%2B` instead of ` ` → `+`.
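The double-encoding can be reproduced with the standard library alone. This is a minimal sketch that assumes the client builds its query string with `urllib.parse.urlencode` (the original issue mentions `urlencode`, and the URLs in the session that follows match this output); the variable names are illustrative only.

```python
from urllib.parse import urlencode

# Pre-encoded query: each '+' is treated as a literal plus sign and escaped as %2B.
pre_encoded = urlencode({"search_query": "au:del_maestro+AND+ti:checkerboard"})

# Unencoded query: each space is encoded as '+', the form the API expects.
unencoded = urlencode({"search_query": "au:del_maestro AND ti:checkerboard"})

print(pre_encoded)  # search_query=au%3Adel_maestro%2BAND%2Bti%3Acheckerboard
print(unencoded)    # search_query=au%3Adel_maestro+AND+ti%3Acheckerboard
```

In both cases the colon becomes `%3A`; only the handling of `+` differs between the two query strings.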
Unencoded compound queries (with spaces rather than plusses) work:
>>> import arxiv
>>> c = arxiv.Client()
>>>
>>> # Pre-encoded query yields a double-encoded query URL.
>>> c._format_url(arxiv.Search(query="au:del_maestro+AND+ti:checkerboard"), 0, 100)
'http://export.arxiv.org/api/query?search_query=au%3Adel_maestro%2BAND%2Bti%3Acheckerboard&id_list=&sortBy=relevance&sortOrder=descending&start=0&max_results=100'
>>>
>>> # An unencoded query yields the expected query URL.
>>> search = arxiv.Search(query="au:del_maestro AND ti:checkerboard")
>>> c._format_url(search, 0, 100)
'http://export.arxiv.org/api/query?search_query=au%3Adel_maestro+AND+ti%3Acheckerboard&id_list=&sortBy=relevance&sortOrder=descending&start=0&max_results=100'
>>> # Search results include the expected article.
>>> next(c.results(search))
arxiv.Result(entry_id='http://arxiv.org/abs/cond-mat/0603029v1', updated=datetime.datetime(2006, 3, 2, 2, 22, 45, tzinfo=datetime.timezone.utc), published=datetime.datetime(2006, 3, 2, 2, 22, 45, tzinfo=datetime.timezone.utc), title='From stripe to checkerboard order on the square lattice in the presence of quenched disorder', authors=[arxiv.Result.Author('Adrian Del Maestro'), arxiv.Result.Author('Bernd Rosenow'), arxiv.Result.Author('Subir Sachdev')], summary='We discuss the effects of quenched disorder on a model of charge density wave\n(CDW) ordering on the square lattice. Our model may be applicable to the\ncuprate superconductors, where a random electrostatic potential exists in the\nCuO2 planes as a result of the presence of charged dopants. We argue that the\npresence of a random potential can affect the unidirectionality of the CDW\norder, characterized by an Ising order parameter. Coupling to a unidirectional\nCDW, the random potential can lead to the formation of domains with 90 degree\nrelative orientation, thus tending to restore the rotational symmetry of the\nunderlying lattice. We find that the correlation length of the Ising order can\nbe significantly larger than the CDW correlation length. For a checkerboard CDW\non the other hand, disorder generates spatial anisotropies on short length\nscales and thus some degree of unidirectionality. We quantify these disorder\neffects and suggest new techniques for analyzing the local density of states\n(LDOS) data measured in scanning tunneling microscopy experiments.', comment='10 pages, 11 figures; added reference', journal_ref='Phys. Rev. B 74, 024520 (2006)', doi='10.1103/PhysRevB.74.024520', primary_category='cond-mat.str-el', categories=['cond-mat.str-el', 'cond-mat.supr-con'], links=[arxiv.Result.Link('http://dx.doi.org/10.1103/PhysRevB.74.024520', title='doi', rel='related', content_type=None), arxiv.Result.Link('http://arxiv.org/abs/cond-mat/0603029v1', title=None, rel='alternate', content_type=None), arxiv.Result.Link('http://arxiv.org/pdf/cond-mat/0603029v1', title='pdf', rel='related', content_type=None)])
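If a query string arrives already form-encoded (for example, copied from an API URL), one option is to decode it before handing it to `arxiv.Search`. The sketch below uses only the standard library; `normalize_query` is a hypothetical helper name, not part of the `arxiv` package.

```python
from urllib.parse import unquote_plus


def normalize_query(query: str) -> str:
    """Hypothetical helper: decode a form-encoded query into its unencoded form."""
    # unquote_plus turns each '+' into a space and decodes %XX escapes.
    return unquote_plus(query)


normalized = normalize_query("au:del_maestro+AND+ti:checkerboard")
print(normalized)  # au:del_maestro AND ti:checkerboard
```

A query that is already unencoded and contains no `+` or `%XX` sequences passes through unchanged. The trade-off is that a query meant to contain a literal `+` or a literal `%XX` sequence would be altered, so this is only appropriate when the input is known to be pre-encoded.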
I'll leave this issue open and push some improved documentation.
Updated docs are live: http://lukasschwab.me/arxiv.py/index.html#Search.query
Motivation
We need to run advanced arXiv queries such as `?search_query=au:del_maestro+AND+ti:checkerboard`.
The problem is that `urlencode` encodes certain key characters, such as the colon. @IceKhan13 This is so we can use compound queries and
Solution
Quick and dirty patch solution. WARNING: Not backward compatible
Considered alternatives
Additional context