lukasschwab / arxiv.py

Python wrapper for the arXiv API
MIT License

enable + in query string #13

Closed cnglen closed 6 years ago

cnglen commented 6 years ago

enable + in query="au:del_maestro+AND+ti:checkerboard"

lukasschwab commented 6 years ago

Should be able to test/merge tonight.

lukasschwab commented 6 years ago

I run into errors testing 22e24aa55cd626e246b650c14498f956a6db44de with the query string you provide; it seems the quote argument isn't supported by urlencode(). I also can't find mention of the parameter in documentation.

I'm running urllib3 version 1.23 in both Python 2 and Python 3.

$ pip freeze | grep urllib
urllib3==1.23

Python 2.7.15:

>>> import arxiv
>>> arxiv.query(search_query="au:del_maestro+AND+ti:checkerboard")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "arxiv/arxiv.py", line 31, in query
    "sortOrder": sort_order}, quote='+')
TypeError: urlencode() got an unexpected keyword argument 'quote'

Python 3.7.0:

>>> import arxiv
>>> arxiv.query(search_query="au:del_maestro+AND+ti:checkerboard")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/schwabl/Desktop/arxiv.py/arxiv/arxiv.py", line 31, in query
    "sortOrder": sort_order}, quote='+')
TypeError: urlencode() got an unexpected keyword argument 'quote'

So, some follow-up questions:

lukasschwab commented 6 years ago

More fundamentally, there's a question of how query strings should function.

The default urllib behavior, via quote_plus(), is to convert each space in the query string to +. Your query is already possible in the existing library:

>>> import arxiv
>>> arxiv.query(search_query="au:del_maestro AND ti:checkerboard")
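To make the default encoding concrete, here is a minimal Python 3 sketch that calls urllib.parse.urlencode() directly. It does not go through arxiv.query(); the variable names are only illustrative.

```python
from urllib.parse import urlencode

# Spaces are encoded as "+", which is the form the existing library already sends.
with_spaces = urlencode({"search_query": "au:del_maestro AND ti:checkerboard"})

# Literal "+" characters are percent-encoded as "%2B", so they are not read as spaces.
with_plusses = urlencode({"search_query": "au:del_maestro+AND+ti:checkerboard"})

print(with_spaces)   # search_query=au%3Adel_maestro+AND+ti%3Acheckerboard
print(with_plusses)  # search_query=au%3Adel_maestro%2BAND%2Bti%3Acheckerboard
```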

There are conceivably cases in which a user might want to first URL-encode and then modify their query string before passing it to arxiv.query(). This could be accommodated by defining a wrapper for urllib.parse.quote(s, safe="+"):

from urllib.parse import quote  # Python 3 only

def query_with_plusses(string, safe="", encoding=None, errors=None):
    return quote(string, safe=safe + "+", encoding=encoding, errors=errors)

My concern is that this would break queries for those expecting the standard urllib behavior, for example those who want a literal + in part of their query, such as a search for a paper entitled Odds+Ends. The server should not misinterpret that + as a URL-encoded space, so the character needs to be percent-encoded (as %2B).
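The ambiguity can be reproduced locally. In this sketch, unquote_plus() stands in for the server's decoding step; that is an assumption about how the arXiv API treats +, not something confirmed in this thread.

```python
from urllib.parse import quote, quote_plus, unquote_plus

title_query = "ti:Odds+Ends"

# Wrapper-style encoding: "+" is marked safe and passes through untouched.
kept = quote(title_query, safe="+")
# Standard encoding: "+" becomes "%2B".
escaped = quote_plus(title_query)

# Assumed server-side decoding: "+" is read back as a space.
print(unquote_plus(kept))     # ti:Odds Ends
print(unquote_plus(escaped))  # ti:Odds+Ends
```

With the + left unescaped, the title comes back as two words, which is the misinterpretation described above.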

Additionally, Python 2's urllib.urlencode() doesn't support specifying an alternative to quote_plus(). Implementing this behavior for both Python 2 and Python 3 would take significant redundant code.
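For a sense of what that redundant code might look like: Python 3.5 and later let urlencode() take a quote_via argument, but Python 2 does not, so a version-independent approach has to join the parameters by hand. The helper name urlencode_keep_plus below is hypothetical and not part of arxiv.py.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

def urlencode_keep_plus(params):
    """Hypothetical helper: encode a dict of params, leaving '+' unescaped."""
    return "&".join(
        "{}={}".format(quote(str(key), safe="+"), quote(str(value), safe="+"))
        for key, value in params.items()
    )

encoded = urlencode_keep_plus({"search_query": "au:del_maestro+AND+ti:checkerboard"})
print(encoded)  # search_query=au%3Adel_maestro+AND+ti%3Acheckerboard
```

Note that this reimplements part of urlencode() and still has the literal-plus problem: it cannot tell a + meant as a space from one meant as a character.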

Because the functionality in question is already available in arxiv 0.2.3 by using spaces in the query string, and because a working enhancement is likely to interfere with expected query behavior and/or impact Python 2 support, I'm going to close the PR.

If there's a cleaner solution I'm missing, perhaps refactoring the query() function so that the request and the URL-encoding are separate, feel free to push those changes. I'll gladly reopen and review! 😃
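One possible shape for such a refactor, offered only as a rough sketch: a function that builds the URL without sending the request, with an opt-in flag for callers who have already encoded their query. The names build_url and raw, and the API_ROOT endpoint, are assumptions for illustration; none of them come from arxiv.py or this thread.

```python
from urllib.parse import urlencode

API_ROOT = "http://export.arxiv.org/api/query"  # assumed endpoint

def build_url(search_query, raw=False, **params):
    """Hypothetical: return the request URL without performing the request.

    raw=True means search_query is already URL-encoded and is appended verbatim.
    """
    if raw:
        rest = urlencode(params)
        query = "search_query=" + search_query + ("&" + rest if rest else "")
    else:
        query = urlencode(dict(search_query=search_query, **params))
    return API_ROOT + "?" + query

default_url = build_url("au:del_maestro AND ti:checkerboard")
raw_url = build_url("au:del_maestro+AND+ti:checkerboard", raw=True, max_results=5)
```

Keeping the default path on urlencode() preserves the standard behavior, while the raw path puts responsibility for correct encoding on the caller.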

Thanks for your interest in arxiv! Let me know if I can clarify anything.

cnglen commented 6 years ago

Maybe the best way is the one you suggested: arxiv.query(search_query="au:del_maestro AND ti:checkerboard")

Thanks.