Closed BalconyJH closed 5 months ago
Hi @BalconyJH — two questions!
asyncio.to_thread
as described here?requests
Dependency with niquests
and Introduce Asynchronous FunctionalityDescription:
Completely replace the requests
library with niquests
in the project, removing the dependency on requests
. niquests
is well compatible with requests
.
Changes:
source code in https://github.com/lukasschwab/arxiv.py/blob/8ef0759f870b563a2ef9fff3cf5b13ef437dd737/arxiv/__init__.py#L638C5-L676C1
Replace requests
with niquests
: According to the niquests blog, this should be a straightforward replacement. Tests in my local environment show that niquests
behaves consistently with requests
, with all tests passing after necessary adjustments.
Update Class Initialization:
We need to modify class initialization to use niquests
. This involves updating import statements and any requests
-specific initialization code.
Add New Asynchronous Function:
A new asynchronous function will be added to leverage niquests
's asynchronous capabilities. This function will provide additional flexibility in handling HTTP requests.
Images:
niquests
:Utilizing asyncio.to_thread
or ThreadPoolExecutor
provides a feasible method to execute synchronous requests in an asynchronous context.Nevertheless, this approach introduces a level of complexity by blending asynchronous and synchronous paradigms, coupled with the intricacies of thread management. Such complexity could potentially lead to challenges in maintaining and debugging the code. Should this strategy prove to be cumbersome in its implementation, I would consider it as a fallback option.
I eagerly await your thoughts on this approach, particularly any insights into potential risks or nuances that might have been overlooked.
@lukasschwab
Looks like you've gone to the trouble of writing code already. Feel free to open a PR and we can discuss there, with the help of tests/linter. No need to worry about e.g. documentation at this point.
In broad strokes, what you're suggesting here makes sense. Here are some notes.
Update Class Initialization: We need to modify class initialization to use niquests. This involves updating import statements and any requests-specific initialization code.
This is my concern about breaking changes: if arxiv.Client.results
goes from being a method you can call synchronously to one that must be await
ed, that'll require a major-version change to this API. That's not a non-starter, but I'd want to see more vocal support for the change from the rest of the community.
An alternative interpretation: we may be able to expose a separate async client (from arxiv.async import Client
?) with async
methods. In this case, we should be careful not to needlessly duplicate logic; factoring common helpers out of the synchronous Client
is important.
Completely replace the requests library with niquests in the project, removing the dependency on requests. niquests is well compatible with requests.
Curious — why niquests
? aiohttp
and httpx
appear to be more widely-adopted by orders of magnitude (just taking the first two examples I found).
Is there some specific reason to prefer niquests
I'm missing?
New Asynchronous Function:
One note: we'd need to convert since_last_request
/_last_request_at
to something threadsafe to prevent races between separate but concurrent calls to _async_try_parse_feed
.
I apologize for my hastiness. I have only modified a portion of the functions because I believe presenting the code might be a more convenient way to communicate.
An alternative interpretation: we may be able to expose a separate async client (?) with methods. In this case, we should be careful not to needlessly duplicate logic; factoring common helpers out of the synchronous is important.
from arxiv.async import Client``async``Client
I believe that creating a separate AsyncClient class would be a better choice. Although it will bring the foreseeable task of maintaining a new class and a similar set of parsing logic(Perhaps it would be straightforward to take feedparser.parse
out of the Client
class and use it as a separate helper function.), this approach might be more suitable. Additionally, considering that the original download depends on urlretrieve, this would be a good opportunity to incorporate parallel downloading capabilities as well.
Curious — why niquests? aiohttp and httpx appear to be more widely-adopted by orders of magnitude (just taking the first two examples I found).Is there some specific reason to prefer niquests I'm missing?
In fact, using httpx
or niquests
here doesn't make much difference in terms of functionality; they both support synchronous and asynchronous calls. If there's anything to be noted, it's perhaps as niquests
itself claims: they are the "fastest" in synchronous operations and support HTTP/3. I don't have a preference for choosing between the two.
Could we consider discontinuing support for Python 3.7? The latest version of httpx now requires Python 3.8 or higher.
Motivation
Solution
Considered alternatives
Additional context