oberstet / scratchbox

Random stuff. Unpolished. Tests. Whatever.

Queue choice and other considerations #4

Closed jemattarde closed 1 year ago

jemattarde commented 1 year ago

What is the purpose of the following lines in your instructive producer-consumer example, in the context of asyncio.Protocol.data_received?

    if isinstance(res, asyncio.Future) or inspect.isgenerator(res):
        res = yield from res

Did you have any specific reason to use collections.deque instead of asyncio.Queue?

Thanks a lot for sharing your code.

oberstet commented 1 year ago

this is quite old code, and the lines above are there to make it work regardless of what the handler returns (is it a future or generator that needs to be yielded from first, or just a plain value?)
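
(not the original code, just a rough sketch of the same idea in modern Python: inspect.isawaitable covers the "is this something I still need to await?" check that the old Future/generator test was doing)

    import inspect

    async def call_handler(handler, *args):
        # the handler may return a plain value, or something that must be
        # awaited first (a Future, a Task, a coroutine); accept all of them
        res = handler(*args)
        if inspect.isawaitable(res):
            res = await res
        return res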

there is no particular reason for not using asyncio.Queue (I don't know if it even existed when I wrote the code .. my examples predate asyncio altogether), other than that the code doesn't need the multi-threading mutex guards that I assume asyncio's queue has under the hood. I haven't looked closely, as I don't need it. if you use a queue from multiple threads, then depending on the queue implementation (if it isn't using single-writer lockless techniques and such), any writes and reads need to be coordinated
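
as a minimal sketch (not the original example, just to illustrate the point): with everything on a single event loop thread, a plain collections.deque needs no locking at all

    import asyncio
    from collections import deque

    queue = deque()

    async def producer(n):
        # enqueue items; no lock needed, everything runs on one event loop thread
        for i in range(n):
            queue.append(i)
            await asyncio.sleep(0.01)

    async def consumer(n):
        # poll the deque and process items in FIFO order
        done = 0
        while done < n:
            if queue:
                print("consumed", queue.popleft())
                done += 1
            else:
                await asyncio.sleep(0.01)  # nothing queued yet, yield to the loop

    async def main():
        await asyncio.gather(producer(5), consumer(5))

    asyncio.run(main())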

hope this helps!

jemattarde commented 1 year ago

Thanks for your time. Your answer helps me to understand some practices of asyncio.

While searching for alternative approaches, I came across this contribution: https://stackoverflow.com/questions/35127520/asyncio-queue-consumer-coroutine/35132596#35132596 (a different method and a different kind of queue 😃)

It's difficult for someone new to Python architectures to weigh the limitations and advantages of these different approaches for consuming received datagrams. Do you have an opinion about the approach described in the link above?

Have a good day

oberstet commented 1 year ago

It's difficult for someone new to Python architectures to weigh the limitations and advantages of these different approaches for consuming received datagrams.

yes, this whole subject of asynchronous programming, applied to networking, applied to storage, etc .. and all that in the context of a language that has "history" and has evolved over time is complex and can be tricky.

personally, I'd approach this a bit from a different angle first: what do you want to achieve?

if it is

A) "good code structuring"

decoupling then means: A puts stuff into queue Q, so that B can consume it whenever it wants (decoupled).

the other main reasons IMO are:

B) wanting to squeeze maximum performance (network? storage? both? or CPU?) out of the whole system

C) having to deal with threads (which have their own traps, plus additional restrictions in Python specifically ... the GIL) because you integrate with threaded code that you cannot modify

which of A, B or C is your motivation?

jemattarde commented 1 year ago

Clearly A.

However, I don't see a distinction in terms of goal between async functions and threading, which are for me two ways to obtain parallel loop tasks. Typically I need a producer that both receives datagrams and pushes them into a queue, and a consumer that processes the enqueued elements in the order in which they were received. A producer loop iteration is generally (reasonably) faster than a consumer iteration.

Launching a parallel task which dequeues the data in the context of asyncio.DatagramProtocol is surprisingly neither easy nor a widespread example in Python tutorials or docs.
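
Roughly what I mean is something like this (a minimal sketch only; the names, the port and the address are just placeholders): the protocol acts as the producer and puts datagrams into an asyncio.Queue, while a separate task consumes them in order.

    import asyncio

    class DatagramReceiver(asyncio.DatagramProtocol):
        # producer: pushes received datagrams into a queue

        def __init__(self, queue):
            self._queue = queue

        def datagram_received(self, data, addr):
            # put_nowait keeps the protocol callback non-blocking
            self._queue.put_nowait((data, addr))

    async def consumer(queue):
        # consumer: processes datagrams in the order they were received
        while True:
            data, addr = await queue.get()
            print("processing", len(data), "bytes from", addr)
            queue.task_done()

    async def main():
        queue = asyncio.Queue()
        loop = asyncio.get_running_loop()
        transport, protocol = await loop.create_datagram_endpoint(
            lambda: DatagramReceiver(queue), local_addr=("127.0.0.1", 9999))
        try:
            await consumer(queue)
        finally:
            transport.close()

    asyncio.run(main())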

oberstet commented 1 year ago

However, I don't see a distinction in terms of goal between async functions and threading, which are for me two ways to obtain parallel loop tasks.

not sure, but there might be a fundamental misunderstanding: concurrent != parallel

2 tasks can run concurrently, but not in parallel. this is the case when you only have 1 thread, and consequently, you can forget about synchronization worries (eg from a purely technical, shared-memory perspective). in this case, queues are "just" one way to structure/decouple your code - logically. this is the primary focus of A.

2 tasks can also run in parallel, but not concurrently - when they have to wait for each other for non-technical reasons (eg data dependencies). in this case, queues can be one way to achieve synchronization (but they are not the only way). a task that is not doing anything but waiting is still "runnable/running" in parallel with your other threads.
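
(a tiny sketch of the first case, just to illustrate: both tasks run on one thread and interleave only at await points, so even the shared counter needs no lock)

    import asyncio

    counter = 0  # shared state, but only ever touched by one thread

    async def bump(name, times):
        global counter
        for _ in range(times):
            counter += 1            # no lock needed: single event loop thread
            print(name, counter)
            await asyncio.sleep(0)  # yield control; the tasks interleave here

    async def main():
        # concurrent, but not parallel: one thread, two interleaved tasks
        await asyncio.gather(bump("a", 3), bump("b", 3))

    asyncio.run(main())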

anyways, there are quite a few aspects and details here, I recognize .. it is not a trivial thing ... architecture / code structuring questions get tangled up with synchronization / technical aspects ..