marcodisarno commented 11 months ago

Hello dear folks at socotecio,

I'm a colleague of Mark Doerr, who wants to use your library in his prize-winning LARASuite.

I'm currently developing a tool that talks to LARA via GRPC and needs to be informed about changes to certain models. Mark informed me you're working together on this example and I think my implementation might be a good addition here. There are even some good arguments to add a Subscribe feature to your library as well. I invite you to use this as a cheet sheet, I've put some comments here and there where I think I see how this would be done.

So, cut to the chase, what does this contain?

Code overview

`subscriptions.py`

Contains generic service mixins for Sync and Async, addding a Subscribe() call to the model services.

These subscribe themselves to the respective certain builtin django model signal. "post_init" (read), "post_save" (write) and "post_delete" (well, delete) are supported thus far.

Since the handlers are called synchronously by django and also don't use asyncio, we have a ModelSignalDispatcher per signal per model, living inside the AppHandlerRegistry.

These dispatchers hold N queues for each of the N GRPC clients that called Subscribe(), decoupling the django app and the clients from each other. This goes at the cost of memory, but is necessary without async receivers (and there, really, is the same).

There's also a built-in heartbeat mechanism for proper receiver bookeeping, see ModelSignalDispatcher.get_signal_watcher and ModelSignalDispatcher.wait. Note that this introduces a heartbeat interval (ModelSignalDispatcher.interval_seconds), which sets a maximum frequency for a batch of events. But could also be done with mutexes or semaphores.

`handlers.py`

Swap out the AppHandlerRegistry with our catchily-named AppHandlerRegistryWithModelSignalDispatcher

`services.py`

Inherit from ModelServiceWithSubscribe or AsyncModelServiceWithSubscribe from subscriptions.py for all models you want to be subscribable.

`serializers.py`

Here we need a forward-import for dynamic serializer resolution. This is a hack for you guys to solve better :)

# forward import for dynamic serializer resolution
from .subscriptions import SubscriptionProtoSerializer

Remarks

async django signal receivers

Django 4.2 does not have the async handlers yet, the freshly released Django 5.0 does.

Model Signal Dispatcher storage

You might want to put this someplace else, but this ensures easy access from the services and fulfills the requirement of a singleton (of sorts).

final note

I think people will find this very useful, since it closes the gap for Django&GRPC, which for example with GraphQL is also done by the community, see here: https://docs.graphene-python.org/projects/django/en/latest/subscriptions/

marcodisarno commented 11 months ago

By the way, I've compiled the proto into python only, did not add it to the frontend or api, but have a jupyter notebook to illustrate how it works.

you might have to run this locally though (I dont know which proto_pb2 files get imported to the docker)

AMontagu commented 10 months ago

Hello @marcodisarno

Thank you. Idealy I will have a look to it next week (we are finalising the new revisited/completed documentation).

Some points to consider:

gRPC is not the best tools for async subscriptions. We used it in the past and had issues with streams just not working anymore without any errors
For this kind of use cases Kafka or Pulsar are the pure player and are way more scalable/resiliant.
By opening a stream you keep a connection open and it is impossible to scale your servers to more than one instance as you don't know if the producer and the consumer will connect to the same server. (this is possible by using a shared DB like REDIS but way more complicated to handle)

If you still interested on that we have an old abandonned projet of an event store in gRPC that I can share but we droped it as it is way too much work to scale and maintain.

Considering this, after reading your implementation and talking about this with @legau, we may consider to not integrate this in the example as beginner will use it and may face issue that will make them regret the choice of using DSG.

markdoerr commented 10 months ago

Thank you, @AMontagu, for your insights into your valuable experiences with grpc based async subscriptions - very interesting and good to know.

Since our LARAsuite will have rabbitMQ as micro service message queuing services, we should consider also using it for this purpose. I selected rabbitMQ over Kafka and Pulsar, because of its simplicity: https://dattell.com/data-architecture-blog/rabbitmq-vs-pulsar/

If we will need more performance, we could easily change, I guess.

What do you think ? @marcodisarno and @AMontagu

marcodisarno commented 10 months ago

Hey @AMontagu,

I'm lacking experience with the reliability of gRPC streaming, so thanks for that insight, we'll need to consider this. The scalability concerns are somewhat unclear to me though, as I thought that after opening a gRPC channel, producer and consumer were fixed. And you wouldn't want to shutdown a grpc server that has an open channel, but probably you are seeing use cases here I don't.

Thanks for looking into it though!

@markdoerr we can discuss using a messaging service for this in general, yes, but this is outside the scope of socio-django.

AMontagu commented 10 months ago

Hi @marcodisarno,

There is great article on internet about that. It only concern long keepalive stream. The issues in this case are similar to websocket: https://tsh.io/blog/how-to-scale-websocket/

Not that there is no issue for stream between only one client and one server for a particular actions. The issues come from multiple producer with multiple consumer on multiple server OR with connection alive even if no message sent into the connection for a long time.

socotecio / django-socio-grpc-example

Add subscriptions via server stream #12