jung-kurt / caddy-pubsub

A publish and subscribe plugin for the Caddy HTTP server
MIT License

Support for one user and many devices? #1

Open ghost opened 5 years ago

ghost commented 5 years ago

Let's say I have a topic and many subscribers.

Each subscriber will poll at different times and so be working through the queue, getting results from a different point in the queue. I don't see that this is supported?

Then a single user could be using many devices, and you want to ensure that all devices are up to date in their subscriptions. So you maybe need a device ID or something. At the moment, using auth alone is maybe not enough? Then each auth ID / device ID concatenation could mark a point in the queue per topic.

Looking forward to extending this. I like its simplicity.

jung-kurt commented 5 years ago

Each subscriber will poll at different times and so be working through the queue, getting results from a different point in the queue. I don't see that this is supported?

The default value for DeleteEventAfterFirstRetrieval is false and the default value for EventTimeToLiveSeconds is forever. This means that unless the event buffer for the specific category fills up (the default value for MaxEventBufferSize is 250), all subscribers will receive each event, even if they begin longpolling at different times. (And remember, longpolling is not like busy polling. If you have many subscribers, they each get a published event at the same time. They are each guaranteed to receive each published event.)
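For reference, a minimal sketch of those defaults as they would appear in golongpoll's Options (assuming the v1 API; the endpoint wiring here is hypothetical, not the plugin's actual setup):

```go
package main

import (
	"log"
	"net/http"

	"github.com/jcuga/golongpoll"
)

func main() {
	// The defaults discussed above: at most 250 events buffered per
	// category, events kept forever, and not deleted on first retrieval.
	manager, err := golongpoll.StartLongpoll(golongpoll.Options{
		MaxEventBufferSize:             250,
		EventTimeToLiveSeconds:         golongpoll.FOREVER,
		DeleteEventAfterFirstRetrieval: false,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer manager.Shutdown()

	// Hypothetical endpoint; the pubsub plugin wires this up inside Caddy.
	http.HandleFunc("/events", manager.SubscriptionHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```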

Then a single user could be using many devices and you want to ensure that all devices are up to date in their subscriptions. So you need a device ID or something maybe.

A device ID will not be needed. Each of a user's devices will be kept up-to-date. The concept of a user is only needed for authentication, in order to prevent too many subscribers from consuming resources on the server. The user name is not used at all by the golongpoll library. You could dispense with the user/password authentication in some cases like an internal network.

jung-kurt commented 5 years ago

Two additional clarifications.

  1. When the event buffer on the longpoll server fills up, the oldest event is discarded to make room for a new event. For a subscriber, all the events that have not been discarded are available.

  2. The example code begins a subscription by asking the server for events that occur after the current time. This means that the subscriber will not receive any of the currently queued events, only events published in the future. It would be easy to change this behavior.
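As an illustration of how easy that change would be, here is a rough client-side sketch, assuming golongpoll's standard category/timeout/since_time query parameters (the endpoint path is hypothetical). Passing a since_time far in the past, in milliseconds since the Unix epoch, asks for everything still held in the buffer rather than only future events:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// since_time=1 (just after the epoch) requests every event still
	// buffered for the category, not only events published after "now".
	url := "http://localhost:8080/psdemo/subscribe?category=demo&timeout=120&since_time=1"
	resp, err := http.Get(url)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	// The body is a JSON object whose "events" array holds the queued events.
	io.Copy(os.Stdout, resp.Body)
}
```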

ghost commented 5 years ago

hey @jung-kurt

Thank you for the prompt answers and clarifications. Sorry if I missed the info already in the docs.

I am trying to use OpenAPI and AsyncAPI as the description languages for microservices, and so am experimenting with a simple subscription lib. The two teams split, but at least they use the same JSON Schema to describe everything... Anyway, sort of relevant and useful background stuff.

OpenAPI https://github.com/getkin/kin-openapi

AsyncAPI https://github.com/asyncapi

--

About point 1, when it fills up in memory: presumably we need a durable backing too. Maybe Minio using S3 Select. You can use Minio as a SQL database, which makes it easy to ask for the latest events, ordered by time with a max of X items, when the client polls in. https://github.com/minio/minio/tree/master/docs/select https://github.com/minio/minsql

In relation to point 2: in terms of log offset, if we use just time, won't clock skew be a huge spanner in the works here? With PULL or PUSH systems, someone normally holds some state that defines their log offset (like Kafka, NATS, etc.). It's either the client or the server holding it.

jung-kurt commented 5 years ago

I am trying to use OpenAPI and AsyncAPI as the description languages for microservices, and so am experimenting with a simple subscription lib.

Cool!

When it fills up in memory: presumably we need a durable backing too

Keep in mind that once the subscribers are running, published events will not be lost unless they are published so frequently that the subscribers cannot keep up with them. But if that is the case, then even a backing store will likely not solve the problem of keeping all subscribers up-to-date.

I see two ways to address a backing store if really old events need to be retained. One would be to modify golongpoll (the engine behind the pubsub plugin) to handle this. The downside is that "priming" a subscriber with all these old events will require a subscriber to call the server for each of the old events. This would be automatic but quite inefficient.

The other way would be to write a simple server that performs two functions: (a) subscribes to events and puts them into a database, and (b) delivers all the old events in a JSON bundle on demand. This way a new subscriber could get all the old events at once and then begin a subscription with the pubsub server.
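A minimal sketch of such a server, assuming golongpoll's JSON response shape and a hypothetical /psdemo/subscribe endpoint; the in-memory slice stands in for a real database:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"sync"
	"time"
)

// event mirrors golongpoll's event JSON: a server-side timestamp in
// milliseconds since the epoch, a category, and arbitrary data.
type event struct {
	Timestamp int64           `json:"timestamp"`
	Category  string          `json:"category"`
	Data      json.RawMessage `json:"data"`
}

var (
	mu      sync.Mutex
	archive []event // stand-in for a real database table
)

// subscribeLoop long-polls the pubsub server and appends each batch of
// events to the archive, advancing since_time past the newest event seen.
func subscribeLoop(base, category string) {
	since := int64(1)
	for {
		url := fmt.Sprintf("%s?category=%s&timeout=120&since_time=%d", base, category, since)
		resp, err := http.Get(url)
		if err != nil {
			log.Print(err)
			time.Sleep(5 * time.Second)
			continue
		}
		var body struct {
			Events []event `json:"events"`
		}
		if err := json.NewDecoder(resp.Body).Decode(&body); err == nil {
			mu.Lock()
			for _, ev := range body.Events {
				archive = append(archive, ev)
				if ev.Timestamp > since {
					since = ev.Timestamp
				}
			}
			mu.Unlock()
		}
		resp.Body.Close()
	}
}

func main() {
	go subscribeLoop("http://localhost:8080/psdemo/subscribe", "demo")

	// Deliver all archived events as one JSON bundle, so a new subscriber
	// can catch up in a single request before starting its own long poll.
	http.HandleFunc("/backlog", func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(archive)
	})
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```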

In terms of log offset, if we use just time, won't clock skew be a huge spanner in the works here?

Clock skew is not a problem. When a subscriber contacts the server, it asks for the oldest event that has occurred since the last event it received, using the server's own event timestamp. Only the first call is subject to clock skew, because the subscriber uses its own clock to set the "sinceTime" argument. To get around clock skew, you could set the "sinceTime" argument to last year.

ghost commented 5 years ago

I am trying to use OpenAPI and AsyncAPI as the description languages for microservices, and so am experimenting with a simple subscription lib.

Cool!

When it fills up in memory: presumably we need a durable backing too

Keep in mind that once the subscribers are running, published events will not be lost unless they are published so frequently that the subscribers cannot keep up with them. But if that is the case, then even a backing store will likely not solve the problem of keeping all subscribers up-to-date.

I see two ways to address a backing store if really old events need to be retained. One would be to modify golongpoll (the engine behind the pubsub plugin) to handle this. The downside is that "priming" a subscriber with all these old events will require a subscriber to call the server for each of the old events. This would be automatic but quite inefficient.

Yes, inefficient, but see below.

The other way would be to write a simple server that performs two functions: (a) subscribes to events and puts them into a database, and (b) delivers all the old events in a JSON bundle on demand. This way a new subscriber could get all the old events at once and then begin a subscription with the pubsub server.

Agreed, it's much better. In reality it's a query onto the Freezer bucket store to catch up. It pages through, returning however many records the client asked for in batches. While paging, once it reaches the sequence where the data is in the Live Store, it just sends back a new URL to use, or flips over itself on the server.
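A rough sketch of that paging behaviour, continuing the archive server sketched earlier (same event, archive, and mu; add "strconv" to its imports). The after/limit parameters and the flip-over URL are hypothetical:

```go
// backlogPage returns up to "limit" archived events newer than "after"
// (a timestamp in ms). Once a batch comes up short, the freezer is
// exhausted, and the response points the client at the live store.
func backlogPage(w http.ResponseWriter, r *http.Request) {
	after, _ := strconv.ParseInt(r.URL.Query().Get("after"), 10, 64)
	limit, _ := strconv.Atoi(r.URL.Query().Get("limit"))
	if limit <= 0 {
		limit = 100
	}
	mu.Lock()
	defer mu.Unlock()
	batch := []event{}
	for _, ev := range archive {
		if ev.Timestamp > after && len(batch) < limit {
			batch = append(batch, ev)
		}
	}
	resp := map[string]interface{}{"events": batch}
	if len(batch) < limit {
		// Caught up with the freezer: flip the client over to the live store.
		resp["next"] = "http://localhost:8080/psdemo/subscribe"
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(resp)
}
```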

Then how do you garbage collect stuff in the Freezer? You know all the subscribers, held in some etcd instance. So when the mapping of users to a topic drops to zero, delete the Freezer.

There are already many golang libs that do exactly what we are talking about semantically (WAL logs, etc.): Liftbridge, NATS Streaming. Better to use one of these than reinvent, I feel. They can easily have the long polling running above them.

https://www.infoq.com/news/2018/08/nats-liftbridge/

In terms of log offset, if we use just time, won't clock skew be a huge spanner in the works here?

Clock skew is not a problem. When a subscriber contacts the server, it asks for the oldest event that has occurred since the last event it received, using the server's own event timestamp. Only the first call is subject to clock skew, because the subscriber uses its own clock to set the "sinceTime" argument. To get around clock skew, you could set the "sinceTime" argument to last year.

jung-kurt commented 5 years ago

Then how do you garbage collect stuff in the Freezer?

Good point.

You know all the subscribers, held in some etcd instance. So when the mapping of users to a topic drops to zero, delete the Freezer.

This is indeed a case for modifying the golongpoll engine. But the change could be quite minimal. Maybe golongpoll could include a subscriber count in its event payload. This way, the independent freezer server could detect when it is the only subscriber and, when that happens, clear out the backlogged events from its database.
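As a sketch, that minimal change might be no more than an extra field on the event envelope. This is hypothetical and not part of golongpoll today:

```go
// Hypothetical extension of golongpoll's event JSON: a count of the active
// longpoll clients for the category at publish time. A freezer server
// seeing Subscribers == 1 would know it is the only subscriber left and
// could clear its backlogged events for that category.
type eventWithCount struct {
	Timestamp   int64       `json:"timestamp"`
	Category    string      `json:"category"`
	Data        interface{} `json:"data"`
	Subscribers int         `json:"subscribers"`
}
```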

jung-kurt commented 5 years ago

This would be automatic but quite inefficient.

Yes, inefficient, but see below.

I just glanced at the code and was reminded that golongpoll does in fact send an array of all pertinent events rather than one at a time. So catching up on old events directly from golongpoll is efficient.

ghost commented 5 years ago

Hey again

I stumbled across this, which could be a nicer approach. It uses SSE, which all browsers natively support, as well as any decent language.

https://github.com/dunglas/mercure

I am also integrating NATS Streaming. NATS would send any updates to caddy-pubsub and Mercure.

jung-kurt commented 5 years ago

I stumbled across this, which could be a nicer approach.

This looks very nice. Thanks for sharing.

It uses SSE, which all browsers natively support

Except IE and Edge.