libp2p / go-libp2p-pubsub

The PubSub implementation for go-libp2p
https://github.com/libp2p/specs/tree/master/pubsub

Topic Management Strategies, Implementation, And Questions #253

Open bonedaddy opened 4 years ago

bonedaddy commented 4 years ago

Topic Management Strategies And Implementation

One of the recent changes to this library (which I personally like, as it has resulted in much better message throughput in benchmarks) makes calling Join on a topic more than once return an error. At first this makes it somewhat problematic for multiple goroutines to access the same topic: Join doesn't store the topic internally, so the caller has to implement some form of topic management in order to share the same topic between goroutines.

Currently I've implemented what I call a "PubSubLocker" that lets me pass the topic or subscription handler around to different goroutines. I also wrote a shim around PubSub that intercepts Join calls and routes them through the PubSubLocker. Example:

// Join returns the Topic handle for the given topic. The call is routed
// through the PubSubLocker so that repeated calls can share one handle
// instead of erroring as the underlying Join does on a second call.
func (psx *Pubsubx) Join(topic string, opts ...ps.TopicOpt) (*ps.Topic, error) {
    return psx.pslocker.GetOrSetTopic(topic, psx.pb, opts...)
}

It is doubtful I will be the only one to run into a situation like this, so it's probably best if there is some discussion around a solution. If my "PubSubLocker" seems like it is the solution I can open a PR here with it.
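
The PubSubLocker itself is not shown in this thread. As a rough illustration of the get-or-set idea only, here is a minimal sketch that assumes a mutex-guarded map keyed by topic name. TopicCache, GetOrSet, and fakeTopic are hypothetical names; the type parameter T stands in for *ps.Topic so the sketch runs without the library, and in real use the join callback would wrap a call to Join on the PubSub instance.

```go
package main

import (
	"fmt"
	"sync"
)

// TopicCache hands out one shared handle per topic name.
// T stands in for *ps.Topic so the sketch compiles without the library.
type TopicCache[T any] struct {
	mu      sync.Mutex
	handles map[string]T
}

// NewTopicCache returns an empty cache.
func NewTopicCache[T any]() *TopicCache[T] {
	return &TopicCache[T]{handles: make(map[string]T)}
}

// GetOrSet returns the cached handle for name and calls join only on first use.
// join runs while the lock is held, so concurrent callers never join twice.
func (c *TopicCache[T]) GetOrSet(name string, join func(string) (T, error)) (T, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if h, ok := c.handles[name]; ok {
		return h, nil
	}
	h, err := join(name)
	if err != nil {
		var zero T
		return zero, err
	}
	c.handles[name] = h
	return h, nil
}

// fakeTopic is a hypothetical stand-in for a joined topic.
type fakeTopic struct{ name string }

func main() {
	joins := 0
	join := func(name string) (*fakeTopic, error) {
		joins++ // only ever runs under the cache lock
		return &fakeTopic{name: name}, nil
	}

	cache := NewTopicCache[*fakeTopic]()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := cache.GetOrSet("news", join); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("underlying joins:", joins) // prints "underlying joins: 1"
}
```

Holding the lock across the join call keeps the sketch simple, at the cost of serializing first-time joins for unrelated topics.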

Questions

While digging through the codebase I noticed that the Subscription type doesn't have the same "gotcha" as Topic does, where you can only call Join once. As far as I can tell, it is perfectly valid for multiple different goroutines to call Subscribe on the same topic handler; however, this will probably lead to wasted resources and, I suspect, lower throughput.

The one difference, however, is that there is no public Subscribe interface on the PubSub routers as there is with Topic. Currently my workaround is similar to how I handle topics in my PubSubLocker, but the issue is that if this pubsub router is passed to other services that don't have access to the PubSubLocker, they will not be able to share a single Subscription interface.

aschmahmann commented 4 years ago

Thanks for bringing this up. I suspect that there may be other people who would also like some sort of management layer on top of pubsub to deal with Topics. Overall, I'm not sure there is a one-size-fits-all solution for Topic management, so AFAIU the question is basically where this code should live and what behavior is reasonable to expect.

An example where topic management could be pretty rough: if PubSub were exposed via a libp2p daemon, figuring out global pubsub management across the various clients of the daemon could be difficult. However, there may be situations where creating a management layer within a given application is a great idea.

> While digging through the codebase I noticed that the Subscription type doesn't have the same "gotcha" as Topic does, where you can only call Join once.

Yes, basically Topics may have extra metadata associated with them beyond just the topic ID (e.g. validators, max message size, topic priority level, etc.), and so having one Topic instance per topic ID was deemed useful (https://github.com/libp2p/go-libp2p-pubsub/issues/198). Subscriptions, on the other hand, are basically just event streams, and so duplicating them is easy enough.

> As far as I can tell, it is perfectly valid for multiple different goroutines to call Subscribe on the same topic handler; however, this will probably lead to wasted resources and, I suspect, lower throughput.

Yes, it will use more resources because it duplicates the event stream; however, this may be useful behavior. If a particular application would instead like a single Subscription whose events are dispatched to multiple goroutines, it can do so.

bonedaddy commented 4 years ago

No problem. You make a good point: the solution to this will likely vary depending on your specific needs.

> An example where topic management could be pretty rough: if PubSub were exposed via a libp2p daemon, figuring out global pubsub management across the various clients of the daemon could be difficult. However, there may be situations where creating a management layer within a given application is a great idea.

Well, for this, wouldn't it make sense to have a global pubsub system that prevents topics from being closed and relies on internal server behavior to close out topics when no one is using them? That way any number of users may have access to a topic without worrying about one of them terminating the topic handler. Currently I'm taking a similar approach for my uses, where topics are only closed during server shutdown.
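
One way the "close only when no one is using it" idea could look is reference counting. The following is a minimal sketch under that assumption, not a description of my actual server code: Registry, Acquire, Release, and fakeTopic are hypothetical names, and the closer interface assumes the topic handle exposes a Close method returning an error.

```go
package main

import (
	"fmt"
	"sync"
)

// closer is the only behavior the registry needs from a topic handle.
type closer interface{ Close() error }

// entry pairs a shared handle with the number of current users.
type entry struct {
	handle closer
	users  int
}

// Registry shares handles and closes one only after its last user releases it.
type Registry struct {
	mu     sync.Mutex
	topics map[string]*entry
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry { return &Registry{topics: make(map[string]*entry)} }

// Acquire returns the shared handle for name, calling join on first use.
func (r *Registry) Acquire(name string, join func(string) (closer, error)) (closer, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if e, ok := r.topics[name]; ok {
		e.users++
		return e.handle, nil
	}
	h, err := join(name)
	if err != nil {
		return nil, err
	}
	r.topics[name] = &entry{handle: h, users: 1}
	return h, nil
}

// Release drops one user and closes the handle when none remain.
func (r *Registry) Release(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	e, ok := r.topics[name]
	if !ok {
		return fmt.Errorf("topic %q is not registered", name)
	}
	e.users--
	if e.users > 0 {
		return nil
	}
	delete(r.topics, name)
	return e.handle.Close()
}

// fakeTopic is a hypothetical stand-in that records whether it was closed.
type fakeTopic struct{ closed bool }

func (t *fakeTopic) Close() error { t.closed = true; return nil }

func main() {
	reg := NewRegistry()
	topic := &fakeTopic{}
	join := func(string) (closer, error) { return topic, nil }

	reg.Acquire("news", join)
	reg.Acquire("news", join)
	reg.Release("news")
	fmt.Println(topic.closed) // prints "false"
	reg.Release("news")
	fmt.Println(topic.closed) // prints "true"
}
```

Users call Release instead of closing the handle themselves, so no single user can terminate a topic that others still hold.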

> Yes, it will use more resources because it duplicates the event stream; however, this may be useful behavior. If a particular application would instead like a single Subscription whose events are dispatched to multiple goroutines, it can do so.

That makes sense.

Perhaps the "simplest" solution might be to just reuse topic handlers internally, instead of erroring out on Join?