schollii / pypubsub

A Python publish-subscribe library (moved here from SourceForge.net where I had it for many years)
189 stars, 29 forks

Content Filtering #21

Closed by paddymccrudden 5 years ago

paddymccrudden commented 5 years ago

Does pypubsub support content or header filtering for clients? Topic filtering looks great, but filtering based on a specific kwarg or arg would be a useful addition.

schollii commented 5 years ago

Can you provide example showing how you would use this?

paddymccrudden commented 5 years ago

Suppose, for example, we have a general topic "BASE" whose messages are simply a pandas DataFrame of values. Suppose two entities, "A" and "B", both emit such messages, and two subscribers, "X" and "Y", wish to listen only to messages from A and B respectively.

A topic based solution would be to have X subscribe to the channel "BASE.A", and Y subscribe to "BASE.B". That works quite well. However, suppose you require that all topics are pre-defined using the following command.

pub.setTopicUnspecifiedFatal(True)

Then we need to register the topics "BASE.A" and "BASE.B". This is fine, but registering these entities at run-time can be a little unpleasant.

An alternative would be to make the message payload something like (header, message), with header being a dictionary such as {'source': 'A'}. The entity X would then subscribe to topic "BASE" with the content filter (source == 'A').

I have seen a few articles about this, and there is some basic information on Wikipedia.
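
To make the proposal concrete, here is a minimal sketch of what a content-filtered subscription could look like. It uses a toy broker written with the standard library only, not pypubsub: the names `subscribe`, `send_message` and the `where` parameter are assumptions made up for illustration, and a plain list stands in for the pandas DataFrame payload.

```python
from collections import defaultdict
from typing import Any, Callable

# Toy broker for illustration only; none of these names are pypubsub API.
_subscribers: dict = defaultdict(list)


def subscribe(topic: str, listener: Callable[..., Any],
              where: Callable[[dict], bool] = lambda header: True) -> None:
    """Register listener on topic; `where` is a hypothetical content filter."""
    _subscribers[topic].append((listener, where))


def send_message(topic: str, header: dict, **payload: Any) -> None:
    """Deliver the message to every listener whose filter accepts the header."""
    for listener, where in _subscribers[topic]:
        if where(header):  # the filter is evaluated once per listener per message
            listener(header=header, **payload)


received_x, received_y = [], []

# X only wants messages from A, Y only wants messages from B.
subscribe("BASE", lambda header, values: received_x.append(values),
          where=lambda h: h["source"] == "A")
subscribe("BASE", lambda header, values: received_y.append(values),
          where=lambda h: h["source"] == "B")

send_message("BASE", header={"source": "A"}, values=[1, 2])
send_message("BASE", header={"source": "B"}, values=[3, 4])
```

After the two sends, `received_x` holds only the payload published with source "A" and `received_y` only the one with source "B", while both subscribers stay on the single "BASE" topic.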

schollii commented 5 years ago

Can you expand on "registering these entities at run-time can be a little unpleasant"? There might be a simpler solution than what you are doing now. Can you post some code?

Side note/Caution: you should really avoid adding the source of a message to the payload. A major point of publish-subscribe architectures is lack of knowledge of the end points. You will find that if you design your app to not care where messages come from or where they are going, it will be much more modular. Let the message be the contract, not one of the end points.

That being said, if you were just using "source" and Base.A/B as a simple example and this is not what you are actually trying to do, then it seems that you can already do what you want in pypubsub:

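from pubsub import pub

def listener(header, item1, item2):
    # Filter on the receiving side: ignore messages whose header does not match.
    if header['something'] != 'A':
        return

topic = 'BASE'  # topic name taken from the example above
pub.subscribe(listener, topic)  # subscribe before sending, or the message is missed
pub.sendMessage(topic, header=dict(something='A'), item1=..., item2=...)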

A couple of things to bear in mind:

  1. Having pubsub check, for each listener of a topic, whether it has a filter involves some performance overhead, so there should be a major win to offset that.
  2. The header is likely to be very application-specific, so it is probably not a good thing for pubsub itself to filter on. Rather, the contents of the header should just be other message data keys (maybe you decide to name them all "header_..."), and the recipient can filter on those.
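
One way to keep that recipient-side filtering from cluttering every listener is a small decorator. The sketch below is plain Python and is only an illustration: `only_if` and the `header_kind` key are hypothetical names, and the listener is called directly rather than through pypubsub. Whether pypubsub's inspection of listener signatures accepts a wrapped function like this has not been verified here.

```python
import functools
from typing import Any, Callable


def only_if(**expected: Any) -> Callable:
    """Hypothetical decorator: call the listener only when every named
    keyword argument of the message equals the expected value."""
    def decorate(listener: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(listener)
        def filtered(**msg_data: Any) -> Any:
            for key, value in expected.items():
                if msg_data.get(key) != value:
                    return None  # silently drop non-matching messages
            return listener(**msg_data)
        return filtered
    return decorate


seen = []


@only_if(header_kind="A")
def listener(header_kind: str, item1: Any, item2: Any) -> None:
    seen.append((item1, item2))


# Called directly here; with a pub-sub library these keyword arguments
# would be the message data supplied by the sender.
listener(header_kind="A", item1=1, item2=2)
listener(header_kind="B", item1=3, item2=4)
```

Only the first call reaches the listener body, so `seen` ends up with a single entry. The filtering cost is paid only by listeners that opt in, which sidesteps the per-listener check described in point 1.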

paddymccrudden commented 5 years ago

Thanks for your suggestions. I understand your logic about source, and that makes sense. I have taken your advice about source, and about content filtering, and am using it. Thanks!

schollii commented 5 years ago

Awesome!