h5law commented 8 months ago

Objective

Implement the ability for two observables to be merged into a single observable that can supply events from two+ sources to a single observer.

Origin Document

239 realised the need for the ability to subscribe to multiple sources as a necessary refactor for an overly verbose `DelegationClient` subscription strategy

    // TODO_HACK(@h5law): Instead of listening to all events and doing a verbose
    // filter, we should subscribe to both MsgDelegateToGateway and MsgUndelegateFromGateway
    // messages directly, and filter those for the EventRedelegation event types.
    // This would save the delegation client from listening to a lot of unnecessary
    // events, that it filters out.
    // NB: This is not currently possible because the observer pattern does not
    // support multiplexing multiple observables into a single observable, that
    // can supply the EventsReplayClient with both the MsgDelegateToGateway and
    // MsgUndelegateFromGateway events.

Goals

Allow multiple observables to be merged into a single observable that can supply notifications from multiple sources
Refactor the EventsQueryClient to support merged observables

Deliverables

[ ] Implement observable multiplexing
[ ] Refactor the EventsQueryClient to allow for multiple query subscription strings
[ ] Enable the EventsReplayClient to receive events from multiple sources
[ ] Update the events package documentation

Non-goals / Non-deliverables

Change the existing observable logic
- implement new types
Break the existing EventsQueryClient logic
- supporting multiple query strings should not break the existing usages

General deliverables

[x] Comments: Add/update TODOs and comments alongside the source code so it is easier to follow.
[x] Testing: Add new tests (unit and/or E2E) to the test suite.
[ ] Makefile: Add new targets to the Makefile to make the new functionality easier to use.
[x] Documentation: Update architectural or development READMEs; use mermaid diagrams where appropriate.

Creator: @h5law Co-Owners: @bryanchriswhite

bryanchriswhite commented 8 months ago

"Merge" is what this operator is usually called. See the reactivex docs about it.

Olshansk commented 8 months ago

@bryanchriswhite When we started the observable package, we discussed how it resembles a pubsub library with fan-out and fan-in mechanisms, but we needed something specific.

The growing complexity of the observable package is making me think that it's looking more and more like a pubsub (i.e. multiple sources publish to a shared channel and you subscribe on specific events).

I wanted to ask you if you're confident that we're still on the right path implementing this ourselves rather than swapping it out for another solution (e.g. nats).

h5law commented 8 months ago

@Olshansk Id say its more like we are subscribing to differernt events and publishing them to a single notifee. I know this makes little difference but I'm in favour of doing a little research spike before implementing this with @bryanchriswhite as to whether we have outgrown our original ideas for this repo.

bryanchriswhite commented 8 months ago

@bryanchriswhite When we started the observable package, we discussed how it resembles a pubsub library with fan-out and fan-in mechanisms, but we needed something specific.

The growing complexity of the observable package is making me think that it's looking more and more like a pubsub (i.e. multiple sources publish to a shared channel and you subscribe on specific events).

@Olshansk I firmly disagree with this characterization. I think the similarity you're describing is surface-level only. There is no subscribing to specific events; rather, a consumer subscribes to an entire observable for the purpose of handling asynchronous data streams. Observables are not intrinsically related to message-oriented-middleware, which is what I believe you're describing (and NATS is an example of).

I wanted to ask you if you're confident that we're still on the right path implementing this ourselves rather than swapping it out for another solution (e.g. nats).

I am still very confident that this is the optimal path with respect to readability, maintainability, and extensibility. While observables are fairly abstract, their power simplifies the implementation of other high-level objects; thus encapsulating the complexity of handling asynchronous data streams within observables and operators. Additionally, the observer pattern is well understood and wide-spread across many programming language ecosystems (reactivex as one example specification with many language implementations). Nothing we build in this library should surprise anyone who is familiar with these concepts.

I asked chatGPT for help in articulating the differences between observables and pubsub and these are the bits that seemed especially relevant to this discussion:

Conceptual Basis:

Observable Implementation: This is typically a fundamental part of reactive programming. Observables represent a stream of data or events to which observers can subscribe. The observable emits notifications of changes, and subscribers react to these changes. It’s a pattern that focuses on asynchronous data streams and propagation of change.

Pub-Sub Libraries (NATS, Gossipsub): These are messaging systems designed for distributed systems communication. They follow the publish-subscribe pattern, where publishers send messages without knowledge of subscribers, and subscribers receive messages without knowledge of publishers. This pattern is often used in event-driven architectures and for decoupling services in a microservices architecture.

Use Cases:

Observable Implementation: Ideal for handling real-time updates, UI events, data binding, and other scenarios where reacting to data/event streams is required.

Pub-Sub Libraries: Suited for building distributed systems, microservices architectures, and event-driven systems where decoupling of components and scalable communication is essential.

Scenarios Where Observables and Pub-Sub Libraries Might be Mutually Exclusive:

Internal Application State vs. Inter-Service Communication:

Observables: Best for managing internal application state, real-time updates within a single application, or handling UI events.

Pub-Sub Libraries: More suited for communication between different services or components in a distributed system.

Single-Process Applications:

In a scenario where the entire application logic resides within a single process, using a pub-sub library for internal event handling might be overkill. Observables alone would suffice for handling asynchronous events and data streams.

bryanchriswhite commented 8 months ago

I did another round of searching and these are the only options I could find (which seemed they could potentially be production-grade):

RxGo: The official reactivex implementation for go. It doesn't support generics / requires lots of type assertions. See the real world example from the readme.

While this library includes many features, including operator implementations, each operator which applies a function MUST type assert at the start of the function body. When @red-0ne and I were surveying the landscape previously, we decided this was a deal breaker and I stand by that decision.

As we've discussed before, an effort was started to bring generic support to RxGo but seems to have stalled.
go-observer: Actually looks pretty good at first glance; it does support generics (no type assertions needed / full compile-time typechecking). It doesn't implement any operators (replay, merge, each, etc.) so we would still need to bring our own. It's worth noting that it does not use go channels like our implementation does; this could have an impact on performance (good or bad :shrug:), benchmarks could be used to investigate further.

I would expect re-implementation of existing operators for this observable implementation to be fairly straight-forward, barring any mismatched concurrency expectations (would need to read more thoroughly).

I discovered this blog post discussing the differences between using an observer pattern vs. vanilla channel in go. The conclusion summarizes it nicely:

Both the classic Observer Pattern and Golang channels provide efficient ways to implement the Observer Pattern. The classic Observer Pattern using interfaces provides more flexibility and a clear separation between the subject and its observers, while using channels provides a lightweight and concurrent way to notify observers of changes in the subject. Ultimately, the choice between the two approaches depends on the specific requirements of the application, and the trade-offs between flexibility, ease of implementation, and concurrency.

In our case we get the benefit of both as we're implementing observer pattern interfaces using data structures that lightly wrap go channels.

Olshansk commented 8 months ago

@bryanchriswhite I really appreciate you continuing to push and providing all of the following:

Other options and their tradeoffs
Our reasoning of how/why we got here and why we're continuing
References to both blog posts & ChatGPT conversations.

My personal summary/takeaway is:

Observer > pubsub: We're building "just-in-time" reactive components in the SDK, not
Roll our own > other: Other options are either limited in their design/architecture (i.e. generics) or feature set (i.e. merging)

If anything, this gives me more confidence that (eventually) breaking out the Observer package into a separate library will be a real public good.

pokt-network / poktroll

[Off-chain] Implement `merge` observable operator #280

Objective

Origin Document

239 realised the need for the ability to subscribe to multiple sources as a necessary refactor for an overly verbose `DelegationClient` subscription strategy

Goals

Deliverables

Non-goals / Non-deliverables

General deliverables

pokt-network / poktroll

[Off-chain] Implement `merge` observable operator #280

Objective

Origin Document

239 realised the need for the ability to subscribe to multiple sources as a necessary refactor for an overly verbose DelegationClient subscription strategy

Goals

Deliverables

Non-goals / Non-deliverables

General deliverables

239 realised the need for the ability to subscribe to multiple sources as a necessary refactor for an overly verbose `DelegationClient` subscription strategy