Explore a means to stream property events between separate process

mikebaum commented 8 years ago

This issue tracks the initial work to explore how to stream property events between two separate JVMs running on the same host.

Initial thought is to explore using the clustered EventBus provided by Vert.x.

mikebaum commented 8 years ago

@dacianpitic @petikoch

Hey, in case you're interested, I've implemented a prototype that shows how to synchronize properties between separate JVMs. It uses the same SwingSynchronizedTextFieldApp that was created previously. I have not merged the code into master yet, since it is just a proof of concept. Anyway, the branch it's on is called streamEventsUsingVert.x. To test it out, run multiple instances of the SynchronizedTextFieldVerticle and then start typing text in any of them. Doing so should sync the text fields in the other running instances automatically.

mikebaum commented 8 years ago

@petikoch Hey Peti,

I've started to look into using Hazelcast, rather than Vert.x, to achieve cross-process property synchronization. The reason is that Hazelcast, although much leaner and less feature-rich (1 tiny jar versus several jars for Vert.x), seems to offer the core features that I need. While searching the internet for information about Hazelcast, I came across your post on Stack Overflow. That made me chuckle... It's really a small world. I wonder, could you comment on your experience using that library? Would you use it again? From your perspective, what were its main benefits/drawbacks?

Cheers from Canada... Mike

@dacianpitic - FYI

Petikoch commented 8 years ago

Hi @mikebaum,

We have been using Hazelcast 2.x in production for more than 3 years, and I can highly recommend it for certain use cases. We are switching to version 3.x now. We usually have around 50 nodes in the grid: around 5 "clients" (light member), 2 "servers" (full member) and around 40 "agents" (light member).

We use mainly these basic features of Hazelcast:

see the topology of our distributed system
direct communication between "nodes" via the distributed IExecutorService
nosql data store

Benefits:

one jar, apache licence
simple JDK-like programming model
good quality of software and documentation

Drawbacks:

size of the grid is somehow limited. We work with around 50 nodes in the grid, there are examples with 200 nodes. Cluster-Join-Time gets slower and slower with more nodes...
be aware of classpath issues regarding serialized objects (if you store serialized objects, all "full members" must have the necessary jars and versions in their classpath)
missing "reactive" API (but easy to add, using e.g. RxJava)

Let me know, if you need more details.

Best regards from Switzerland! Peti

mikebaum commented 8 years ago

@Petikoch Thanks for the speedy reply, just a few questions...

We usually have around 50 nodes in the grid: around 5 "clients" (light member), 2 "servers" (full member) and around 40 "agents" (light member)

Just curious, what is the difference between an agent and a client as you mentioned? I'm guessing the clients are users and agents are light members that are used more for parallel processing than for data storage.

direct communication between "nodes" via the distributed IExecutorService

I presume this is for processing, not service calls? What I mean is, have you used Hazelcast as an application server, or would you recommend pairing it with Tomcat or some other app server? From your answer on Stack Overflow it appears you have used Hazelcast as an application server, though I'm not 100% sure I understood your answer regarding the use of a multimap. I see from the Hazelcast docs that there is an ability to add User Defined Services. It seems a bit more complicated than I would expect, though.

size of the grid is somehow limited. We work with around 50 nodes in the grid, there are examples with 200 nodes. Cluster-Join-Time gets slower and slower with more nodes...

Does this total include the light member "clients" as well? If that's the case, that does sound a bit limiting.

Petikoch commented 8 years ago

@mikebaum, sorry for the confusion about clients, agents and servers.

In Hazelcast, there are

full members
lite members
clients

In "our" distributed application, we have

"our application" clients (hazelcast lite members, java swing clients)
"our application" agents (hazelcast lite members, kind of "bots")
"our application" servers (hazelcast full members, for data storage and some kind of "backend" services)

Regarding the distributed IExecutorService: we only use Hazelcast in our small distributed application (around 50 nodes) and use the distributed IExecutorService as a kind of RPC mechanism (service calls). We built a little layer around the distributed IExecutorService, which makes it very easy to implement server-side services and resilient, load-balanced generic service clients (using dynamic proxies). We didn't publish it, so I can't share sample code at the moment, unfortunately. We did this implementation 3 years ago for Hazelcast 2.x, when there were no Hazelcast "user defined services" available. I didn't check the "user defined services" in detail, but I think they are probably "too low level" for simple custom services and too much work to implement. Maybe one could add some convenience layer above them?

About pairing with a webserver. We don't have an additional web-server "in front". If you do that, you have something pretty similar to Vert.x and can easily cover web-clients, as well. And of course thousands of web clients, then...

About the number of Hazelcast cluster nodes... I don't have experience with the pure Hazelcast client. I assume one can have "lots of" Hazelcast clients (a couple of hundred? a couple of thousand?) and few cluster nodes (e.g. 10). It wouldn't be difficult to investigate, since you can start up dozens of Hazelcast nodes (or clients) in one JVM, with e.g. a simple for-loop. I assume "cluster connection time" would be an issue: in our environment it takes between 5 and 20 seconds for a single node to join a 50-node cluster. This is not very convenient, but not an issue for us at the moment.

Best regards from Switzerland! Peti

Petikoch commented 8 years ago

@mikebaum , let's talk a little bit about

Explore a means to stream property events between separate process

If you answer the following questions, I'll think about it and can then probably give you some advice.

So, my questions are

Are the separate processes JVM's? Or Webbrowser's and JVM's? Or ... and ...?
How many processes are involved?

Best regards, Peti

mikebaum commented 8 years ago

@Petikoch Thanks for taking the time to explain about using Hazelcast as an application server. I realize that's not part of this issue, but I find it tremendously interesting nonetheless. I know how it is to not be able to share details, since I have the same issue with my MVVM-like framework I've already committed at work.

Anyway, back to this issue... to answer your questions:

Are the separate processes JVM's? Or Webbrowser's and JVM's? Or ... and ...?

For the property streaming of this issue, yes to both streaming between JVMs and between a JVM and the browser. However the most important would be JVM <---> JVM.

How many processes are involved?

I would imagine it would be as many processes as there are connected clients, plus the number of servers that need to know about the property. So this would depend on how many clients the user of the property api would have.

To give further details, eventually I'd like to create some kind of model object that could be kept in sync across JVMs and possibly between a JVM and a browser.

It seems that Hazelcast would fit this problem nicely, as they even offer a REST client, which I suppose could be used by the browser. I guess the browser may need to poll, unless there is a means to add a listener from the browser as there would be between JVMs.

Petikoch commented 8 years ago

@mikebaum, the Hazelcast REST client seems to be rather simple... just to access maps or queues. You would have to implement your own "communication protocol" on top of it. Or add an additional webserver to the stack with e.g. a nicer custom-built RESTful API. For the "webbrowser use case", Vert.x would fit more naturally, since it offers a fine communication mechanism across JVMs and webbrowsers. The issue with Vert.x is, IMHO, the "message delivery warranty": it's just "best effort" and you need to implement everything else by yourself ("at least once delivery", ...).

Another alternative for JVM-to-JVM communication would be Akka. It offers the actor abstraction across a distributed system with selectable message delivery options and powerful resilience mechanisms. Paired with akka-http, you would have the possibility to do the "webbrowser <-> JVM" use case.

Lots of options... as always ;-)

Best regards, Peti

mikebaum commented 8 years ago

@Petikoch Yeah, the REST client would not be complete enough as I guess you couldn't get bidirectional communication using that solution.

It would seem that using WebSockets between the Server (Java) and the browser might be a reasonable solution. I may search for a lightweight WebSocket server to support browsers and add that as a dependency. It would seem easy enough to connect Hazelcast to a WebSocket server which could act as the bridge to browser based clients.

In the end I think I'll need to create a set of interfaces that abstract the connection between client and server, that way I can swap out the messaging library (or framework) as needed.

As far as Akka goes, that is obviously an option as well, but I'm not sure I want to tie myself to a framework. The appeal as you stated earlier is that Hazelcast is 1 jar and simple.

You mention that Vert.x states its "message delivery warranty" is "best effort". I have read this as well. How does Hazelcast compare? Is it more dependable?

Also, today while researching FRP glitches I came across these two interesting papers. I haven't finished them both, but they seem to describe exactly what I have in my head about this library. Here are the links in case you're bored some day :) ...

Petikoch commented 8 years ago

@mikebaum, thanks for the links to the papers, sounds interesting! I'll check them out... :-)

Regarding delivery warranty in Hazelcast: while in Vert.x everything is asynchronous and non-blocking, a lot of stuff in Hazelcast is asynchronous, but blocking.

Consider the distributed ExecutorService. After submitting a task, you'll get back a Future and will then typically call the blocking get(). Either it works or you get an exception. Nothing gets lost. This is what I call the classic JDK-upto7-programming-model with its issues like lack-of-composability, waste-of-resources and try-catch-orgies.
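A minimal sketch of that submit-then-block pattern, assuming a plain JDK thread pool as a stand-in for the distributed executor (Hazelcast's IExecutorService extends java.util.concurrent.ExecutorService, so the calling code has the same shape). The class and method names are made up for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: a local thread pool stands in for the distributed executor.
class BlockingSubmitDemo {

    // Submits the task and blocks until it finishes.
    // Returns the result, or a description of the failure.
    static String submitAndWait(ExecutorService executor, Callable<String> task) {
        Future<String> future = executor.submit(task);
        try {
            return future.get(); // blocks the calling thread
        } catch (ExecutionException e) {
            // The task itself threw; the original exception is the cause.
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            System.out.println(submitAndWait(executor, () -> "pong"));
            System.out.println(submitAndWait(executor, () -> {
                throw new IllegalStateException("boom");
            }));
        } finally {
            executor.shutdown();
        }
    }
}
```

Either outcome reaches the caller, but only after the calling thread has waited, and every call site needs its own try/catch.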

Regarding the data structures in Hazelcast: you're free to select the level of consistency (strong, eventually, number of backup copies, ...).

Regarding the communication mechanisms in Hazelcast like Topic (or Reliable Topic): You can also choose the "reliability" here. While Topic is "fire and forget", Reliable Topic uses synchronous backup copies to prevent loss of events.

Conclusion: Hazelcast offers a choosable level of consistency and message delivery warranty. Nice.

Petikoch commented 8 years ago

@mikebaum, another idea... If you don't mind an additional "server process".

You could use Apache Kafka. Install and run one or more Kafka servers "somewhere", and then communicate between your JVMs using the Kafka Java client (about 2 jars). From your web browsers, use a Kafka web client library like kafka-websocket.

Pro:

highly scalable
reliable

Con:

separate "server processes" somewhere

mikebaum commented 8 years ago

@Petikoch

Consider the distributed ExecutorService. After submitting a task, you'll get back a Future and will then typically call the blocking get(). Either it works or you get an exception. Nothing gets lost. This is what I call the classic JDK-upto7-programming-model with its issues like lack-of-composability, waste-of-resources and try-catch-orgies.

Ha ha, try-catch-orgies, I nearly choked on my food, that was so funny... You're right though. It could be made async, but at the expense of another thread. Whether that's a big deal, I'm not sure. I could use one "Actor"-like entity in the client that would broker between the desktop application and the server; that would at least contain the orgy :). This is obviously something that you get out of the box with Vert.x, but I'm pretty sure I could write something minimalistic to achieve that.
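As a rough sketch of the "async at the expense of another thread" idea, assuming Java 8's CompletableFuture is available: the blocking get() is parked on a dedicated wait pool so the caller can compose on the result instead. `AsyncBridgeDemo` and `toAsync` are hypothetical names, and the local executors stand in for the distributed one.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: bridges a blocking Future into a composable CompletableFuture.
class AsyncBridgeDemo {

    // Parks a thread from waitPool on the blocking get() so the caller does not have to.
    static <T> CompletableFuture<T> toAsync(Future<T> blocking, ExecutorService waitPool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return blocking.get();
            } catch (ExecutionException e) {
                // Surface the task's own exception to downstream stages.
                throw new CompletionException(e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new CompletionException(e);
            }
        }, waitPool);
    }

    public static void main(String[] args) {
        ExecutorService workers = Executors.newSingleThreadExecutor();
        ExecutorService waitPool = Executors.newSingleThreadExecutor();
        try {
            Future<String> reply = workers.submit(() -> "hello");
            // join() is used here only so the demo prints before shutting down.
            String result = toAsync(reply, waitPool)
                    .thenApply(String::toUpperCase)
                    .join();
            System.out.println(result);
        } finally {
            workers.shutdown();
            waitPool.shutdown();
        }
    }
}
```

The try/catch now lives in one place, which is roughly what the single broker entity would achieve.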

Regarding the communication mechanisms in Hazelcast like Topic (or Reliable Topic): You can also choose the "reliability" here. While Topic is "fire and forget", Reliable Topic uses synchronous backup copies to prevent loss of events.

Nice, reliable topic, I will look into that.

Kafka seems like a decent option as well, but from my quick impression it seems more complicated. Someone else also suggested it to me.

Anyway, for now I'm going to give it a try with Hazelcast.

Thanks again for your advice :+1:

mikebaum commented 8 years ago

Starting to look into this now...

My initial thoughts are that I will need to add a PropertyId class which wraps around a UUID that must be globally unique. The PropertyId could be typed, which will help to provide compile time type safety. So an initial design for a PropertyId class would be:

public class PropertyId<T> {
...
    public String getUuid();
...
    public static <T> PropertyId<T> create(String uuid);
...
}

Ideally PropertyId instances would be created statically and during creation a consistency check could be made to verify that there is no collision on the UUIDs. Or perhaps the Ids could be defined in some properties file(s), that can be used to bootstrap all the known properties.
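One possible shape for that creation-time consistency check, as a sketch only: a static registry keyed by the UUID string rejects a second id with the same value. The registry, the exception type and the example id strings are assumptions, not settled design.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a typed id with a collision check at creation time.
final class PropertyId<T> {

    // Every id created so far, keyed by UUID string, used to detect collisions.
    private static final Map<String, PropertyId<?>> REGISTRY = new ConcurrentHashMap<>();

    private final String uuid;

    private PropertyId(String uuid) {
        this.uuid = uuid;
    }

    public String getUuid() {
        return uuid;
    }

    // Creates and registers an id, failing fast if the UUID is already taken.
    public static <T> PropertyId<T> create(String uuid) {
        Objects.requireNonNull(uuid, "uuid");
        PropertyId<T> id = new PropertyId<>(uuid);
        if (REGISTRY.putIfAbsent(uuid, id) != null) {
            throw new IllegalStateException("Duplicate property id: " + uuid);
        }
        return id;
    }

    public static void main(String[] args) {
        // The id string is an arbitrary example value.
        PropertyId<String> firstName = PropertyId.create("first-name-id");
        System.out.println(firstName.getUuid());
        try {
            PropertyId.create("first-name-id");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This only detects collisions within one JVM. Catching them across processes would need a shared source of truth, such as the properties file(s) mentioned above.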

mikebaum commented 8 years ago

In addition to property ids, there should be an interface that abstracts the communication layer. Although I intend on using Hazelcast for IO, I want to make the library flexible enough to use another technology. Therefore there should be an interface as follows:

interface PropertyBus {
    <T> void publish(PropertyId<T> id, T value);
    <T> PropertyStream<T> listen(PropertyId<T> id);
}

This interface may need to be renamed and have additional methods added as more of the features of Hazelcast are used.
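To show how such an interface keeps the transport swappable, here is a hedged in-memory sketch that could later be replaced by a Hazelcast-backed implementation. It assumes the id parameter carries the value type. The nested `PropertyId` and `PropertyStream` types are simplified stand-ins invented for this example, not the library's real classes.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-memory sketch; all nested types are simplified stand-ins.
class InMemoryBusSketch {

    // Stand-in for the typed id: just wraps the UUID string.
    static final class PropertyId<T> {
        final String uuid;
        PropertyId(String uuid) { this.uuid = uuid; }
    }

    // Stand-in for a property stream: only supports subscribing.
    interface PropertyStream<T> {
        void onChanged(Consumer<T> listener);
    }

    interface PropertyBus {
        <T> void publish(PropertyId<T> id, T value);
        <T> PropertyStream<T> listen(PropertyId<T> id);
    }

    static final class InMemoryPropertyBus implements PropertyBus {
        // Listeners per UUID, stored untyped; the typed id restores type safety at the API.
        private final Map<String, List<Consumer<Object>>> listeners = new ConcurrentHashMap<>();

        @Override
        public <T> void publish(PropertyId<T> id, T value) {
            for (Consumer<Object> listener : listeners.getOrDefault(id.uuid, Collections.emptyList())) {
                listener.accept(value);
            }
        }

        @Override
        public <T> PropertyStream<T> listen(PropertyId<T> id) {
            // Subscribing registers an adapter that casts back to the id's value type.
            return listener -> listeners
                    .computeIfAbsent(id.uuid, key -> new CopyOnWriteArrayList<>())
                    .add(value -> {
                        @SuppressWarnings("unchecked")
                        T typed = (T) value;
                        listener.accept(typed);
                    });
        }
    }

    public static void main(String[] args) {
        PropertyBus bus = new InMemoryPropertyBus();
        PropertyId<String> name = new PropertyId<>("name-id");
        bus.listen(name).onChanged(value -> System.out.println("received " + value));
        bus.publish(name, "Mike");
    }
}
```

Callers depend only on `PropertyBus`, so a different messaging library would mean a new implementation class rather than changes at the call sites.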

mikebaum commented 8 years ago

Having worked on the issue a little bit, I quickly discovered that for now there should not be a PropertyBus but rather a PropertyService with the following api:

    <T> Property<T> getProperty(PropertyId<T> id);

    <T> PropertyStream<T> getPropertyStream(PropertyId<T> id);

For now the methods return a Property and a PropertyStream, however I think that will need to change soon, since there needs to be a Remote version of those classes. The remote version of those classes will need to work differently since we do not want to restrict the method calls to the UI EventLoops, in fact it needs to be reversed. Also perhaps the remote property class only needs to offer a getter, setter and a means to attach a listener.
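A minimal sketch of what such a reduced remote property might look like, assuming only a getter, a setter and a listener hook, and assuming it must be callable from any thread rather than just the UI event loop. `RemoteProperty` and `LocalRemoteProperty` are hypothetical names, and the implementation is a local stand-in with no networking.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical shape for a remote property: getter, setter, listener.
interface RemoteProperty<T> {
    T get();
    void set(T value);
    void addListener(Consumer<T> listener);
}

// Thread-safe local stand-in; a real version would forward set() to the remote store.
final class LocalRemoteProperty<T> implements RemoteProperty<T> {
    private final AtomicReference<T> value;
    private final List<Consumer<T>> listeners = new CopyOnWriteArrayList<>();

    LocalRemoteProperty(T initial) {
        this.value = new AtomicReference<>(initial);
    }

    @Override
    public T get() {
        return value.get();
    }

    @Override
    public void set(T newValue) {
        value.set(newValue);
        // Listeners are notified on the calling thread, whichever thread that is.
        for (Consumer<T> listener : listeners) {
            listener.accept(newValue);
        }
    }

    @Override
    public void addListener(Consumer<T> listener) {
        listeners.add(listener);
    }
}

class RemotePropertyDemo {
    public static void main(String[] args) {
        RemoteProperty<String> text = new LocalRemoteProperty<>("");
        text.addListener(value -> System.out.println("changed to " + value));
        text.set("hello");
        System.out.println(text.get());
    }
}
```

A UI-bound wrapper could then hop back onto the event loop inside its listener, keeping the remote class itself free of threading restrictions.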

mikebaum commented 8 years ago

I have created a separate issue (#52) to track the work of creating a RemoteProperty using hazelcast.

mikebaum / RxUI

Explore a means to stream property events between separate process #17