Do we want to split the mediation functions from the core OpenHIM functionality?

rcrichton commented 11 years ago

Currently, all of the RHEA mediations are embedded directly in the OpenHIM. When considering other use cases we may not want to make use of the RHEA mediation. Should we split these out into a separate project that the 'core' can call?

Essentially, the OpenHIM will then be made up of 2 parts:

The core OpenHIM - does logging, persistence of messages and security
A number of mediators that the core OpenHIM can be configured to send messages to if needed

rcrichton commented 11 years ago

This is sounding like a good idea, since projects other than RHIE want to use the OpenHIM software. The mediators that are in the OpenHIM at the moment are specific to RHEA, so we need to provide a mechanism for adding new mediators and removing the ones that are not needed.

Splitting out a generic core from the implementation specific mediators will allow us to achieve this.

An example of how the RHIE architecture would look under a new split design can be seen here: https://docs.google.com/a/jembi.org/drawings/d/13v2Kfnpw0vFihUe73-aqqf4Il7sjdmMKexOFH6Op1mY/edit

carlsbox commented 11 years ago

I think the conceptual design is a good approach. What I would vote for when it comes to implementation of this is a "simple" method of standing up the HIM-CORE (S.A.L.E) and adding new mediations relatively dynamically in a manner that allows scaling but doesn't force us to have a server for each mediation flow.

rcrichton commented 11 years ago

The mediators could still live on the same physical server but as independent Mule applications. The only restriction would be that we would have to use a different port for each mediator service. This gives us the greatest flexibility for scaling them out to other servers as needed.
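A minimal sketch of what "one port per mediator" could look like as a registry the core reads. The mediator names and port numbers here are hypothetical, and the thread does not say how the core would actually store this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mediator:
    """One independently deployed mediator service (names/ports are made up)."""
    name: str
    host: str
    port: int

    @property
    def base_url(self) -> str:
        return f"http://{self.host}:{self.port}"


def check_unique_ports(mediators):
    """Raise ValueError if two mediators on the same host share a port."""
    seen = {}
    for m in mediators:
        key = (m.host, m.port)
        if key in seen:
            raise ValueError(f"{m.name} and {seen[key]} both use {m.host}:{m.port}")
        seen[key] = m.name


# Hypothetical layout: everything on one box, each mediator on its own port.
MEDIATORS = [
    Mediator("save-encounter", "localhost", 8101),
    Mediator("query-patients", "localhost", 8102),
]
check_unique_ports(MEDIATORS)
```

Under this assumption, moving a mediator to another server is only a change to its `host` entry.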

I have started an initial design doc of how we could split the OpenHIM to achieve this. Here is the Google doc: https://docs.google.com/document/d/1fJ2cKlnBrOg4wWmLc1jHmYDeqUkLbkdNCD_gquLnnL8/edit?usp=sharing

@devcritter I'd like to get your thoughts on this. If you have some time to spare, could you have a look through this?

carlsbox commented 11 years ago

Sounds good. I'm assuming when we come to the technical implementation of this we will figure out how to restrict access to those ports to only authorised users :)

rcrichton commented 11 years ago

Yup, we could just set up the machine to only accept connections to those ports from localhost (i.e. itself)
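One way to get this effect at the application level, rather than in the firewall, is to bind the mediator's listener to the loopback address. This is only a sketch using Python's standard library; `MediatorHandler` and `make_server` are illustrative names, and the real mediators in this discussion are Mule applications.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class MediatorHandler(BaseHTTPRequestHandler):
    """Placeholder mediator endpoint that answers every GET with 'ok'."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port: int = 0) -> HTTPServer:
    # Binding to 127.0.0.1 instead of 0.0.0.0 means only processes on this
    # machine can connect. Port 0 asks the OS to pick any free port.
    return HTTPServer(("127.0.0.1", port), MediatorHandler)


# Usage: server = make_server(8101); server.serve_forever()
```

The trade-off is that a loopback-only mediator has to be rebound (or fronted by a firewall rule instead) as soon as it moves to a different server from the core.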

hnnesv commented 11 years ago

@rcrichton Sure, I'll add my thoughts to the google doc

hnnesv commented 11 years ago

Are there any reasons as to why we're not considering using messaging queues for communicating with the mediators? (as from previous discussions)

A messaging queue seems like a more fitting solution to me than http ports:

They're designed especially for this type of task
It will be much easier to restrict access to the MQ than http ports (we just need to block the MQ's ports in the firewall). But we'll still have the flexibility to move mediators to other servers (the MQ just needs to be blocked outside the data center)
Implicit scalability: we can easily add multiple instances of a mediator, since they'll just be multiple workers
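The "multiple workers" point can be sketched with an in-process queue standing in for a real MQ. This is an illustration only: `run_workers` is a hypothetical helper, and upper-casing the message stands in for whatever a mediator would actually do.

```python
import queue
import threading


def run_workers(messages, worker_count):
    """Process messages with worker_count identical workers on one shared queue."""
    work = queue.Queue()
    results = queue.Queue()

    def worker(worker_id):
        while True:
            msg = work.get()
            if msg is None:  # sentinel: no more work for this worker
                return
            # Stand-in for the mediation step.
            results.put((worker_id, msg.upper()))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for m in messages:
        work.put(m)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()

    processed = [results.get() for _ in range(len(messages))]
    return sorted(msg for _, msg in processed)
```

Adding capacity here is just a larger `worker_count`; the producer does not change, which is the property being argued for.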

(I'll try to expand on this topic in the google doc)

rcrichton commented 11 years ago

Thanks for the comments Hannes. I've been thinking about this a lot, and my view is that message queues work really well in an asynchronous workflow. However, in our case everything that we are doing for the RHIE is synchronous. I'm not convinced that message queues work as well in this case. We don't really need a queue: messages shouldn't be waiting for their turn to be processed, they need to be processed and returned straight away. If messages are spending time in queues, that means we aren't processing messages fast enough and should actually increase the efficiency of our mediators. We could scale them out horizontally with HTTP using DNS round robin or HTTP load balancers if we are just using REST services. I'm not sure we need an external message queue to get the functionality that we need. If we had asynchronous workflows I think a messaging queue would make sense.
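For illustration, round-robin selection across several instances of one mediator can be as small as the sketch below. The URLs are hypothetical, and in practice this job would more likely be done by DNS or a load balancer than by the core itself.

```python
import itertools


class RoundRobin:
    """Hand out mediator instance URLs in rotation."""

    def __init__(self, urls):
        if not urls:
            raise ValueError("at least one instance URL is required")
        self._cycle = itertools.cycle(list(urls))

    def next_url(self):
        return next(self._cycle)


# Hypothetical instances of the same mediator on two servers.
balancer = RoundRobin(["http://mediator-a:8101", "http://mediator-b:8101"])
```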

I think restricting access will be just as easy as with a messaging queue, we can still just restrict ports for HTTP (we don't have to just use port 80).

Also, with messaging queues such as ActiveMQ you need to use a specific connector library to read and write to queues. This makes it (slightly) more difficult to implement the mediators, especially if we can imagine that these mediators could be built using any technology that makes sense (they are decoupled). HTTP allows these mediators to be easily reused by any other components without them having to know anything about a messaging queue. This could come in handy in the future.

These are just the thoughts that have been running around in my head, let me know what you think, I'm happy to discuss this openly.

hnnesv commented 11 years ago

Hi Ryan. Sure I'd love to discuss this at some point. Architecture's really fun :)

I do think that certain async processes would benefit us, e.g. during a save encounter we can asynchronously send off the ecid, epid, facility and terminology requests at the same time. The way I see it the network's always going to be our slowest point during a transaction flow, so we should get a huge performance increase by parallelising these requests.
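A rough sketch of sending the four lookups off at the same time with a thread pool. `fake_lookup` is a stand-in for the real registry calls (it just sleeps to simulate network latency), so everything about it is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_lookup(name, delay=0.05):
    """Stand-in for a network call to a registry; sleeps to mimic latency."""
    time.sleep(delay)
    return f"{name}-resolved"


def resolve_all(names, lookup=fake_lookup):
    """Run one lookup per name concurrently and return {name: result}."""
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        futures = {name: pool.submit(lookup, name) for name in names}
        # result() blocks until that lookup finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}


results = resolve_all(["ecid", "epid", "facility", "terminology"])
```

Because the lookups overlap, the total wait is roughly that of the slowest one rather than the sum of all four.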

Another good benefit to me of an MQ is that it's easy to implement a pub/sub model. This would be awesome in terms of HIM modularity. For example, an auditing service can easily just subscribe to "all" and then run seamlessly without any of the mediators needing to do any work to incorporate auditing. Pub/sub would also be fantastic for CDS.
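The "subscribe to all" idea, sketched as a tiny in-process broker. `Broker`, the `"*"` wildcard and the topic names are all illustrative assumptions; a real MQ would provide its own topic and wildcard mechanism.

```python
from collections import defaultdict

ALL = "*"  # hypothetical wildcard topic meaning "every message"


class Broker:
    """Minimal in-process publish/subscribe dispatcher."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        handlers = list(self._subscribers.get(topic, []))
        if topic != ALL:
            # Wildcard subscribers (e.g. auditing) see every topic.
            handlers += self._subscribers.get(ALL, [])
        for handler in handlers:
            handler(topic, message)


broker = Broker()
audit_log = []
broker.subscribe(ALL, lambda topic, msg: audit_log.append((topic, msg)))
```

The publisher and the other subscribers need no knowledge of the audit handler, which is the modularity benefit described above.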

Of course any of the above benefits can be implemented using http sockets (i.e. threading for parallel requests (... with proper threading pools of course)), I just think it would be easier using MQs.

I drew up a diagram in the google doc just to explore how I think this would work (I just hope I explained it well in there!).

Of course I'm not attached to either technology; we just need to explore all the options and see which would fit best (IPC is of course a pretty common problem and a big area).

rcrichton commented 11 years ago

Some quick comments, but I plan to think on this some more:

I do agree, parallelising the mediation step would be useful. I'm not saying we shouldn't do that (actually I definitely think we should do that).
I like the idea of a pub/sub for flexibility but this doesn't mean that we need to use a MQ to achieve this.
The mediators really provide a service. They aren't workers. They will spend most of their time polling the queue for messages while the queue is empty in an MQ model. Also, if we use an MQ we will be coupling the mediators to use an MQ in a specific way which may, in turn, couple them to being used with the HIM. If they are independent services they may be more robust for the future.

rcrichton commented 11 years ago

So after thinking about this some more, as well as from what was discussed on the IL call, I think we may need to support both direct request-reply calls to mediators as well as an MQ for certain mediators to pull messages off. Basically I think that the HIM interface component should support two transaction types, sync and async. Whether an endpoint is configured as sync or async would determine if we use an MQ or a direct ws call.

The problem with using an MQ for synchronous client requests is what happens in the failure case. For example, say we are making a query for patients to the CR. This is an HTTP request from the client, so the HIM keeps the connection open so that it can return the response. The HIM puts the message on the MQ and expects the CR to pick it up, but the CR has gone down. The message is now stuck in the queue, which is good for fault tolerance but not when we are keeping a connection open with the client! It could be hours until the CR is up again and we don't want to keep the client on hold until its request times out. What we would want to happen is to send a service unavailable error response to the client immediately.
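The fail-fast behaviour wanted for the synchronous path could look roughly like this. `handle_sync_request` and `call_mediator` are hypothetical names; the point is only that a connection failure turns into an immediate 503 instead of a message waiting in a queue.

```python
def handle_sync_request(call_mediator, request):
    """Call a mediator directly and map connection failures to HTTP 503.

    call_mediator is any callable that performs the direct call and raises
    ConnectionError or TimeoutError when the service cannot be reached.
    """
    try:
        return 200, call_mediator(request)
    except (ConnectionError, TimeoutError) as err:
        # The downstream service is down: tell the client straight away.
        return 503, f"Service Unavailable: {err}"


def cr_is_down(request):
    """Simulates the client registry being unreachable."""
    raise ConnectionRefusedError("CR not reachable")


status, body = handle_sync_request(cr_is_down, "query-patients")
```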

For an asynchronous client message the MQ would work just fine and have the added benefit of being highly fault tolerant but for synchronous request-reply scenarios this doesn't make as much sense.

So, maybe we need to have both options available and depending on the type of client request (sync or async) we choose the mechanism to use.

Right now in RHIE all our workflows are synchronous, so it would make sense to work on the synchronous aspect first and then move to the new async feature.

Also, the fact that we are performing a synchronous request-reply for a client doesn't mean that we cannot perform lookup tasks concurrently in the mediator (i.e. resolving ECID, EPID, ELID etc.). That should be up to the mediator to implement efficiently.

This post helped me formulate my opinion: http://stackoverflow.com/questions/2383912/message-queue-vs-web-services

hnnesv commented 11 years ago

Hmmm, I'm not so sure a hybrid approach would be best; we'd be introducing extra complexity without really reaping the benefits that these paradigms can offer (e.g. the sync services wouldn't benefit from "seamless" pub/sub style auditing).

Unfortunately I haven't had a chance to listen to the IL call yet (apologies for missing it, it looked really interesting from the minutes), but here's just a few thoughts for now...

One assumption I have is that we WON'T be changing the way clients interact with the HIM. The restful interfaces are great and should stay as is. My understanding is that we're talking about "internal" HIM communications (X): PoC <--> REST <---> HIM Core <--> X <---> HIM Mediator <---> (rest | soap | plain ol socket | whatever) <--> Registry

If we want an async model for X, then one implication of this is that when the HIM-Core places the message in a queue, it still needs to wait for a response (one way to do this would be for core to subscribe to a response queue and wait for a result, keeping the client connection open). Of course I would hope that the client doesn't set their connection timeout to something in hours! :) (the PoCs currently use 30 seconds by default). I fully agree that this isn't too pretty, but imo the other benefits of a pub/sub model are pretty.
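A sketch of "put the message on a queue, then wait for the reply", using in-process queues and a correlation ID to match replies to requests. `QueueRequester` is a hypothetical name and this is not how any particular MQ client does it.

```python
import queue
import threading
import uuid


class QueueRequester:
    """Request-reply on top of a one-way request queue."""

    def __init__(self, request_queue):
        self._requests = request_queue
        self._pending = {}  # correlation id -> single-slot reply queue
        self._lock = threading.Lock()

    def request(self, payload, timeout):
        """Enqueue payload and block until a reply arrives or timeout expires."""
        correlation_id = str(uuid.uuid4())
        reply_box = queue.Queue(maxsize=1)
        with self._lock:
            self._pending[correlation_id] = reply_box
        self._requests.put((correlation_id, payload))
        try:
            return reply_box.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("no reply from mediator") from None
        finally:
            with self._lock:
                self._pending.pop(correlation_id, None)

    def deliver_reply(self, correlation_id, reply):
        """Called by whatever consumes the response queue."""
        with self._lock:
            reply_box = self._pending.get(correlation_id)
        if reply_box is not None:  # otherwise the caller already gave up
            reply_box.put(reply)
```

The `timeout` argument plays the role of the client connection timeout mentioned above: if no mediator answers in time, the caller gets an error rather than waiting indefinitely.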

Of course I do agree that we don't need an MQ for pub/sub, but it is easy for pub/sub, http sockets really aren't. Vice versa we can also implement sync with MQs, but that wouldn't be as easy as with http sockets. Also I wasn't trying to argue that we couldn't parallelize certain tasks (ECID, EPID lookups) using sync, of course we can. It's just easy with a pub/sub (actually I think I'm really just liking pub/sub, not MQs specifically :) ).

I do agree that sync makes sense for our use case, but I like async for orchestration, and I particularly love the idea that audit and CDS services can just "listen" in with async (well pub/sub). This would help our modularity cause since mediators need not worry about these actions. Need to add a service for monitoring stats? No problem, just have it subscribe to the queues it's interested in.

(My last point, I swear!) If we do go with sync however then I also just want to make sure that we don't get stuck on http sockets without looking at alternatives (sockets are expensive! If this were a native app I'd be recommending unix domain sockets or shared memory long before http sockets). ZeroMQ (http://zeromq.org/) for example is very high performance (and supports both IPC and TCP socket types, meaning you can rather use IPC if the apps are living on the same box for better performance). But then I guess we also need to use something that's easy to use in mule.

rcrichton commented 11 years ago

Thanks Hannes, and please make as many points as you need :)

Yes, I'm also talking about the X.

I'm in agreement with you that the pub/sub idea is a good one! I like that for auditing as well (and for many other uses!) When I say we are making a direct ws call for synchronous requests, I'm thinking of something more along the lines of a pub/sub model just like the async MQ interface. For example, what I've been thinking of is doing something like the default channel where we search for one or more matching endpoints to send the message to. So for auditing we could add an item in the json config file with urlPattern: ".*" and it would pick up every synchronous message (obviously we will have to have some flag to denote which endpoint will send the reply).
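A sketch of how such a config could be matched. Only `urlPattern` comes from the comment above; the `name` and `replies` fields, the endpoint names and the paths are assumptions made up for illustration.

```python
import json
import re

# Hypothetical config: "replies" marks the one endpoint whose response is
# returned to the client; the audit entry matches every path.
CONFIG = json.loads("""
[
  {"name": "audit", "urlPattern": ".*", "replies": false},
  {"name": "save-encounter", "urlPattern": "^/encounters$", "replies": true}
]
""")


def match_endpoints(path, config=CONFIG):
    """Return (all matching endpoints, the replying endpoint or None)."""
    matched = [e for e in config if re.search(e["urlPattern"], path)]
    repliers = [e for e in matched if e["replies"]]
    if len(repliers) > 1:
        raise ValueError(f"more than one replying endpoint matches {path}")
    return matched, (repliers[0] if repliers else None)


matched, replier = match_endpoints("/encounters")
```

Rejecting more than one replying endpoint is a design choice of this sketch; the comment only says some flag is needed.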

So, basically what I'm saying is this:

Synchronous calls from the client
- Use a pub sub model
- Call subscribing services directly using ws
Asynchronous calls from the client
- Use a pub sub model
- Put message onto a MQ so the heavy workers can chew on the message at their pace
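Roughly, the two paths in this summary could share one dispatch function. Everything here is a sketch: `dispatch`, `call_ws`, the `replies` flag and the subscriber names are hypothetical, and a `queue.Queue` stands in for the MQ.

```python
import queue


def dispatch(message, subscribers, mode, call_ws, mq):
    """Fan a client message out to every subscribing service.

    sync:  call each subscriber directly and return the reply of the one
           flagged with "replies".
    async: put one (subscriber name, message) item on the queue per
           subscriber and return None straight away.
    """
    if mode == "sync":
        reply = None
        for sub in subscribers:
            result = call_ws(sub["name"], message)
            if sub.get("replies"):
                reply = result
        return reply
    if mode == "async":
        for sub in subscribers:
            mq.put((sub["name"], message))
        return None
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical subscribers: auditing listens in, the CR sends the reply.
SUBSCRIBERS = [{"name": "audit"}, {"name": "cr", "replies": True}]
```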

I know sockets can be expensive, but I think it would be great for us to use HTTP sockets as it gives us flexibility. Flexibility to split the mediators over servers, flexibility for the mediators to be re-used somewhere else for another purpose, flexibility for us to say: "Oh, you want to write another mediator? Sure, just write a web service." Basically we inherit all the good stuff from an SOA.

Here is a diagram of what I'm currently thinking:

(PS. Hannes, enjoy your holiday! Don't let this cloud your thoughts while you are away!)

(Image: HIM Arch)

hnnesv commented 11 years ago

Great, thanks Ryan. This really helped clarify your thinking. I'm not sure I agree on everything, but I'll have to take some time to fully formulate my responses and give the architecture due consideration. I'm looking forward to taking a relook at this with some fresh, well-rested eyes :) It would definitely be great for us to schedule a call around this at some point!

rcrichton commented 10 years ago

We ended up splitting out the mediators from the core application; they are now hosted in other GitHub repositories.

jembi / openhim-legacy

Do we want to split the mediation functions from the core OpenHIM functionality? #179