litaio / lita

ChatOps for Ruby.
https://www.lita.io
MIT License
1.68k stars 179 forks source link

Multi process support #150

Closed byroot closed 8 years ago

byroot commented 9 years ago

Context

We are are very happy with Lita, and we are using it very extensively. So much that per moment it receive too much traffic and end up being limited by the CPU (parsing JSON webhooks mostly).

When this happen, HTTP and Chat requests start to be queued and Lita becomes unresponsive. So much that a simple ping can sometimes takes minutes.

Proof of concept

As a PoC I implemented https://github.com/Shopify/lita-external. It's basically an abstract adapter that uses 2 Redis queues as communication mechanism.

Here's what it looks like conceptually:

             +------------+
             |            +-------------+
     +------->  Worker 2  |             |
     |       |            <--------+    |
     |       +------------+        |    |
     |                             |    |
+----+--+    +------------+   +----+----v----+    +-------------+    +------------+
|       |    |            +--->              +---->             |    |  Chat      |
| Nginx +---->  Worker 1  |   |    Redis     |    |   Master    <---->  Service   |
|       |    |            <---+              <----+             |    |            |
+-------+    +------------+   +--------------+    +-------------+    +------------+

Since very recently we are running lita-external in production, without any problems so far (again it's very recent).

Additional benefits

Beyond giving us more CPU capacity and horizontal capacity, it also allow give us:

@jimmycuadra is this something would would like to see in Lita core? Or do you think this should stay an adapter? I know most people probably don't need all this complexity, but if you need only a single process, then just running the master works just as well.

If you don't want to support this use case in core, then maybe we could move the discussion on how the offered API to the adapters could make things like this: https://github.com/Shopify/lita-external/blob/39d2520de2e8ba9ece06743bae1103607d61be11/lib/lita/external.rb#L19-L22 a bit less of a hack.

jimmycuadra commented 9 years ago

Thanks for digging into this so deeply, and for sharing your work and your use case. I haven't had enough time to give this the proper look that it deserves, but I wanted to at least acknowledge it and let you know I've seen it. Without digging into the details, my initial reaction is that this probably doesn't belong in the core, but that the core should make it possible to do with extensions without having to resort to monkey patching or other unsafe hackery.

jordansissel commented 9 years ago

I've implemented this in a way with ruby-concurrent and having routes pass of execution to a work pool -

This is all very much work-in-progress and exploratory code, but it works for me and is the path I'm likely to continue on to allow multiple handlers invoked simultaneously.

byroot commented 9 years ago

@jordansissel not sure I follow. You implementation seems to use a threadpool, which is what Lita already does (at least the multithreading part).

jordansissel commented 9 years ago

@byroot my casual observation was that my lita bot was only capable of doing one thing at a time.

I can reproduce this by doing a simple sleep handler:

      route(/^sleep$/) do |req|
        req.reply("Sleeping")
        sleep(5)
        req.reply("Sleep done")
      end
# output, using Lita's Shell adapter
Jarvis > jarvis sleep
Sleeping
jarvis sleep
jarvis sleep
Sleep done
Jarvis > jarvis sleep
Sleeping
Sleep done
Jarvis > jarvis sleep
Sleeping
Sleep done

From the above, I notice that:

1) Handler execution is serial and single-worker. 2) The adapter doesn't come back until after the handler has completed its execution (notice Jarvis > bot prompt not returning until the sleep finishes.

Am I missing something? The behavior doesn't appear to be multithreaded to me based on this above example and my casual observations in other places, such as a git command run via a custom handler getting stuck (blocked) and that freezes the entire bot. :)

byroot commented 9 years ago

It's the adapter (IRC / Slack / HipChat / Shell / Etc) responsibility to handle the message in a new thread. See https://github.com/kenjij/lita-slack/pull/21 for example.

You seem to be using the shell adapter, so it's kinda normal it doesn't support concurrency since it's for development purpose. But you wouldn't have this issue if you were to use IRC for instance.

jordansissel commented 9 years ago

My experience is that the hipchat adapter does this. The shell example was simply as a demonstration of the effect.

I suppose I will just have to accept that some adapters don't implement threading and continue using my thread pool solution (works for me in shell and hipchat).

On Wednesday, November 18, 2015, Jean Boussier notifications@github.com wrote:

It's the adapter (IRC / Slack / HipChat / Shell / Etc) responsibility to handle the message in a new thread. See kenjij/lita-slack#21 https://github.com/kenjij/lita-slack/pull/21 for example.

You seem to be using the shell adapter, so it's kinda normal it doesn't support concurrency since it's for development purpose. But you wouldn't have this issue if you were to use IRC for instance.

— Reply to this email directly or view it on GitHub https://github.com/litaio/lita/issues/150#issuecomment-157934138.

jimmycuadra commented 8 years ago

Took a look at this again. It's very cool and I want to thank you again for sharing this publicly. My thoughts are the same as my initial reaction, however. I think functionality like this should live outside of the lita gem, but Lita should offer appropriate extension APIs to make implementing it possible without having to hack around anything by bypassing Ruby's visibility rules or anything like that. If there are extension points you can think of that would allow you to do something you can't currently do, or would allow you to improve the code in lita-external, please open new issues for each of them. I'll close this issue for now, as I think I've answered the general question this issue is asking.