intentionally-left-nil / yatlab-worker

Slack client for yatlab
MIT License
0 stars 0 forks source link

I have similar requirements in our Project, is this the correct way of doing this #5

Open trojanh opened 6 years ago

trojanh commented 6 years ago

I have a project where I am starting all the SlackBots for teams and am doing that as below:

Bot.ex

defmodule MyApp.Bot  do
use GenServer
import Logger

def start(_, state), do: {:ok, state}

  def start_link do
    GenServer.start_link(__MODULE__, [:ok], name: __MODULE__)
  end

  def init(state) do
    send(self(), :work)
    {:ok, state}
  end

  @doc """
    invoked by Spervisor worker on start up, to start SlackBot for all the teams in DB
  """
  def handle_info(:work, state) do
    Logger.info("============================")
    Logger.info("Starting bots for following Teams:.....................")
    Logger.info("============================")

    try do
      process_ids = start_all_bots()
      Logger.info("============================")

      if process_ids[:error] do
        Logger.error("#{inspect(process_ids)}")
      else
        Logger.info("#{inspect(process_ids)}")
      end

      Logger.info("============================")

      debug("Done starting bots for all the teams...")
    rescue
      e ->
        Logger.error("#{inspect(e)}")

        Sentry.capture_exception(
          e,
          stacktrace: System.stacktrace(),
          extra: %{message: "Bot crashed"}
        )
    end

    {:stop, :normal, state}
  end

 def start_all_bots() do
    BotSetting
    |> Repo.all()
    |> Enum.map(fn bot ->
      %{team_name: bot.team_name, token: bot.bot_access_token}
      processes = Slack.Bot.start_link(MyApp.SlackBot, [], bot.bot_access_token)

      case processes do
        {:ok, _} ->
          Logger.info("#{bot.team_name}, ")

        {:error, "Could not connect to the Slack RTM API"} ->
          Logger.error(
            "You are offline. Could not start bot for #{bot.team_name}'. \n'Could not connect to the Slack RTM API'"
          )

        {:error, message} ->
          Logger.error("#{bot.team_name} is invalid. Deleting this team now.")

          Logger.error(
            "'Bot Could not start Bot for :#{inspect(message)}'. Deleting this team now."
          )

          Repo.delete!(bot)
      end

      processes
    end)

my_app.ex

defmodule MyApp do
 use Application

def start(_type, _args) do
    import Supervisor.Spec
    # Define workers and child supervisors to be supervised
    children = [
       supervisor(MyApp.Repo, []),

      supervisor(MyApp.Endpoint, []),

      worker(MyApp.Bot, [], restart: :transient)

    ]
    # See http://elixir-lang.org/docs/stable/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

This seems to be working fine but most of the it crashes with error

(exit) {:error, :timeout}
gen_fsm.erl in :gen_fsm.terminate/7 at line 602
proc_lib.erl in :proc_lib.init_p_do_apply/3 at line 247
websocket_client in :websocket_client.init/1

I have been digging a lot since past few months with GenServer and Supervisor but nothing worked. I think you are using a similar thing in your Project. You need to start all the team Bots together on start like me. But when this websocket error is thrown it is impossible to trace the logs and cause for crash, which is fine but we have no idea that our Bot is inactive on Slack and only know when a user tells us. I want a way to restart or prevent crash in any circumstances. What approach would you recommend for this? What purpose is DynamicSupervisor serving for you ?

Also is your Slack Bot stable with your design? Does it crash randomly?

Thanks in advanace for understanding this.

intentionally-left-nil commented 6 years ago

Hi, sorry for the delayed response. Basically, I have a two-tiered supervisor system.

The easiest one to explain is the DynamicSupervisor part. Once a slack worker is up and running properly, I add it to the DynamicSupervisor. That way, if it crashes for some transient error, the dynamic supervisor will restart it. It needs to be dynamic because I do not know at start time how many workers I need. As users go through the OAuth flow to add the app, I need to start a new worker. DynamicSupervisor.start_child(Worker.Supervisor, {Worker, team})}

Okay, so this handles any transient errors. However, as you have seen, if the user removes the app from their slack workspace the following happens:

1) The worker process will fail with a :ssl_socket exception 2) The dynamic supervisor will notice the crash and attempt to restart the worker 3) The worker will crash with auth_invalid 4) The dynamic supervisor will notice the crash and attempt to restart the worker. 5) Go to step 2, until the max_restart policy is hit and the dynamic supervisor gives up.

To remedy this, I add my manual supervision on top of it. I leave any of the transient errors up to the DynamicSupervisor, but on initialization, I manually handle certain crashes. Namely, if SlackBot.start_link throws either a invalid_auth or account_inactive error, I return :ignore. This means the worker will not be dynamically supervised, and therefore won't death roll.

So, that solves one problem, but I still want to do something about the app. My solution is to write a bit in my database that says the user has removed the app, and then I won't try to start it again.

Finally, With my TeamMonitor, I poll the database every minute to see which teams I need to start. Since this polling ignores slack workspaces with the bit set above, I won't keep trying to start dead accounts.

The inner tier is left to the DynamicSupervisor. During runtime, I will add bots as users use the OAuth flow to add my app to their workspace. Let's ignore for the moment how they get started

intentionally-left-nil commented 6 years ago

@trojanh from what I can see, the biggest problem you have is that your child slack processes are not under supervision processes = Slack.Bot.start_link(MyApp.SlackBot, [], bot.bot_access_token)

Therefore, if they crash for any reason, you won't know about it and they won't get restarted. However, once you fix that (say with a dynamicsupervisor), you'll run into the auth_invalid problem above, and then can play with the solution I have.

What would be helpful for me would be to know what your requirements are. Do you poll for new teams? What do you do when teams remove your app? That would help me think about how to genericize my code

trojanh commented 6 years ago

You can check following code to start the Bot on server startup, it handles most of the errors and deletes the configurations from DB including the case when the user has removed it from DB. https://gist.github.com/trojanh/0f8491e8930333a5b447282201686de6

Also I provide a screen on UI for viewing list of connected teams and delete them directly from that screen(which will remove the Team tokens and prevent any communication from that team unless it is added again explicitly.

I think we can have chat on this over this slack team personally so that we can get quick replies. https://elixir-lang.slack.com/

trojanh commented 6 years ago

There is also a Slack endpoint to revoke the tokens which i used while deleting the tokens https://api.slack.com/methods/auth.revoke

trojanh commented 6 years ago

I was trying to understand your way of managing slack teams by DynamicSupervisor, correct if I am wrong but this Worker.ex is called in Application.ex as Worker.Supervisor?

I see you have made a Worker file which has the code to start the bots for Team which I suppose is used to start_up all the saved Team bots from DB. I am not able to understand from are you invoking this Module. I see Worker.start_link is called no where. If you could give me some cues for this I will be able to understand how it has been done and avoid Bot crash as you said above.

I am willing to understand how you added them to DynamicSupervisor tree, as it needs to be invoked by Slack.Bot.start_link() but I see you haven't added it anywhere in the tree.

Thanks in advance.