sheharyarn / que

Simple Job Processing in Elixir with Mnesia :zap:
MIT License
669 stars 34 forks source link

perform completes before monitor is installed causing failure #11

Open tomjoro opened 5 years ago

tomjoro commented 5 years ago

If I have a trivial Worker i.e. it completes really fast, then sometimes it will fail, when really it is a success. Here's my worker:

  defmodule NormalWorker do
    use Que.Worker, concurrency: 4

    def perform(args) do
      IO.puts "perform "
    end
    def on_success(args) do
      IO.puts "success"
    end
    def on_failure(args, _err) do
      IO.puts "failure"
    end
  end

I think the problem is the worker process already completes before the Process.monitor is installed. In this case Erlang sends DOWN with info set to :noproc (and that's the error seen in failure too, since failure is handled by the second DOWN handling function in lib/que/server.ex

  def handle_info({:DOWN, ref, :process, _pid, :normal}, queue) do
  def handle_info({:DOWN, ref, :process, _pid, err}, queue) do

I fixed it by add this function (between the two above) and then handling it as success. def handle_info({:DOWN, ref, :process, _pid, :noproc}, queue) do Note the :noproc

http://erlang.org/doc/man/erlang.html#monitor-2

I'd submit a pull request, but I don't think this is a good way to fix it, because the Worker might have failed. Is it possible to somehow do the Process.monitor at the same time the Worker is spawned, like spawn_monitor?

tomjoro commented 5 years ago

Sorry, forgot the test case:

  test "bug" do
    capture = ExUnit.CaptureIO.capture_io(fn ->
      1..100
      |> Enum.each(fn i ->
        :ok = Que.add(NormalWorker, {:anything})
      end)
    end)
    assert capture =~ ~r/failure/
  end

You will get result:

  1) test bug (LoadTest)
     test/load_test.exs:28
     Assertion with =~ failed
     code:  assert capture =~ ~r"failure"
     left:  ""
     right: ~r/failure/
     stacktrace:
       test/load_test.exs:35: (test)
sheharyarn commented 5 years ago

Thank you for reporting this issue. I'll look into this and try to come up with a reliable solution. Suggestions and PRs welcome!

lguminski commented 5 years ago

I have just experienced the same problem. Slowing down perform() with

Process.sleep(1000)

temporarily helped.