oban-bg / oban

💎 Robust job processing in Elixir, backed by modern PostgreSQL and SQLite3
https://oban.pro
Apache License 2.0
3.37k stars 314 forks source link

Possible regression when handling unique advisory lock #1112

Closed michelboaventura closed 5 months ago

michelboaventura commented 5 months ago

Precheck

Environment

Current Behavior

Hey!

I have this simplified version of a worker on my project. The important part is the create method:

defmodule Worker do
  use Oban.Worker,
    queue: :index,
    unique: [
      # for the next 2 seconds, starting on job's insertion time
      period: 2,
      # check for each incoming job, whether or not worker and args are the same
      fields: [:args, :worker],
      # from the args, use id, action and resource
      keys: [:id, :action, :resource],
      states: [:available, :scheduled]
      # then, replace the current persisted job `scheduled_at` by the incoming job.scheduled_at
      # replace: [scheduled: [:scheduled_at]]
    ]

  require Logger

  @spec create(number) :: {:ok, Oban.Job.t()} | {:error, any}
  def create(id) do
    %{id: id}
    |> new(replace: [scheduled: [:scheduled_at]], schedule_in: 1)
    |> Oban.insert()
    |> case do
      {:ok, job} ->
        {:ok, job}

      {:error, error} ->
        Logger.notice("Failed to create Worker job for '#{id}': #{inspect(error)}")
        {:error, error}
    end
  end
end

Recently we've improved our visibility and started sending all sort of errors to Sentry. After that we started seeing a ton of errors on the case statement within create similar to (I've unwrapped the error in a more legible format):

# CaseClauseError. No case clause matching:
%Oban.Job{
  id: nil,
  state: "scheduled",
  queue: "index",
  worker: "Worker",
  args: %{id: 12345},
  meta: %{},
  tags: [],
  errors: [],
  attempt: 0,
  attempted_by: nil,
  max_attempts: 20,
  priority: nil,
  attempted_at: nil,
  cancelled_at: nil,
  completed_at: nil,
  discarded_at: nil,
  inserted_at: nil,
  scheduled_at: ~U[2024-06-27 20:15:06.514996Z],
  conf: nil,
  conflict?: true,
  replace: [scheduled: [:scheduled_at]],
  unique: %{
    timestamp: :inserted_at,
    keys: [:id, :action, :resource],
    period: 2,
    fields: [:args, :worker],
    states: [:available, :scheduled]
  },
  unsaved_error: nil
}

According to Oban's doc and typespec Oban.insert/1 should never return anything other than {:ok, job} or {:error, error} which makes this behavior odd. After looking at 400+ errors the only thing in common they have is the conflict?: true flag, so after reading throughout Oban's source code I came across this line which seems to be returning only the job and only when a conflict happens. And since this line was changed a couple of days ago I believe this might be a regression.

Expected Behavior

Oban.insert() should only return {:ok, job} or {:error, error}.

sorentwo commented 5 months ago

Thanks @michelboaventura, that was indeed a regression. Dialyzer didn't help here and that condition is prohibitively difficult to replicate in tests because of the sandbox.

michelboaventura commented 5 months ago

Thanks @sorentwo. Indeed. I tried reproduce it locally running a bunch of async tasks but was never able to do so.

michelboaventura commented 5 months ago

@sorentwo not trying to rush anything but do you think it would be possible to cut a new release fixing this today? Thanks

sorentwo commented 5 months ago

@michelboaventura Done! Oban v2.17.12 is out with this specific fix.