oban-bg / oban

💎 Robust job processing in Elixir, backed by modern PostgreSQL and SQLite3
https://oban.pro
Apache License 2.0
3.37k stars 313 forks source link

Errors testing MySQL / Dolphin engine #1188

Closed beerlington closed 8 hours ago

beerlington commented 6 days ago

Environment

Current Behavior

I'm starting up my application and am getting two separate errors:

The first occurs just once when the application starts:

[error] Handler "oban.migration" has failed and has been detached. Class=:error
Reason=:function_clause
Stacktrace=[
  {String, :replace, [false, "'", "\\'", []],
   [file: ~c"lib/string.ex", line: 1520]},
  {Oban.Pro.Migration, :with_defaults, 1,
   [file: ~c"lib/oban/pro/migration.ex", line: 248]},
  {Oban.Pro.Migration, :migrated_version, 1,
   [file: ~c"lib/oban/pro/migration.ex", line: 194]},
  {Oban.Pro.Migration, :handle_event, 4,
   [file: ~c"lib/oban/pro/migration.ex", line: 115]},
  {:telemetry, :"-execute/3-fun-0-", 4,
   [
     file: ~c"/Users/me/code/web/apps/my_app/deps/telemetry/src/telemetry.erl",
     line: 167
   ]},
  {:lists, :foreach_1, 2, [file: ~c"lists.erl", line: 1442]},
  {Task.Supervised, :invoke_mfa, 2,
   [file: ~c"lib/task/supervised.ex", line: 101]}
]

The other seems to occur on a regular frequency while the app is running:

[error] GenServer {Oban.Registry, {Oban, Oban.Peer}} terminating
** (ArgumentError) :conflict_target is not supported in insert/insert_all by MySQL
    (ecto_sql 3.12.1) lib/ecto/adapters/myxql/connection.ex:1556: Ecto.Adapters.MyXQL.Connection.error!/2
    (ecto_sql 3.12.1) lib/ecto/adapters/myxql/connection.ex:191: Ecto.Adapters.MyXQL.Connection.insert/7
    (ecto_sql 3.12.1) lib/ecto/adapters/sql.ex:956: Ecto.Adapters.SQL.insert_all/9
    (ecto 3.12.4) lib/ecto/repo/schema.ex:59: Ecto.Repo.Schema.do_insert_all/7
    (oban 2.18.2) lib/oban/peers/database.ex:218: Oban.Peers.Database.upsert_peer/1
    (ecto_sql 3.12.1) lib/ecto/adapters/sql.ex:1400: anonymous fn/3 in Ecto.Adapters.SQL.checkout_or_transaction/4
    (db_connection 2.7.0) lib/db_connection.ex:1756: DBConnection.run_transaction/4
    (oban 2.18.2) lib/oban/repo.ex:156: Oban.Repo.transaction/4
    (oban 2.18.2) lib/oban/peers/database.ex:108: anonymous fn/2 in Oban.Peers.Database.handle_info/2
    (telemetry 1.3.0) /Users/me/code/web/apps/my_app/deps/telemetry/src/telemetry.erl:324: :telemetry.span/3
    (oban 2.18.2) lib/oban/peers/database.ex:101: Oban.Peers.Database.handle_info/2
    (stdlib 4.3) gen_server.erl:1123: :gen_server.try_dispatch/4
    (stdlib 4.3) gen_server.erl:1200: :gen_server.handle_msg/6
    (stdlib 4.3) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Last message: :election
State: %Oban.Peers.Database{conf: %Oban.Config{dispatch_cooldown: 5, engine: Oban.Engines.Dolphin, get_dynamic_repo: nil, insert_trigger: true, log: false, name: Oban, node: "Peters-Mac-Studio-912", notifier: {Oban.Notifiers.PG, []}, peer: {Oban.Peers.Database, []}, plugins: [{Oban.Plugins.Cron, [crontab: []]}, {Oban.Pro.Plugins.DynamicPruner, [mode: {:max_age, {20, :minute}}, worker_overrides: [<REMOVED>], repo: MyApp.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}, timer: #Reference<0.2364638267.2795503618.25266>, interval: 30000, leader?: true, leader_boost: 2}
[error] GenServer {Oban.Registry, {Oban, Oban.Peer}} terminating

The only real changes I've made in our app is to switch the engine and disable some Oban Pro features that were giving me trouble.

beerlington commented 5 days ago

One small update to this from yesterday. I've removed Oban Pro from our dependencies and that makes the first error go away. The second one remains without any pro features enabled thought.

sorentwo commented 5 days ago

The second one remains without any pro features enabled thought.

The second one is from the Database peer, which clearly doesn't like the attempted use of conflict_target for upserts. Working on a fix for that one.

sorentwo commented 5 days ago

@beerlington There's a fix for that on main now 🙂

beerlington commented 5 days ago

Thanks! That resolved the second issue. I have it running on my dev machine now and things seem to be working well. I'm not sure what my priority is for testing this further, but I'll report back if we run into any other issues.

beerlington commented 4 days ago

I've got it running on a staging environment now but as soon as I enqueue a job, am getting a flood of these errors:

2024-11-21T19:06:23.560041+00:00 app[web.1]: 19:06:23.557 [error] ** State machine #PID<0.6787.0> terminating
2024-11-21T19:06:23.560054+00:00 app[web.1]: ** Reason for termination ==
2024-11-21T19:06:23.560055+00:00 app[web.1]: ** (DBConnection.ConnectionError) client #PID<0.4979.0> stopped: ** (FunctionClauseError) no function clause matching in MyXQL.Protocol.decode_com_stmt_prepare_response/3
2024-11-21T19:06:23.560056+00:00 app[web.1]: (myxql 0.7.1) lib/myxql/protocol.ex:342: MyXQL.Protocol.decode_com_stmt_prepare_response(<<255, 81, 4, 35, 72, 89, 48, 48, 48, 117, 110, 107, 110, 111, 119, 110, 32, 101, 114, 114, 111, 114, 58, 32, 115, 121, 110, 116, 97, 120, 32, 101, 114, 114, 111, 114, 32, 97, 116, 32, 112, 111, 115, 105, 116, 105, 111, 110, 32, 52, ...>>, <<12, 0, 0, 2, 0, 9, 0, 0, 0, 18, 0, 0, 0, 0, 0, 0, 49, 0, 0, 3, 3, 100, 101, 102, 12, 115, 101, 97, 116, 101, 100, 45, 115, 116, 111, 110, 101, 2, 111, 48, 9, 111, 98, 97, 110, 95, 106, 111, 98, 115, ...>>, :initial)
2024-11-21T19:06:23.560057+00:00 app[web.1]: (myxql 0.7.1) lib/myxql/client.ex:257: MyXQL.Client.recv_packets/7
2024-11-21T19:06:23.560057+00:00 app[web.1]: (myxql 0.7.1) lib/myxql/connection.ex:475: MyXQL.Connection.prepare/2
2024-11-21T19:06:23.560057+00:00 app[web.1]: (myxql 0.7.1) lib/myxql/connection.ex:72: MyXQL.Connection.handle_prepare/3
2024-11-21T19:06:23.560058+00:00 app[web.1]: (db_connection 2.7.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
2024-11-21T19:06:23.560058+00:00 app[web.1]: (db_connection 2.7.0) lib/db_connection.ex:1543: DBConnection.prepare/4
2024-11-21T19:06:23.560059+00:00 app[web.1]: (db_connection 2.7.0) lib/db_connection.ex:1536: DBConnection.run_prepare/4
2024-11-21T19:06:23.560059+00:00 app[web.1]: (db_connection 2.7.0) lib/db_connection.ex:1548: DBConnection.run_prepare_execute/5
2024-11-21T19:06:23.560060+00:00 app[web.1]: 
2024-11-21T19:06:23.560060+00:00 app[web.1]: 
2024-11-21T19:06:23.560060+00:00 app[web.1]: 19:06:23.558 [error] GenServer {Oban.Registry, {Oban, {:producer, "messages"}}} terminating
2024-11-21T19:06:23.560061+00:00 app[web.1]: ** (FunctionClauseError) no function clause matching in MyXQL.Protocol.decode_com_stmt_prepare_response/3
2024-11-21T19:06:23.560062+00:00 app[web.1]: (myxql 0.7.1) lib/myxql/protocol.ex:342: MyXQL.Protocol.decode_com_stmt_prepare_response(<<255, 81, 4, 35, 72, 89, 48, 48, 48, 117, 110, 107, 110, 111, 119, 110, 32, 101, 114, 114, 111, 114, 58, 32, 115, 121, 110, 116, 97, 120, 32, 101, 114, 114, 111, 114, 32, 97, 116, 32, 112, 111, 115, 105, 116, 105, 111, 110, 32, 52, ...>>, <<12, 0, 0, 2, 0, 9, 0, 0, 0, 18, 0, 0, 0, 0, 0, 0, 49, 0, 0, 3, 3, 100, 101, 102, 12, 115, 101, 97, 116, 101, 100, 45, 115, 116, 111, 110, 101, 2, 111, 48, 9, 111, 98, 97, 110, 95, 106, 111, 98, 115, ...>>, :initial)
2024-11-21T19:06:23.560062+00:00 app[web.1]: (myxql 0.7.1) lib/myxql/client.ex:257: MyXQL.Client.recv_packets/7
2024-11-21T19:06:23.560062+00:00 app[web.1]: (myxql 0.7.1) lib/myxql/connection.ex:475: MyXQL.Connection.prepare/2
2024-11-21T19:06:23.560062+00:00 app[web.1]: (myxql 0.7.1) lib/myxql/connection.ex:72: MyXQL.Connection.handle_prepare/3
2024-11-21T19:06:23.560063+00:00 app[web.1]: (db_connection 2.7.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
2024-11-21T19:06:23.560063+00:00 app[web.1]: (db_connection 2.7.0) lib/db_connection.ex:1543: DBConnection.prepare/4
2024-11-21T19:06:23.560063+00:00 app[web.1]: (db_connection 2.7.0) lib/db_connection.ex:1536: DBConnection.run_prepare/4
2024-11-21T19:06:23.560063+00:00 app[web.1]: (db_connection 2.7.0) lib/db_connection.ex:1548: DBConnection.run_prepare_execute/5
2024-11-21T19:06:23.560064+00:00 app[web.1]: Last message: :dispatch

I'm not sure if it's useful, but here's a screenshot of what's currently in the oban tables: Screenshot 2024-11-21 at 2 36 43 PM

For these particular jobs, we are inserting them into the oban_jobs table in a JS app. This works fine with Postgres, but maybe it's not compatible with MySQL?

await tx.insert(obanJobs).values({
  args: {
    id: user.id,
    name: 'tour_user_verify_email',
  },
  queue: 'messages',
  worker: 'Elixir.MyApp.EmailWorker',
});

I was able to get these same jobs running in development, so maybe there is something about the Planetscale environment that is incompatible.

beerlington commented 4 days ago

Here's the stacktrace I get locally while connected to the Planetscale database. It's the same as the one I posted above, but with a little more context:

[error] ** State machine #PID<0.8102.0> terminating
** Reason for termination ==
** (DBConnection.ConnectionError) client #PID<0.8285.0> stopped: ** (FunctionClauseError) no function clause matching in MyXQL.Protocol.decode_com_stmt_prepare_response/3
    (myxql 0.7.1) lib/myxql/protocol.ex:342: MyXQL.Protocol.decode_com_stmt_prepare_response(<<255, 81, 4, 35, 72, 89, 48, 48, 48, 117, 110, 107, 110, 111, 119, 110, 32, 101, 114, 114, 111, 114, 58, 32, 115, 121, 110, 116, 97, 120, 32, 101, 114, 114, 111, 114, 32, 97, 116, 32, 112, 111, 115, 105, 116, 105, 111, 110, 32, 52, ...>>, <<12, 0, 0, 2, 0, 20, 0, 0, 0, 18, 0, 0, 0, 0, 0, 0, 49, 0, 0, 3, 3, 100, 101, 102, 12, 115, 101, 97, 116, 101, 100, 45, 115, 116, 111, 110, 101, 2, 111, 48, 9, 111, 98, 97, 110, 95, 106, 111, 98, 115, ...>>, :initial)
    (myxql 0.7.1) lib/myxql/client.ex:257: MyXQL.Client.recv_packets/7
    (myxql 0.7.1) lib/myxql/connection.ex:475: MyXQL.Connection.prepare/2
    (myxql 0.7.1) lib/myxql/connection.ex:72: MyXQL.Connection.handle_prepare/3
    (db_connection 2.7.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
    (db_connection 2.7.0) lib/db_connection.ex:1543: DBConnection.prepare/4
    (db_connection 2.7.0) lib/db_connection.ex:1536: DBConnection.run_prepare/4
    (db_connection 2.7.0) lib/db_connection.ex:1548: DBConnection.run_prepare_execute/5
    (db_connection 2.7.0) lib/db_connection.ex:772: DBConnection.parsed_prepare_execute/5
    (db_connection 2.7.0) lib/db_connection.ex:764: DBConnection.prepare_execute/4
    (ecto_sql 3.12.1) lib/ecto/adapters/sql.ex:1002: Ecto.Adapters.SQL.execute!/5
    (ecto_sql 3.12.1) lib/ecto/adapters/sql.ex:994: Ecto.Adapters.SQL.execute/6
    (ecto 3.12.4) lib/ecto/repo/queryable.ex:232: Ecto.Repo.Queryable.execute/4
    (ecto 3.12.4) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
    (oban 2.18.2) lib/oban/engines/dolphin.ex:114: anonymous fn/3 in Oban.Engines.Dolphin.fetch_jobs/3
    (ecto_sql 3.12.1) lib/ecto/adapters/sql.ex:1400: anonymous fn/3 in Ecto.Adapters.SQL.checkout_or_transaction/4
    (db_connection 2.7.0) lib/db_connection.ex:1756: DBConnection.run_transaction/4
    (oban 2.18.2) lib/oban/repo.ex:156: Oban.Repo.transaction/4
    (oban 2.18.2) lib/oban/engines/dolphin.ex:104: Oban.Engines.Dolphin.fetch_jobs/3
    (oban 2.18.2) lib/oban/engine.ex:234: anonymous fn/4 in Oban.Engine.fetch_jobs/3
beerlington commented 4 days ago

It seems like this error might be an Ecto or MyXQL issue, rather than something specific to Oban. The line that's causing the problem is

|> lock("FOR UPDATE SKIP LOCKED")

If I remove that, jobs get picked and run fine.

I can even run the raw query on Planetscale without any issues:

Screenshot 2024-11-21 at 4 56 01 PM

But when that query runs within Oban, MyXQL does not like the response from Planetscale for some reason.

Even if I run the same query that's working on Planetscale, directly in Ecto:

Repo.query!(
  """
  SELECT *
  FROM oban_jobs
  WHERE state = 'available'
    AND queue = 'event_bus'
    AND attempt < max_attempts
  ORDER BY priority ASC, scheduled_at ASC, id ASC
  LIMIT 100
  FOR UPDATE SKIP LOCKED;
  """
)

I get this error:

** (MyXQL.Error) (1105) unknown error: syntax error at position 179 near 'SKIP'

    query: SELECT *
FROM oban_jobs
WHERE state = 'available'
  AND queue = 'event_bus'
  AND attempt < max_attempts
ORDER BY priority ASC, scheduled_at ASC, id ASC
LIMIT 100
FOR UPDATE SKIP LOCKED;

    (ecto_sql 3.12.1) lib/ecto/adapters/sql.ex:1096: Ecto.Adapters.SQL.raise_sql_call_error/1
sorentwo commented 3 days ago

If I remove that, jobs get picked and run fine.

Removing the FOR UPDATE SKIP LOCKED, or some variant of that syntax, would prevent using multiple nodes to process the same queue. The query appears to work when executed manually (it would have to for the test suite to pass):

iex(2)> Oban.Test.DolphinRepo.query("SELECT * FROM oban_jobs WHERE state = 'available' AND queue = 'event_bus' AND attempt < max_attempts ORDER BY priority ASC, scheduled_at ASC, id ASC LIMIT 100 FOR UPDATE SKIP LOCKED", [])
{:ok,
 %MyXQL.Result{
   columns: ["id", "state", "queue", "worker", "args", "meta", "tags",
    "attempted_by", "errors", "attempt", "max_attempts", "priority",
    "inserted_at", "scheduled_at", "attempted_at", "cancelled_at",
    "completed_at", "discarded_at"],
   connection_id: 2974,
   last_insert_id: nil,
   num_rows: 0,
   rows: [],
   num_warnings: 0
 }}

Could this be a MySQL version issue? Which version of MySQL are you using?

beerlington commented 3 days ago

On Planetscale we're running 8.0.34-Vitess.

Here's a comparison running the same queries directly on Planetscale vs connecting Ecto to Planetscale:

Direct:

Screenshot 2024-11-22 at 8 25 49 AM

Via Ecto:

Screenshot 2024-11-22 at 8 27 09 AM

Another thing to note is that I was originally getting a function match error from MyXQL:

    (myxql 0.7.1) lib/myxql/protocol.ex:342: MyXQL.Protocol.decode_com_stmt_prepare_response(<<255, 81, 4, 35, 72, 89, 48, 48, 48, 117, 110, 107, 110, 111, 119, 110, 32, 101, 114, 114, 111, 114, 58, 32, 115, 121, 110, 116, 97, 120, 32, 101, 114, 114, 111, 114, 32, 97, 116, 32, 112, 111, 115, 105, 116, 105, 111, 110, 32, 49, ...>>, <<12, 0, 0, 2, 0, 1, 0, 0, 0, 18, 0, 0, 0, 0, 0, 0, 56, 0, 0, 3, 3, 100, 101, 102, 12, 115, 101, 97, 116, 101, 100, 45, 115, 116, 111, 110, 101, 9, 111, 98, 97, 110, 95, 106, 111, 98, 115, 9, 111, 98, ...>>, :initial)
    (myxql 0.7.1) lib/myxql/client.ex:257: MyXQL.Client.recv_packets/7
    (myxql 0.7.1) lib/myxql/connection.ex:459: MyXQL.Connection.prepare/2
    (myxql 0.7.1) lib/myxql/connection.ex:72: MyXQL.Connection.handle_prepare/3
    (db_connection 2.7.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
    (db_connection 2.7.0) lib/db_connection.ex:1543: DBConnection.prepare/4
    (db_connection 2.7.0) lib/db_connection.ex:1536: DBConnection.run_prepare/4
    (db_connection 2.7.0) lib/db_connection.ex:1548: DBConnection.run_prepare_execute/5
    (db_connection 2.7.0) lib/db_connection.ex:1653: DBConnection.run/6
    (db_connection 2.7.0) lib/db_connection.ex:772: DBConnection.parsed_prepare_execute/5
    (db_connection 2.7.0) lib/db_connection.ex:764: DBConnection.prepare_execute/4
    (myxql 0.7.1) lib/myxql.ex:354: MyXQL.do_query/4
    (ecto_sql 3.12.1) lib/ecto/adapters/sql.ex:547: Ecto.Adapters.SQL.query!/4
    (elixir 1.15.4) src/elixir.erl:376: anonymous fn/4 in :elixir.eval_external_handler/1
    (stdlib 4.3) erl_eval.erl:748: :erl_eval.do_apply/7
    (elixir 1.15.4) src/elixir.erl:361: :elixir.eval_forms/4
    (elixir 1.15.4) lib/module/parallel_checker.ex:112: Module.ParallelChecker.verify/1
    (iex 1.15.4) lib/iex/evaluator.ex:331: IEx.Evaluator.eval_and_inspect/3
    (iex 1.15.4) lib/iex/evaluator.ex:305: IEx.Evaluator.eval_and_inspect_parsed/3
    (iex 1.15.4) lib/iex/evaluator.ex:294: IEx.Evaluator.parse_eval_inspect/4

To see what the actual error was, I had to add a function to lib/myxql/protocol.ex in the MyXQL dep:

def decode_com_stmt_prepare_response(<<rest::binary>>, _next, :initial) do
  {:halt, decode_generic_response(rest)}
end

I still feel like at this point it's not an Oban-specific issue. There must be a reason why the query works directly on Planetscale and not through Ecto, when all other queries I've run are successful. It's also strange that MyXQL is not expecting the response and I had to add my own function to handle this error case. I'm going to open an issue on that end and see if they have any thoughts.

beerlington commented 3 days ago

Here is the corresponding issue I opened on the MyXQL project https://github.com/elixir-ecto/myxql/issues/199

beerlington commented 8 hours ago

I'm able to reproduce this issue on other clients connecting to PlanetScale, so it's definitely not an Oban issue. I opened a support ticket with PlanetScale to try and get to the bottom of it and will continue my testing when I get a resolution on that end.