quantum-elixir / quantum-core

:watch: Cron-like job scheduler for Elixir
https://hexdocs.pm/quantum/
Apache License 2.0

Hourly Jobs Doubling and Skipping? #477

Closed mmmries closed 3 years ago

mmmries commented 3 years ago

Hi 👋 and thank you for this awesome contribution to the community.

I have a cluster running in production and just added Quantum with a single hourly job. To help with troubleshooting, I added a quick Logger.info call at the beginning of the code that runs each hour. When I checked my logs this morning, I saw that in some hours the job ran twice and then skipped the next hour. I didn't find any similar reported issues, but please let me know if there are known problems that I should be working around in my configuration.

Here is my configuration in the running system:

iex(spiff@10.60.1.79)9> CronScheduler.config()
[
  supervisor_module: Quantum.Supervisor,
  executor_supervisor_name: CronScheduler.ExecutorSupervisor,
  node_selector_broadcaster_name: CronScheduler.NodeSelectorBroadcaster,
  execution_broadcaster_name: CronScheduler.ExecutionBroadcaster,
  job_broadcaster_name: CronScheduler.JobBroadcaster,
  clock_broadcaster_name: CronScheduler.ClockBroadcaster,
  task_registry_name: CronScheduler.TaskRegistry,
  storage_name: CronScheduler.Storage,
  task_supervisor_name: CronScheduler.TaskSupervisor,
  name: CronScheduler,
  scheduler: CronScheduler,
  otp_app: :spiff,
  timeout: 5000,
  schedule: nil,
  state: :active,
  timezone: :utc,
  debug_logging: true,
  storage: Quantum.Storage.Noop,
  overlap: false,
  run_strategy: {Quantum.RunStrategy.Random, :cluster},
  jobs: [
    %Quantum.Job{
      name: #Reference<0.4266881043.898105345.257447>,
      overlap: false,
      run_strategy: %Quantum.RunStrategy.Random{nodes: :cluster},
      schedule: ~e[0 * * * * *],
      state: :active,
      task: {Spiff.DataPusher, :push_daily_looker_updates, []},
      timezone: :utc
    }
  ]
]
iex(spiff@10.60.1.79)10> Node.list()
[:"spiff@10.60.3.61", :"spiff@10.60.3.59", :"spiff@10.60.3.60",
 :"spiff@10.60.0.66", :"spiff@10.60.1.78", :"spiff@10.60.0.68",
 :"spiff@10.60.1.80", :"spiff@10.60.0.70"]

And here is what I see in my logs:

[Screenshot of log output: Screen Shot 2021-03-23 at 9 01 14 AM]

maennchen commented 3 years ago

@mmmries Thanks for the report. Could you enable debug logging so that I can see what is happening internally?

https://github.com/quantum-elixir/quantum-core#troubleshooting

mmmries commented 3 years ago

@maennchen Thanks for the quick reply. I don't think I can enable debug logging for the entire app, because it runs a lot of Ecto queries and logging them all would kill my logging infrastructure. I'll see if I can find a good way to disable debug logging in some of these other libraries so I can enable it for Quantum.
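One possible way to do this, as a rough sketch rather than a verified fix: raise the Logger level to debug and turn off query logging on the Ecto repo via its `log: false` option. `Spiff.Repo` is an assumed module name (the repo is never named in this thread), and other libraries that log at debug level would still need to be silenced separately.

```elixir
# config/prod.exs (or config/runtime.exs)
import Config

# Let debug-level messages through so Quantum's debug output becomes visible.
config :logger, level: :debug

# Disable Ecto's per-query log lines for this repo.
# Spiff.Repo is an assumed name; substitute the app's real repo module.
config :spiff, Spiff.Repo, log: false

# Already enabled in the configuration shown above; repeated for completeness.
config :spiff, CronScheduler, debug_logging: true
```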

One other thought that might be relevant: this app is joined into a cluster where 3 of the members run the same codebase, but the other 6 members run a different codebase. I do this so that I can share Phoenix.PubSub messages between them, but I thought I would mention it in case this kind of heterogeneous cluster is problematic for Quantum.
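If the mixed cluster does turn out to matter, one thing that could be tried is giving the job an explicit node list instead of `:cluster`, so that only nodes running this codebase are candidates. This is a sketch under assumptions, not a confirmed cause or fix: the node names below are hypothetical placeholders, and the schedule is written as the five-field hourly equivalent of the job shown above.

```elixir
import Config

# Hypothetical node names: replace with the nodes that actually run the
# Spiff codebase and have CronScheduler started.
spiff_nodes = [:"spiff@host-a", :"spiff@host-b", :"spiff@host-c"]

config :spiff, CronScheduler,
  jobs: [
    [
      # Minute 0 of every hour.
      schedule: "0 * * * *",
      task: {Spiff.DataPusher, :push_daily_looker_updates, []},
      # Pick one node at random from the explicit list rather than :cluster.
      run_strategy: {Quantum.RunStrategy.Random, spiff_nodes}
    ]
  ]
```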

mmmries commented 3 years ago

I'll go ahead and close this. I didn't find a good way to get the debug logging going, and since we're doing some weird things with our cluster of nodes, I think it may just be outside the scope of Quantum. We'll use an external cron mechanism for now. Thanks again for your effort and support in the community ❤️ 💛 💙 💚 💜