Closed mdeng10 closed 2 months ago
@mdeng10 I think I've seen this scenario when running multiple Sync Network Data
Jobs simultaneously and there are Devices in both Jobs in the same Location. To test, try deleting duplicate VLANs from the relevant Location and then re-discover with one Job at a time.
If you restart the worker service, that will stop all running Jobs:
systemctl restart nautobot-worker
It would be helpful to know the inputs you're passing in to the problematic Jobs.
I am indeed running multiple Jobs simultaneously, so that must be the cause.
Is there a way to cancel all pending Jobs and not just running ones? I tried using nautobot-server nbshell on the main host and editing the JobResults to be successful/finished, but they're still slowly running one by one.
Re "It would be helpful to know the inputs you're passing in to the problematic Jobs": I'll try to set up the Jobs another time, after I do some testing.
Can you cancel the pending Jobs from the UI and restart the worker service?
I don't see an option to cancel pending Jobs from the UI. I can delete the JobResult, but I think the Job will run all the same.
Also, we run the Nautobot worker service in its own Docker container hosted on AWS ECS, where systemctl is neither installed nor initialised. Will terminating the container and creating a new one have the same effect?
After reviewing some docs, I think the most direct way to terminate jobs will be through celery. The celery shell should be accessible from your nautobot app container via a nautobot management command.
nautobot-server celery shell
# Remove all Pending Jobs - I have not tested this
app.control.purge()
# Stop Running Jobs - I just tested this locally
i = inspect()
jobs = i.active()
for hostname in jobs:
    tasks = jobs[hostname]
    for task in tasks:
        app.control.revoke(task['id'], terminate=True)
Thanks for the info - what import is needed for inspect()?
I've run app.control.purge(), so hopefully that clears up the queue.
Sorry about that - here you go. More docs about inspecting celery workers
nautobot-server celery shell
i = app.control.inspect()
jobs = i.active()
for hostname in jobs:
    tasks = jobs[hostname]
    for task in tasks:
        app.control.revoke(task['id'], terminate=True)
Looks like something worked. Not sure which step did it, but the Jobs seem to have been cancelled (I can create new ones now).
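For anyone landing here later, the revoke loop from this thread can be folded into one helper. This is only a sketch, not something taken from the Nautobot docs: `collect_task_ids` and `revoke_all` are made-up names, `app` is assumed to be the Celery app object available inside `nautobot-server celery shell`, and including `reserved()` (tasks a worker has prefetched but not yet started) is an addition to what was tested above. The `inspect()` methods return None when no worker replies in time, which the original loop would trip over, so the helper guards against that.

```python
def collect_task_ids(*replies):
    """Flatten inspect() replies of the form {hostname: [task, ...]} into unique task IDs."""
    ids = []
    for reply in replies:
        # inspect() methods return None when no worker answers in time.
        for tasks in (reply or {}).values():
            for task in tasks:
                if task["id"] not in ids:
                    ids.append(task["id"])
    return ids


def revoke_all(app, terminate=True):
    """Revoke every active or reserved task reported by the workers; return the IDs revoked."""
    inspector = app.control.inspect()
    ids = collect_task_ids(inspector.active(), inspector.reserved())
    for task_id in ids:
        app.control.revoke(task_id, terminate=terminate)
    return ids
```

In the celery shell this would be called as `revoke_all(app)`. Running `app.control.purge()` first drops messages still waiting in the queue, so the two steps together should cover both pending and running Jobs.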
Environment
Expected Behavior
When clicking on the job result I expect it to return the job results page with a log of how the job went
Observed Behavior
It loads this page instead
I've attempted to run the Runs Commands on a Device to simulate SSoT Command Getter job; however, there's a large backlog of onboarding Jobs that have to run before this one. Is there a way to cancel a Job in Nautobot? I can only delete the JobResult.
Steps to Reproduce
Unsure of how to repro yet, but I suspect that if devices are configured with the same VLAN ID and the same VLAN name in the same Location, it will most likely throw the error I've been seeing.
But it should still allow me to see the job result page so I can try to pinpoint which devices are causing this.
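To help pinpoint the colliding VLANs suspected in Steps to Reproduce, something like the following could be run against an export of VLAN records. It is a plain-Python sketch and does not touch the Nautobot ORM; the record keys (`location`, `vid`, `name`), the sample data, and the function name `find_duplicate_vlans` are all assumptions for illustration.

```python
from collections import defaultdict


def find_duplicate_vlans(vlans):
    """Group VLAN records by (location, vid, name) and keep groups with more than one entry."""
    groups = defaultdict(list)
    for vlan in vlans:
        key = (vlan["location"], vlan["vid"], vlan["name"])
        groups[key].append(vlan)
    return {key: items for key, items in groups.items() if len(items) > 1}


# Hypothetical sample data, not taken from the issue.
sample = [
    {"id": 1, "location": "site-a", "vid": 10, "name": "users"},
    {"id": 2, "location": "site-a", "vid": 10, "name": "users"},
    {"id": 3, "location": "site-b", "vid": 10, "name": "users"},
]

duplicates = find_duplicate_vlans(sample)
for (location, vid, name), items in duplicates.items():
    # Print each colliding key along with the IDs of the records that share it.
    print(location, vid, name, [item["id"] for item in items])
```

Only records that match on all three fields are reported, which mirrors the suspected trigger; the same VLAN ID in a different Location is not treated as a duplicate.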