@jeromegn Hmm, thanks for the report. In most cases we should be able to restart plugins automatically, and at first glance I'm not sure why we couldn't here.
Could you possibly send a full log of the client startup to nomad-oss-debug@hashicorp.com, preferably at debug level or lower? That will make this much easier for us to track down.
Thanks!
Sounds good, next time this happens, I'll set it to debug and send you logs.
This issue will be auto-closed because there hasn't been any activity for a few months. Feel free to open a new one if you still experience this problem :+1:
@jeromebaude I reopened this issue per your message to the mailing list. Are you still running version 0.9.3, or have you since upgraded?
No, we've been upgrading. We're on 0.10.2 right now, but it's been happening with every version we've tried.
On Fri, Jan 31, 2020 at 15:19 Nick Ethier wrote:
> @jeromebaude I reopened this issue per your message to the mailing list. Are you still running version 0.9.3 or have you since upgraded?
Thanks, I'm going to try to replicate this. Have you identified any other conditions that seem to trigger it? Anything that would help narrow down the reproduction case? I'm assuming this driver is not publicly available.
It appears pretty random. It doesn't seem to happen on clients with particularly many or few allocs.
The driver is private right now, and I'm reluctant to open it in case I mis-committed something at some point! I'll email that OSS address a gist with the source of the most important files.
I've found a broken Nomad instance and sent its logs to the same address. I also sent logs from a healthy instance for comparison.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Nomad version
Nomad v0.9.3 (c5e8b66c3789e4e7f9a83b4e188e9a937eea43ce)
Operating system and Environment details
Ubuntu 18.04.2 LTS
Issue
We've created a custom task driver, and under certain conditions (unclear, but likely after an unclean shutdown) Nomad is unable to start it.
I confirmed the task driver process is indeed not running.
I browsed the state.db and found:
Is there any way to simply restart the task driver when this happens? When I cleared the state.db, Nomad started our task driver just fine.
Reproduction steps
Not entirely sure.
Nomad Client logs (if appropriate)
^ This is Nomad trying to reconcile the state of all the allocations it knows about.