Closed suikast42 closed 1 year ago
Hi @suikast42 👋
I was able to reproduce this with the `nomad job init -short -connect` example, but I believe this was fixed in https://github.com/hashicorp/nomad/pull/15407, which will be out in the next release of Nomad.
If you are able to compile from `main` and test whether the fix works in your case, that would be awesome; otherwise we can wait until the release is out.
I will close this one for now, but let us know if the problem still happens in the new version and we will reopen it 🙂
Sure, I can do it. Is it enough to check out the main branch and run go build? 😜
With a short introduction to building Nomad, I can test it.
Ok I found it here. https://github.com/hashicorp/nomad/tree/main/contributing. I will try it out.
I ended up with this error:
error @percy/cli@1.6.1: The engine "node" is incompatible with this module. Expected version ">=14". Got "12.22.9"
error Found incompatible module.
make: *** [GNUmakefile:358: ember-dist] Error 1
Ah yeah, it can be tricky to build with the UI stuff 😅
I'm generating a custom binary, it will be available at the bottom of the page here: https://github.com/hashicorp/nomad/actions/runs/4011598465
Just a reminder that these are development binaries, so they should not be run in a production environment. Make sure the binary doesn't point to any production data.
This is still happening with nomad 1.5.0.rc1 and consul 1.15.0
I can confirm, this happens with nomad 1.5.0 from the debian package repo. Same error message as @suikast42
@lgfa29 please reopen this issue, as the problem seems not to be fixed in v1.5.0
Oh no, sorry to hear that. Re-opening since it's still an issue.
I don't know if this helps with solving the problem, but I observe that after a reboot Nomad keeps trying to gather metrics from dead containers until I restart the nomad service.
Hum...you mentioned you upgraded to 1.5.0 right? I wonder if this may be related to https://github.com/hashicorp/nomad/pull/16352 🤔
@lgfa29 thanks for reopening the issue :)
I think your theory is correct. After disabling the `dangling_containers` option, as mentioned in #16352 as a workaround, the containers are starting as expected. Nomad 1.5.1 will be released next Monday, so I will test with the `dangling_containers` feature enabled and get back to you.
Me too 😜
Thanks for testing. So yeah, I think that was a different problem. 1.5.1 went out today and it should fix this one.
Just upgraded to nomad 1.5.1 and rebooting works as expected. @lgfa29 the nomad_init containers are no longer garbage collected, so I think your suspicion was correct.
From my point of view, this issue can be closed :)
Nice, I'm glad it's all working for you now 🙂
I think the original issue was about something else so I will keep this open until @suikast42 can confirm the original problem has been fixed 👍
I can confirm, too.
But I found a new problem, described here: https://github.com/hashicorp/nomad/issues/16453
Cool, thank you for the confirmation. I'm going to close this one.
I still seem to be getting this error on 1.5.2. After rebooting the server, many jobs don't start back up. An example error:
failed to setup alloc: pre-run hook "network" failed: failed to configure networking for alloc: failed to configure network: plugin type="bridge" failed (add): failed to allocate for range 0: 172.26.65.119 has been allocated to b4edbaae-3b20-aa1c-400a-787ceff7c636, duplicate allocation is not allowed
I can confirm. See https://github.com/hashicorp/nomad/issues/16893
Nomad version
1.4.3
Operating system and Environment details
Ubuntu 22.04
Issue
On every (re)boot, the first allocation fails with "duplicate allocation is not allowed"
Reproduction steps
reboot the running client
Expected Result
The allocation should be in state run
Actual Result
The allocation goes through the states run -> failed -> run
Job file (if appropriate)
Nomad Server logs (if appropriate)
2023-01-11T17:54:45.764Z [ERROR] client.alloc_runner: prerun failed: alloc_id=174831a8-3c62-ef72-0f3a-c0e7bac072d9 error="pre-run hook \"network\" failed: failed to configure networking for alloc: failed to configure network: plugin type=\"bridge\" failed (add): failed to allocate for range 0: 172.26.68.38 has been allocated to 174831a8-3c62-ef72-0f3a-c0e7bac072d9, duplicate allocation is not allowed"
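For anyone triaging similar reports, a small helper can pull the IP address and the ID that already holds it out of a log line like the one above. This is only a sketch in Python and not part of Nomad: the function name `parse_duplicate_error` and both regular expressions are assumptions based solely on the message format quoted in this issue. In the log line above, the ID holding the address is the same as the failing alloc_id, which may mean the allocation is colliding with its own earlier reservation.

```python
import re

# Assumed pattern for the error text quoted in this issue: captures the IP
# address and the ID that the address is already reserved for.
DUPLICATE_RE = re.compile(
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) has been allocated to "
    r"(?P<owner>[0-9a-f-]{36}), duplicate allocation is not allowed"
)

# Assumed pattern for the alloc_id=... field of the client log line.
ALLOC_ID_RE = re.compile(r"alloc_id=(?P<alloc>[0-9a-f-]{36})")


def parse_duplicate_error(line):
    """Return details of a duplicate-allocation error, or None if absent."""
    dup = DUPLICATE_RE.search(line)
    if dup is None:
        return None
    alloc = ALLOC_ID_RE.search(line)
    alloc_id = alloc.group("alloc") if alloc else None
    return {
        "ip": dup.group("ip"),
        "owner": dup.group("owner"),
        "alloc_id": alloc_id,
        # True when the failing allocation already holds the address itself.
        "self_conflict": alloc_id is not None and alloc_id == dup.group("owner"),
    }


# Shortened copy of the log line from this issue (middle of the error elided).
LOG_LINE = (
    "client.alloc_runner: prerun failed: "
    "alloc_id=174831a8-3c62-ef72-0f3a-c0e7bac072d9 "
    'error="... failed to allocate for range 0: 172.26.68.38 has been '
    "allocated to 174831a8-3c62-ef72-0f3a-c0e7bac072d9, "
    'duplicate allocation is not allowed"'
)

result = parse_duplicate_error(LOG_LINE)
print(result)
```

If the line carries no alloc_id field, as in the bare error message posted in the comments, `alloc_id` comes back as None and `self_conflict` is False.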
nomad job status security