hashicorp / levant

An open source templating and deployment tool for HashiCorp Nomad jobs
Mozilla Public License 2.0

auto-revert SIGSEGV #426

Open ricochet1k opened 3 years ago

ricochet1k commented 3 years ago

Description

Levant crashes every time I deploy a job with a failing check, as soon as the deployment reaches the auto-revert state.

By the way, why does Levant do auto-revert at all, since Nomad already reverts to the last stable version on its own? Could we add a flag to disable Levant's auto-revert functionality?
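Such an opt-out could be wired roughly as follows, sketched in Go (the language Levant is written in). The flag name `-disable-auto-revert` and the helper `shouldLaunchRevertChecker` are hypothetical and are not part of Levant's real CLI; the sketch only shows the gating logic, assuming the deploy path already knows whether the job entered an auto-revert state.

```go
package main

import (
	"flag"
	"fmt"
)

// shouldLaunchRevertChecker is a HYPOTHETICAL helper: Levant would start its
// own auto-revert checker only when the job entered an auto-revert state AND
// the user has not opted out, leaving the revert itself to Nomad.
func shouldLaunchRevertChecker(enteredAutoRevert, disabled bool) bool {
	return enteredAutoRevert && !disabled
}

func main() {
	// HYPOTHETICAL flag; Levant v0.3.x does not provide it.
	fs := flag.NewFlagSet("deploy", flag.ContinueOnError)
	disable := fs.Bool("disable-auto-revert", false, "let Nomad handle auto-revert without Levant's checker")
	if err := fs.Parse([]string{"-disable-auto-revert"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}

	fmt.Println(shouldLaunchRevertChecker(true, *disable)) // false: user opted out
	fmt.Println(shouldLaunchRevertChecker(true, false))    // true: default behaviour
}
```

With the flag off (the default), behaviour is unchanged; with it on, Levant would simply skip its checker and let Nomad's own `auto_revert` handling take effect.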

2021-09-27T23:58:41Z |INFO| levant/auto_revert: job the-job has entered auto-revert state; launching auto-revert checker job_id=the-job
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x77098e]

goroutine 1 [running]:
github.com/hashicorp/levant/levant.(*levantDeployment).autoRevert(0xc0003fd040, 0xc00011e2f0, 0xc00011e2d0)
    /home/circleci/project/project/levant/auto_revert.go:25 +0x16e
github.com/hashicorp/levant/levant.(*levantDeployment).checkAutoRevert(0xc0003fd040, 0xc00011e2d0)
    /home/circleci/project/project/levant/auto_revert.go:71 +0x186
github.com/hashicorp/levant/levant.(*levantDeployment).deploy(0xc0003fd040, 0x1)
    /home/circleci/project/project/levant/deploy.go:195 +0x505
github.com/hashicorp/levant/levant.TriggerDeployment(0xc0000ccd20, 0x0, 0xc0001696f0)
    /home/circleci/project/project/levant/deploy.go:81 +0x7e
github.com/hashicorp/levant/command.(*DeployCommand).Run(0xc0000aedf8, 0xc0000c0020, 0x3, 0x3, 0xc0000ccb00)
    /home/circleci/project/project/command/deploy.go:197 +0x939
github.com/mitchellh/cli.(*CLI).Run(0xc0000edcc0, 0xc0000edcc0, 0x7, 0xc0000aed08)
    /go/pkg/mod/github.com/mitchellh/cli@v1.1.0/cli.go:260 +0x41a
main.RunCustom(0xc0000c0010, 0x4, 0x4, 0xc000196870, 0x406365)
    /home/circleci/project/project/main.go:49 +0x33e
main.Run(0xc0000c0010, 0x4, 0x4, 0xc000094058)
    /home/circleci/project/project/main.go:17 +0x56
main.main()
    /home/circleci/project/project/main.go:11 +0x65

Relevant Nomad job specification file:

I don't think this is very relevant; it happens every time.

Output of levant version:

Levant v0.3.0

Output of consul version:

Consul v1.10.0
Revision 27de64da7
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

Output of nomad version:

Nomad v1.1.2 (60638a086ef9630e2a9ba1e237e8426192a44244)

Additional environment details:

Debug log outputs from Levant:

rytyr commented 1 year ago

We also have a similar issue when enabling auto_revert with the hashicorp/levant:0.3.1 Docker image:

2022-12-07T04:19:10Z |INFO| levant/auto_revert: job develop--vas-service has entered auto-revert state; launching auto-revert checker job_id=develop--job
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x713dc4]
goroutine 1 [running]:
github.com/hashicorp/levant/levant.(*levantDeployment).autoRevert(0xc0003433f0, 0xc00043f2b0, 0xc00043f290)
    /home/circleci/project/project/levant/auto_revert.go:25 +0x124
github.com/hashicorp/levant/levant.(*levantDeployment).checkAutoRevert(0xc000177bb0, 0xc00043f290)
    /home/circleci/project/project/levant/auto_revert.go:71 +0x129
github.com/hashicorp/levant/levant.(*levantDeployment).deploy(0xc0003433f0)
    /home/circleci/project/project/levant/deploy.go:193 +0x377
github.com/hashicorp/levant/levant.TriggerDeployment(0x7ffc04bfbf70, 0x4)
    /home/circleci/project/project/levant/deploy.go:81 +0x45
github.com/hashicorp/levant/command.(*DeployCommand).Run(0xc00000ce70, {0xc00001e0a0, 0x2, 0x2})
    /home/circleci/project/project/command/deploy.go:197 +0x934
github.com/mitchellh/cli.(*CLI).Run(0xc00022e000)
    /go/pkg/mod/github.com/mitchellh/cli@v1.1.0/cli.go:260 +0x5f8
main.RunCustom({0xc00001e090, 0xc0000f9f70, 0x405d79}, 0xc0001e0960)
    /home/circleci/project/project/main.go:49 +0x26a
main.Run({0xc00001e090, 0x3, 0x3})
    /home/circleci/project/project/main.go:17 +0x45
main.main()
    /home/circleci/project/project/main.go:11 +0x50

Is this normal or expected behavior when auto-revert kicks in?

cyrilgdn commented 1 year ago

Hi,

I have a similar issue. After looking at the code, I found that it happens for jobs that are not in the default namespace. This call: https://github.com/hashicorp/levant/blob/4cc7f75250a989edae097fb844112e42ca1fd6e8/levant/auto_revert.go#L17 does not pass the namespace, so it doesn't find the deployment (and dep is nil).
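The root cause can be illustrated with a self-contained Go sketch. It uses a hypothetical in-memory `deploymentStore`, keyed by namespace and then deployment ID, instead of the real Nomad API client; a lookup that omits the namespace falls back to `default`, finds nothing, and returns a nil deployment. The guarded helper passes the job's namespace and checks for nil before using the result. This illustrates the pattern only; it is not the patch from the commenter's fork.

```go
package main

import (
	"errors"
	"fmt"
)

// Deployment is a HYPOTHETICAL, trimmed-down deployment record.
type Deployment struct {
	ID        string
	Namespace string
}

// deploymentStore is a HYPOTHETICAL stand-in for the Nomad API: a deployment
// is only visible inside its own namespace.
type deploymentStore map[string]map[string]*Deployment

// info returns (nil, nil) when nothing matches, mirroring the nil deployment
// described above; an empty namespace falls back to "default".
func (s deploymentStore) info(namespace, id string) (*Deployment, error) {
	if namespace == "" {
		namespace = "default"
	}
	return s[namespace][id], nil
}

// autoRevertDeploymentID looks up the deployment in the job's namespace and
// returns an error on a nil result instead of dereferencing it.
func autoRevertDeploymentID(s deploymentStore, namespace, depID string) (string, error) {
	dep, err := s.info(namespace, depID)
	if err != nil {
		return "", err
	}
	if dep == nil {
		return "", errors.New("deployment " + depID + " not found in namespace " + namespace)
	}
	return dep.ID, nil
}

func main() {
	store := deploymentStore{
		"team-a": {"dep-1": {ID: "dep-1", Namespace: "team-a"}},
	}

	// Buggy call shape: no namespace, so the lookup searches "default".
	dep, _ := store.info("", "dep-1")
	fmt.Println(dep == nil) // true: reading dep.ID here would panic

	// Fixed call shape: pass the job's namespace.
	id, err := autoRevertDeploymentID(store, "team-a", "dep-1")
	fmt.Println(id, err) // dep-1 <nil>
}
```

The nil check is worth keeping even once the namespace is passed, so that any other lookup miss produces a logged error rather than a SIGSEGV.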

I have a fix and will try to open a PR, though I'm not sure it will be merged: Levant development seems to have stopped (IIRC, they are focusing more on the Nomad CLI and nomad-pack). I switched to custom releases from my personal fork (to get features like HCL2 support from #398, or fixes like this one), as Levant is still a very convenient tool and currently offers a better deployment experience than the Nomad CLI, especially for deployments from CI/CD.