motiv-labs / janus

An API Gateway written in Go
https://hellofresh.gitbooks.io/janus
MIT License

Janus infrequently dumps core on startup (seen in container environments) #348

Closed: amukherj closed this issue 6 years ago

amukherj commented 6 years ago

Occasionally, Janus dumps core and exits on startup. This happens mostly in container environments, but it has also been seen with the standalone binary.

Reproduction Steps:

Build a Docker image with the following Dockerfile, from a directory with this layout:

dist/janus_linux-amd64
apis/apispec.json
conf/janus.toml
certs/janus.crt
certs/janus.key
scripts/entrypoint.sh


RUN mkdir -p /api /etc/janus
COPY apis /api/apis
COPY conf/janus.toml dist/janus_linux-amd64 scripts/entrypoint.sh /api/
COPY certs/janus.key certs/janus.crt /etc/janus/
EXPOSE 443
EXPOSE 8433
EXPOSE 8444
CMD ["/api/janus_linux-amd64", "-c", "/api/janus.toml"]

The above Dockerfile is for informational purposes only.

Expected behavior:

Janus should come up and start serving the configured endpoints.

Observed behavior:

Normally Janus works as expected, but occasionally it dumps core on startup with the following stack trace:

goroutine 1 [running]:
github.com/apporbit/api-gateway/vendor/github.com/hellofresh/janus/pkg/loader.(*APILoader).RegisterAPI(0x0, 0xc4203fe3c0)
        /go/src/github.com/apporbit/api-gateway/vendor/github.com/hellofresh/janus/pkg/loader/api_loader.go:70 +0x4bb
github.com/apporbit/api-gateway/vendor/github.com/hellofresh/janus/pkg/loader.(*APILoader).RegisterAPIs(0x0, 0xc4203dd520, 0x3, 0x4)
        /go/src/github.com/apporbit/api-gateway/vendor/github.com/hellofresh/janus/pkg/loader/api_loader.go:24 +0xef
github.com/apporbit/api-gateway/vendor/github.com/hellofresh/janus/pkg/server.(*Server).StartWithContext(0xc420409ea0, 0x12dd7a0, 0xc4203dea80, 0xc4203dea80, 0xc420400410)
        /go/src/github.com/apporbit/api-gateway/vendor/github.com/hellofresh/janus/pkg/server/server.go:113 +0x479
github.com/apporbit/api-gateway/vendor/github.com/hellofresh/janus/cmd.RunServerStart(0x12dd7e0, 0xc420014028, 0xc4203f86ae, 0x0, 0x0)
        /go/src/github.com/apporbit/api-gateway/vendor/github.com/hellofresh/janus/cmd/server.go:83 +0x3ff
github.com/apporbit/api-gateway/vendor/github.com/hellofresh/janus/cmd.NewServerStartCmd.func1(0xc420425400, 0xc4203dc9e0, 0x0, 0x2, 0x0, 0x0)
        /go/src/github.com/apporbit/api-gateway/vendor/github.com/hellofresh/janus/cmd/server.go:44 +0x3c
github.com/apporbit/api-gateway/vendor/github.com/spf13/cobra.(*Command).execute(0xc420425400, 0xc4203dc960, 0x2, 0x2, 0xc420425400, 0xc4203dc960)
        /go/src/github.com/apporbit/api-gateway/vendor/github.com/spf13/cobra/command.go:762 +0x475
github.com/apporbit/api-gateway/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc420424c80, 0xc420425180, 0xc420424f00, 0xc420014028)
        /go/src/github.com/apporbit/api-gateway/vendor/github.com/spf13/cobra/command.go:852 +0x334
github.com/apporbit/api-gateway/vendor/github.com/spf13/cobra.(*Command).Execute(0xc420424c80, 0xc420056058, 0x0)
        /go/src/github.com/apporbit/api-gateway/vendor/github.com/spf13/cobra/command.go:800 +0x2b
main.main()
        /go/src/github.com/apporbit/api-gateway/cmd/janus/main.go:13 +0x2b
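In a Go traceback, the first argument printed for a pointer-receiver method is the receiver, so the `RegisterAPIs(0x0, ...)` and `RegisterAPI(0x0, ...)` frames above suggest the methods were called on a nil `*APILoader`. Go permits calling a method on a nil pointer; the panic only happens once the method dereferences it. The sketch below illustrates that behaviour. `APILoader` and `Register` here are hypothetical stand-ins, not the real types from Janus's `loader` package.

```go
package main

import "fmt"

// Register is a hypothetical stand-in for the object RegisterAPI writes to.
type Register struct{ routes []string }

// APILoader loosely mirrors the shape of loader.APILoader (hypothetical).
type APILoader struct {
	register *Register
}

// RegisterAPIs can be entered with a nil receiver: Go allows calling a
// pointer-receiver method on a nil pointer.
func (m *APILoader) RegisterAPIs(names []string) {
	for _, n := range names {
		m.RegisterAPI(n)
	}
}

// RegisterAPI panics when m is nil, because reading m.register dereferences m.
func (m *APILoader) RegisterAPI(name string) {
	m.register.routes = append(m.register.routes, name)
}

// tryRegister reports whether registering through l panicked.
func tryRegister(l *APILoader, names []string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	l.RegisterAPIs(names)
	return false
}

func main() {
	var nilLoader *APILoader // never initialized, like the 0x0 receiver in the trace
	fmt.Println("nil loader panicked:", tryRegister(nilLoader, []string{"a", "b", "c"}))

	ok := &APILoader{register: &Register{}}
	fmt.Println("initialized loader panicked:", tryRegister(ok, []string{"a", "b", "c"}))
}
```

Running it prints `nil loader panicked: true` followed by `initialized loader panicked: false`: the nil receiver gets through `RegisterAPIs` and only fails inside `RegisterAPI`, matching where the trace points.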

Janus version: 3.7.1-rc.24
OS and version: CentOS Linux release 7.4.1708 (Core)

amukherj commented 6 years ago

My analysis suggests the following:

This line:

https://github.com/hellofresh/janus/blob/3.7.1-rc.24/pkg/server/server.go#L113

expects the register member of s.defLoader to have been initialized by the goroutine started here:

https://github.com/hellofresh/janus/blob/3.7.1-rc.24/pkg/server/server.go#L83

which ultimately calls this line:

https://github.com/hellofresh/janus/blob/3.7.1-rc.24/pkg/server/server.go#L163

But nothing synchronizes the two to establish a happens-before relationship, so on the rare occasions when server.go:113 runs before the goroutine has finished initializing it, we get the core dump.
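The pattern described above, and one general way to close the gap, can be sketched as follows. This is a hypothetical reduction: `server`, `loader`, `listenProviders`, and the `ready` channel are illustrative names, and it is not the change made in the Janus fix. The background goroutine closes a channel after setting `defLoader`, and the startup path receives from it before use. Per the Go memory model, closing a channel is synchronized before a receive that returns because the channel is closed, which gives the missing happens-before edge.

```go
package main

import "fmt"

// loader is a hypothetical stand-in for the object the goroutine initializes.
type loader struct{ apis []string }

func (l *loader) register(name string) { l.apis = append(l.apis, name) }

// server mimics the reported pattern: a background goroutine initializes
// defLoader while the startup path later uses it.
type server struct {
	defLoader *loader
	ready     chan struct{} // closed once defLoader has been set
}

func newServer() *server {
	return &server{ready: make(chan struct{})}
}

// listenProviders plays the role of the goroutine that initializes state.
func (s *server) listenProviders() {
	s.defLoader = &loader{}
	close(s.ready) // the close is synchronized before the receive in start
	// a real server would keep handling provider events here
}

// start launches the goroutine, then waits for initialization before using
// defLoader. Without the <-s.ready receive, the read of s.defLoader could
// run before the write and observe nil.
func (s *server) start(names []string) {
	go s.listenProviders()
	<-s.ready
	for _, n := range names {
		s.defLoader.register(n)
	}
}

func main() {
	for i := 0; i < 1000; i++ {
		s := newServer()
		s.start([]string{"a", "b", "c"})
		if len(s.defLoader.apis) != 3 {
			panic("unexpected number of registered APIs")
		}
	}
	fmt.Println("all startups registered their APIs")
}
```

If the `<-s.ready` line is removed, the sketch has the same unsynchronized write/read, and running it with `go run -race` would typically flag the data race. An alternative with the same effect is to construct the loader synchronously before starting the goroutine.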

arindam2 commented 6 years ago

@vgarvardt @italolelis Any thoughts on this?

vgarvardt commented 6 years ago

Thank you for the report. I'll take a look at it as soon as I can.

vgarvardt commented 6 years ago

@amukherj I just merged https://github.com/hellofresh/janus/pull/350, which should fix the issue. This issue appears to be a leftover from the fix in https://github.com/hellofresh/janus/pull/339.

Thank you for your report and investigation!