numaproj / numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs
https://numaflow.numaproj.io/
Apache License 2.0

Input vertices had multiple restarts while running load test #899

Open inshbha2 opened 1 year ago

inshbha2 commented 1 year ago

Describe the bug Some of the input vertices panicked and restarted a couple of times while running a load test. This was the error seen:

panic: Consumer failed with error: 
goroutine 276 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc003c729c0, {0xc003d84fc0, 0x1, 0x1})
    /Users/yhl01/go/pkg/mod/go.uber.org/zap@v1.19.1/zapcore/entry.go:232 +0x44c
go.uber.org/zap.(*SugaredLogger).log(0xc0001246a0, 0x4, {0x223775f?, 0x25994a0?}, {0x0?, 0x0?, 0xc000054d70?}, {0xc009275f28, 0x1, 0x1})

To Reproduce Steps to reproduce the behavior:

  1. Have a minimum of 1 pod for the source vertex, with no max
  2. Inject load (the load when this error occurred was 20k tps)

Expected behavior No restarts were expected.



Message from the maintainers:

Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.

vigith commented 1 year ago

It looks like we are referencing a nil in the logger.