juliev0 closed this 1 month ago
Hey @chandankumar4 - I unassigned this from you. Instead, I'll try running it again and since Sidhant has now run our e2e himself locally, he could be the one to look at it if it's occurring.
Just re-ran this locally. It's after we update the MonoVertexRollout that the MonoVertex goes into a crash loop with this error:
jvogelman@macos-VF3V14X2QJ controller % k logs test-monovertex-rollout-mv-0-x4p8j
2024-10-15T16:03:04.621585Z INFO monovertex::server_info: Server info file: ServerInfo { protocol: "uds", language: "java", minimum_numaflow_version: "", version: "0.6.0", metadata: Some({}) }
2024-10-15T16:03:04.623577Z INFO monovertex::server_info: Version_info: VersionInfo { version: "latest+unknown", build_date: "1970-01-01T00:00:00Z", git_commit: "", git_tag: "", git_tree_state: "", go_version: "unknown", compiler: "", platform: "linux/x86_64" }
2024-10-15T16:03:04.623761Z WARN monovertex::server_info: Failed to get the minimum numaflow version, skipping numaflow version compatibility check
2024-10-15T16:03:04.625997Z WARN monovertex::startup: Error waiting for source server info file: ServerInfoError("SDK version 0.6.0 must be upgraded to at least 0.8.0, in order to work with the current numaflow version")
2024-10-15T16:03:04.626288Z ERROR monovertex: Application error: ForwarderError("Error waiting for server info file")
2024-10-15T16:03:04.626458Z INFO monovertex: Gracefully Exiting...
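For reference, the failing check in the log above is a plain semver minimum-version comparison (SDK 0.6.0 vs. a required 0.8.0). A minimal sketch of that kind of check in Go, purely illustrative and not the actual Numaflow implementation (function name and parsing are assumptions):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast reports whether semantic version v satisfies minimum minVer.
// Only the numeric major.minor.patch parts are compared; pre-release
// and build metadata are ignored in this simplified sketch.
func atLeast(v, minVer string) bool {
	parse := func(s string) [3]int {
		var out [3]int
		for i, p := range strings.SplitN(s, ".", 3) {
			n, _ := strconv.Atoi(p)
			out[i] = n
		}
		return out
	}
	a, b := parse(v), parse(minVer)
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return true
}

func main() {
	// The SDK in the crash loop reports 0.6.0, below the required 0.8.0,
	// so a check like this would fail and the server info wait errors out.
	fmt.Println(atLeast("0.6.0", "0.8.0")) // false
	fmt.Println(atLeast("0.8.0", "0.8.0")) // true
}
```

This matches the error message's behavior: any SDK below the minimum is rejected before the forwarder starts.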
Hey @dpadhiar - not super high priority, but would be good to fix the e2e test so that after updating MonoVertexRollout, the MonoVertex Pod is not in a crash loop (see log above)
I see, it looks like the version I changed the upgrade to (from stable to 0.6.0) is what causes the issue. Will fix that soon.
Describe the bug
I'm not sure this is an issue on our side, but it would be worth investigating. Ultimately, it could be something to hand over to the Numaflow team after some analysis on our side.
I was seeing that the MonoVertex pod was in a crash loop at the very end of the e2e test. I'm not sure if it's consistent or not, but I've seen it more than once. (Perhaps it's okay and it eventually fixes itself?)
This is the CI log from the test I ran locally: ci.log.txt
These are the outputs from the tests/e2e/outputs directory: output.zip
If you look at outputs/resources/monovertexrollouts/pods, you can see many Pods in there, which suggests the Pods restarted a lot.

To Reproduce
Steps to reproduce the behavior:
I assume this also happens for DATA_LOSS_PREVENTION=false, but I didn't try it.
Message from the maintainers:
Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.