Open sbalmos opened 1 year ago
👋 Thanks for the issue. This duplicates https://github.com/vectordotdev/vector/issues/7198 but I think it might be more complete than the existing issue. @jszwedko which do you think we should keep open?
I closed the other as a duplicate since this issue has more information. Thanks for filing @sbalmos !
@jszwedko did something change? Even without the --no-environment flag we are not seeing any errors related to VRL syntax/expressions. The only workaround I can think of is to have one test case and run vector test along with validate.
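A minimal sketch of that workaround as a CI step, in Python. The helper names (build_commands, run_checks) and the file layout are hypothetical; the only Vector commands used are vector validate --no-environment and vector test, both mentioned in this thread. It assumes the config files passed in contain at least one unit test, so that vector test has something to build and run.

```python
import subprocess


def build_commands(config_paths):
    """Return the argv lists a CI job would run, in order.

    config_paths: config files to check (hypothetical layout; they are
    assumed to include at least one unit test definition).
    """
    return [
        # Basic config validation without touching the environment.
        ["vector", "validate", "--no-environment", *config_paths],
        # Unit tests, run alongside validate as suggested in the comment above.
        ["vector", "test", *config_paths],
    ]


def run_checks(config_paths):
    """Run each command; return the first non-zero exit code, or 0."""
    for argv in build_commands(config_paths):
        result = subprocess.run(argv)
        if result.returncode != 0:
            return result.returncode
    return 0
```

A CI job would call run_checks(["vector.yaml"]) and fail the pipeline on a non-zero return value.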
When can this be prioritised? It's a blocker for us to adopt Vector in our org, as we offer Vector as a self-service for exclusion filtering: our devs independently make changes, and once merged they are deployed to all clusters. Without the ability to validate VRL expressions in CI, an invalid expression in a merged PR would crash Vector.
It should compile VRL programs when not using --no-environment. I just tested this again with 0.30.0 and it seems to work. If it isn't for you, could you open a bug report?
Unfortunately it still doesn't validate VRL programs when running with --no-environment, for now.
Hey @jszwedko,
You are right. My bad, without --no-environment it does check VRL expressions.
Some more challenges caused by VRL expressions not being compiled when the --no-environment flag is set:
We currently use vector validate in an internal k8s operator to validate the admission webhook. When two admission review requests come in parallel, the ports and data_dir conflict, causing validation to fail. Example:
x Source \"datadog_agents\": TCP bind failed: Address already in use (os error 98)
2023-08-24T14:33:58.240518Z ERROR vector::validate: Failed to remove temporary directory. path=\"/vector-data-dir/validate_tmp\" error=No such file or directory (os error 2)
"} {"error": "failed to validate vector configuration: {\"exit_code\":78,\"original_error\":{\"Stderr\":null},\"vector_validate_output\":\"Loaded with \\nComponent errors\\n----------------\\nx Source \\\"datadog_agents\\\": TCP bind failed: Address already in use (os error 98)\\n\\n2023-08-24T14:33:58.240518Z ERROR vector::validate: Failed to remove temporary directory. path=\\\"/vector-data-dir/validate_tmp\\\" error=No such file or directory (os error 2)\\n\"}"}\
Currently, we work around this by randomizing these values during validation, but it would be nice if we didn't have to resort to such hacks.
I'm just putting this out there in case it helps make a case for prioritisation.
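For readers hitting the same conflict, the randomization described above could look roughly like the following Python sketch. It is not the operator's actual code: the function names are hypothetical, and it assumes the config has already been parsed into a dict with a top-level data_dir and sources that bind via an address field of the form host:port. Note that the port is only free at the moment it is picked, so a small race window remains.

```python
import copy
import socket
import tempfile


def free_tcp_port():
    """Ask the OS for a TCP port that is unused right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS choose
        return s.getsockname()[1]


def isolate_for_validation(config):
    """Return a copy of config with a private data_dir and random ports.

    Assumes sources that listen on a socket expose an "address" key
    formatted as "host:port"; other sources are left unchanged.
    """
    isolated = copy.deepcopy(config)
    # Each validation run gets its own scratch directory.
    isolated["data_dir"] = tempfile.mkdtemp(prefix="vector-validate-")
    for source in isolated.get("sources", {}).values():
        if "address" in source:
            host = source["address"].rsplit(":", 1)[0]
            source["address"] = f"{host}:{free_tcp_port()}"
    return isolated
```

The rewritten dict would then be serialized to a temporary file and passed to vector validate, so that parallel admission requests no longer share a port or a data directory.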
Use Cases
vector validate --no-environment should perform VRL syntax checks of transforms and filters, to catch VRL errors in CI/CD-based config generation scenarios.
Attempted Solutions
A partial workaround exists as discussed in https://github.com/vectordotdev/vector/discussions/13726. However, this only covers a hacky rewrite of sinks, not sources, and also requires the installation of the yq utility. The idea of having vector validate run against a whole Helm chart file is out of scope; the discussion is linked only as a general reference for the workaround.
Proposal
Currently, validate --no-environment performs only very basic validation of the provided config. Notably, it does not validate any transform VRL or filter conditions. This seems to be a critical gap in CI/CD situations. For reference, I generate the Vector config from templates in Ansible, save it to a temporary file, and run vector validate --no-environment to sanity/syntax check the file before the config contents are embedded into a Helm chart values file. With the --no-environment flag, no VRL syntax checking was performed, so an any-to-string type mismatch error in a transform went undetected until the Helm chart was deployed, causing a pod error.
--no-environment should bypass environmental checks (env vars, connecting to sources/sinks, etc), but still perform VRL validation.
References
No response
Version
0.24.2