vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Validate no-environment should perform transform/filter VRL checks #15037

Open sbalmos opened 1 year ago

sbalmos commented 1 year ago


Use Cases

vector validate --no-environment should perform VRL syntax checks on transforms and filters, so that VRL errors are caught in CI/CD-based config generation scenarios.

Attempted Solutions

A partial workaround is discussed in https://github.com/vectordotdev/vector/discussions/13726. However, it only covers a hacky rewrite of sinks, not sources, and it also requires installing the yq utility. Running vector validate against a whole Helm chart file is out of scope here; the discussion is referenced only as a general idea/workaround.

Proposal

Currently, vector validate --no-environment performs only very basic validation of the provided config. Notably, it does not validate any transform VRL or filter conditions. This seems to be a critical gap in CI/CD situations. For reference, I generate the Vector config from templates in Ansible, save it to a temporary file, and run vector validate --no-environment to sanity/syntax check the file before the config contents are embedded into a Helm chart values file. With --no-environment, no VRL checking was performed, so an any-to-string type mismatch in a transform went undetected until the Helm chart was deployed, where it caused a pod error.

--no-environment should bypass environmental checks (env vars, connecting to sources/sinks, etc), but still perform VRL validation.
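For illustration, the CI step described above could look roughly like the following Python sketch. The helper names (validate_command, validate_rendered_config) and the injectable run parameter are hypothetical and exist only for this example; the only Vector interface assumed is the vector validate --no-environment <path> invocation mentioned in this issue. As reported here, this check does not currently compile VRL, so a passing result says nothing about transform or filter sources.

```python
import os
import subprocess
import tempfile


def validate_command(config_path, vector_bin="vector"):
    """Build the argv for the validation step described in this issue."""
    return [vector_bin, "validate", "--no-environment", config_path]


def validate_rendered_config(rendered, run=subprocess.run, suffix=".yaml"):
    """Write a rendered config to a temp file and validate it.

    `run` is injectable so the function can be exercised without a
    Vector binary (hypothetical convenience, not part of Vector).
    Returns (ok, combined_output).
    """
    with tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False) as fh:
        fh.write(rendered)
        path = fh.name
    try:
        result = run(validate_command(path), capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr
    finally:
        # Always remove the temporary file, even if the run fails.
        os.unlink(path)
```

A pipeline would call validate_rendered_config(rendered_text) after templating and fail the job when the first element of the result is False.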

References

No response

Version

0.24.2

spencergilbert commented 1 year ago

👋 Thanks for the issue. This duplicates https://github.com/vectordotdev/vector/issues/7198 but I think it might be more complete than the existing issue. @jszwedko which do you think we should keep open?

jszwedko commented 1 year ago

I closed the other as a duplicate since this issue has more information. Thanks for filing @sbalmos !

smitthakkar96 commented 1 year ago

@jszwedko did something change? Even without the --no-environment flag, we are not seeing any errors related to VRL syntax/expressions. The only workaround I can think of is to add a single test case and run vector test along with validate.
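A minimal sketch of that workaround, assuming the config file already contains at least one unit test: run vector validate --no-environment first, then vector test on the same file. The function names and the injectable run parameter are hypothetical. Whether a single test case causes every transform's VRL to be compiled is not confirmed in this thread, so treat this as a stopgap rather than a guarantee.

```python
import subprocess


def ci_commands(config_path, vector_bin="vector"):
    """Commands to run in order: basic validation, then unit tests."""
    return [
        [vector_bin, "validate", "--no-environment", config_path],
        [vector_bin, "test", config_path],
    ]


def first_failing_check(config_path, run=subprocess.run):
    """Run each check in order and stop at the first failure.

    Returns the failing command (a list of strings), or None if all
    checks exit with status 0.
    """
    for cmd in ci_commands(config_path):
        result = run(cmd)
        if result.returncode != 0:
            return cmd
    return None
```

In CI, a non-None return value from first_failing_check("vector.yaml") would fail the job and identify which of the two steps rejected the config.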

smitthakkar96 commented 1 year ago

When can this be prioritised? It's a blocker for us to adopt Vector in our org, as we offer Vector as a self-service for exclusion filtering. This means our devs independently make changes, and once merged, they are deployed to all clusters. Without the ability to validate VRL expressions in CI, an invalid expression in a merged PR would crash Vector.

jszwedko commented 1 year ago

It should compile VRL programs when not using --no-environment. I just tested this again with 0.30.0 and it seems to work. If it isn't for you, could you open a bug report?

Unfortunately it still doesn't validate VRL programs when running with --no-environment, for now.

smitthakkar96 commented 1 year ago

Hey @jszwedko,

You are right, my bad. Without --no-environment it does check VRL expressions.

smitthakkar96 commented 1 year ago

Some more challenges caused by VRL expressions not being compiled when the --no-environment flag is set:

We currently use vector validate in an internal k8s operator, as part of its admission webhook validation. When two admission review requests come in parallel, the ports and data_dir conflict, causing validation to fail. Example:

x Source "datadog_agents": TCP bind failed: Address already in use (os error 98)

2023-08-24T14:33:58.240518Z ERROR vector::validate: Failed to remove temporary directory. path="/vector-data-dir/validate_tmp" error=No such file or directory (os error 2)

{"error": "failed to validate vector configuration: {\"exit_code\":78,\"original_error\":{\"Stderr\":null},\"vector_validate_output\":\"Loaded with \\nComponent errors\\n----------------\\nx Source \\\"datadog_agents\\\": TCP bind failed: Address already in use (os error 98)\\n\\n2023-08-24T14:33:58.240518Z ERROR vector::validate: Failed to remove temporary directory. path=\\\"/vector-data-dir/validate_tmp\\\" error=No such file or directory (os error 2)\\n\"}"}

Currently, we work around this by randomizing these values during validation, but it would be nice not to need such hacks.
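As a rough sketch of that randomization (not our actual operator code), the config can be copied before validation, given a fresh data_dir, and have each source's listen port swapped for a currently free one. The function names and the assumption that sources expose a "host:port" string under an address key are illustrative only. There is also an inherent race: a port that is free when picked can be taken before Vector binds it.

```python
import copy
import socket
import tempfile


def free_port(host="127.0.0.1"):
    """Ask the OS for a port that is free at this moment."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS choose
        return s.getsockname()[1]


def isolate_for_validation(config):
    """Return a copy of `config` safe to validate in parallel.

    Assumes a dict with an optional top-level "data_dir" and a
    "sources" mapping whose entries may have an "address" of the
    form "host:port" (illustrative shape, not a full schema).
    """
    isolated = copy.deepcopy(config)
    # Unique directory per validation run avoids data_dir collisions.
    isolated["data_dir"] = tempfile.mkdtemp(prefix="vector-validate-")
    for source in isolated.get("sources", {}).values():
        address = source.get("address")
        if isinstance(address, str) and ":" in address:
            host, _, _old_port = address.rpartition(":")
            source["address"] = f"{host}:{free_port()}"
    return isolated
```

The caller is responsible for serializing the returned dict to a file for vector validate and for deleting the temporary directory afterwards.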

I'm just putting this out there in case it helps make the case for prioritisation.