vectordotdev / vrl

Vector Remap Language
Mozilla Public License 2.0

In Vector 0.40.0, transforms that use the del(.) function return an empty JSON object #1147

Open madson7 opened 1 day ago

madson7 commented 1 day ago


Problem

When I use the del(.) function in a remap transform with Vector 0.40.0, the resulting event is an empty JSON object:

{ }

In the VRL playground (Vector version 3d16e345, VRL version 0.19.0), the same program works as expected.

Input event:

{
    "_time": "2024-11-25T12:33:51.177455337Z",
    "timestamp": "2024-11-25T12:33:50.176779920Z"
}

Program:

log = .timestamp
del(.)
.log_timestamp = log

Output:

{
    "log_timestamp": "2024-11-25T12:33:50.176779920Z"
}

In Vector 0.40.0, the same input event and program, configured as a remap transform, produce an empty JSON object instead. Input event:

{
    "_time": "2024-11-25T12:33:51.177455337Z",
    "timestamp": "2024-11-25T12:33:50.176779920Z"
}
#vector.yaml
transforms:
  msg_parser:
    type: remap
    inputs:
      - docker
    source: |
      log = .timestamp 
      del(.)
      .log_timestamp = log
$ docker run -ti --rm -v $(pwd):/mnt/ docker.io/timberio/vector:0.40.0-distroless-static validate /mnt/vector.yaml
√ Loaded ["/mnt/vector.yaml"]
√ Component configuration
~ Health check disabled for "http"
----------------------------------
                         Validated

Error

{ }
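vector validate only confirms that the configuration loads. It does not send any events through the transform, so the "Validated" result above says nothing about what del(.) does at runtime. The fragment below is a minimal sketch for exercising the three-line program with the playground event. It assumes Vector's configuration unit tests: a top-level tests block run with vector test. The component id minimal_del and the test name are hypothetical. The transform entry belongs under the existing transforms key of a config that already defines the docker source, such as the full configuration in this issue.

```yaml
# Hypothetical reproduction. Add minimal_del under the existing "transforms:"
# key and the "tests:" block at the top level, then run:
#   vector test /mnt/vector.yaml
transforms:
  minimal_del:                 # hypothetical component id
    type: remap
    inputs:
      - docker
    source: |
      log = .timestamp
      del(.)
      .log_timestamp = log

tests:
  - name: del_root_keeps_new_field   # hypothetical test name
    inputs:
      # Inject the playground event directly into the transform under test.
      - insert_at: minimal_del
        type: log
        log_fields:
          _time: "2024-11-25T12:33:51.177455337Z"
          timestamp: "2024-11-25T12:33:50.176779920Z"
    outputs:
      # These assertions run against the event emitted by minimal_del.
      - extract_from: minimal_del
        conditions:
          - type: vrl
            source: |
              assert!(exists(.log_timestamp))
              assert!(!exists(.timestamp))
```

If this test passes but the deployed pipeline still emits { }, the difference is more likely to be in the surrounding pipeline than in del(.) itself.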

Configuration

api:
  enabled: true
  address: 0.0.0.0:8686

sources:
  docker:
    type: docker_logs

transforms:
  msg_parser:
    type: remap
    inputs:
      - docker
    source: |
      msg_downcase = downcase(string!(.message))
      msg_parse_key_value = parse_key_value!(msg_downcase)
      encode_key_value = encode_key_value(.)
      timestamp_log = to_unix_timestamp(parse_timestamp!(.timestamp, format: "%Y-%m-%dT%H:%M:%S.%fZ"))
      timestamp_parsed = to_unix_timestamp(now())

      namespace = parse_regex!(encode_key_value, r'namespace=(?P<namespace>\w+)', false)
      service_name = parse_regex!(encode_key_value, r'service.name=(?P<service_name>\w+)', false)

      docker = {
        "host": .host,
        "image": .image,
        "container_created_at": .container_created_at,
        "container_name": .container_name
        }
      stack = {
        "namespace": namespace,
        "service_name": service_name
        }

      del(.)

      parse = { 
        "timestamps": {
          "timestamp_log": timestamp_log,
          "timestamp_parsed": timestamp_parsed
        },
        "docker": docker,
        "stack": stack,
        "msg": {
          "msg_downcase": msg_downcase,
          "msg_parse_key_value": msg_parse_key_value
        }
      }

      .log = encode_json(parse)

sinks:
  http:
    type: http
    inputs:
      - msg_parser
    uri: http://9.0.4.206:9428/insert/jsonline?_stream_fields=source_type,host,container_name&_msg_field=log
    encoding:
      codec: json
    framing:
      method: newline_delimited
    compression: gzip
    healthcheck:
      enabled: false
    request:
      headers:
        AccountID: '0'
        ProjectID: '0'

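The example data below is what VictoriaLogs stored. It contains all the original docker_logs fields, no log field, and VictoriaLogs' "missing _msg field" placeholder. To see exactly what msg_parser emits before the http sink and VictoriaLogs process it, one option is a second sink on the same input. The sketch below assumes the standard console sink. The id debug_console is hypothetical, and the entry belongs under the existing sinks key.

```yaml
sinks:
  # Hypothetical debugging sink: prints each event leaving msg_parser to
  # stdout as a JSON object, alongside the existing http sink.
  debug_console:
    type: console
    inputs:
      - msg_parser
    encoding:
      codec: json
```

With the default drop_on_error: false, remap forwards the original, unmodified event when the VRL program hits a runtime error. If the console shows the original fields and no log field, the program probably aborted before reaching del(.).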
Version

0.40.0

Debug Output

No response

Example Data

[
  {
    "_time": "2024-11-27T16:21:54.869450219Z",
    "_stream_id": "00000000000000006c25093dad8bf2f8c5fae0fdbc0a1630",
    "_stream": "{container_name=\"semaphore_app.1.rvcq0bpzddvlafu7ixwziz6mg\",host=\"bull\",source_type=\"docker_logs\"}",
    "_msg": "missing _msg field; see https://docs.victoriametrics.com/victorialogs/keyconcepts/#message-field",
    "container_created_at": "2024-11-26T19:00:36.349556757Z",
    "container_id": "0b24d9b15fbe1764149ca7bd7bf22fb3fbe386d1e9cec6e632259a7eca5b1469",
    "container_name": "semaphore_app.1.rvcq0bpzddvlafu7ixwziz6mg",
    "host": "bull",
    "image": "semaphoreui/semaphore:126",
    "label.com.docker.stack.namespace": "semaphore",
    "label.com.docker.swarm.node.id": "9mlqety46qllpelesxvxxgint",
    "label.com.docker.swarm.service.id": "o9el4uqn428dh05a7md9spj9m",
    "label.com.docker.swarm.service.name": "semaphore_app",
    "label.com.docker.swarm.task.id": "rvcq0bpzddvlafu7ixwziz6mg",
    "label.com.docker.swarm.task.name": "semaphore_app.1.rvcq0bpzddvlafu7ixwziz6mg",
    "label.maintainer": "Semaphore UI <support@semui.co>",
    "label.org.opencontainers.image.created": "2024-10-21T11:11:01.004Z",
    "label.org.opencontainers.image.description": "Modern UI and powerful API for Ansible, Terraform, OpenTofu, PowerShell and other DevOps tools.",
    "label.org.opencontainers.image.licenses": "MIT",
    "label.org.opencontainers.image.revision": "f33944e0429f711b24e03600a0c1dab7460b3a2d",
    "label.org.opencontainers.image.source": "https://github.com/semaphoreui/semaphore",
    "label.org.opencontainers.image.title": "semaphore",
    "label.org.opencontainers.image.url": "https://github.com/semaphoreui/semaphore",
    "label.org.opencontainers.image.vendor": "SemaphoreUI",
    "label.org.opencontainers.image.version": "v2.10.32",
    "source_type": "docker_logs",
    "stream": "stdout",
    "timestamp": "2024-11-27T16:21:53.868379720Z"
  }
]

Additional Context

I'm facing a problem when using VRL (Vector Remap Language) in Vector. In the playground, I can perform all the desired operations, save the results in variables, and then apply del(.) to remove the original fields from the event, leaving only what was saved in the variables. Finally, I save the result in .log, and everything works as expected.

However, when I apply the same logic in the transforms block of vector.yaml, the behavior is different. When I use del(.), the expected data is not saved in .log: the event appears to be wiped completely, and .log is empty.

If I remove del(.), all the operations are performed correctly, but the original fields remain in the event instead of being replaced. I also tried assigning . = .log instead of calling del(.), without success.

I would like to understand what causes this difference between the playground and Vector, and what the best practice is in vector.yaml for discarding the original fields and keeping only the processed variables in .log.
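One alternative worth trying is to assign an object literal to the root path instead of calling del(.) and then setting .log. The assignment replaces the whole event in a single step. The sketch below has not been verified against 0.40.0. The computations are copied unchanged from the configuration above, and only the final line differs.

```yaml
transforms:
  msg_parser:
    type: remap
    inputs:
      - docker
    source: |
      msg_downcase = downcase(string!(.message))
      msg_parse_key_value = parse_key_value!(msg_downcase)
      encode_key_value = encode_key_value(.)
      timestamp_log = to_unix_timestamp(parse_timestamp!(.timestamp, format: "%Y-%m-%dT%H:%M:%S.%fZ"))
      timestamp_parsed = to_unix_timestamp(now())

      namespace = parse_regex!(encode_key_value, r'namespace=(?P<namespace>\w+)', false)
      service_name = parse_regex!(encode_key_value, r'service.name=(?P<service_name>\w+)', false)

      docker = {
        "host": .host,
        "image": .image,
        "container_created_at": .container_created_at,
        "container_name": .container_name
      }
      stack = {
        "namespace": namespace,
        "service_name": service_name
      }

      parse = {
        "timestamps": {
          "timestamp_log": timestamp_log,
          "timestamp_parsed": timestamp_parsed
        },
        "docker": docker,
        "stack": stack,
        "msg": {
          "msg_downcase": msg_downcase,
          "msg_parse_key_value": msg_parse_key_value
        }
      }

      # Replace the entire event with a new object whose only field is "log".
      # This takes the place of the separate del(.) and .log = ... steps.
      . = { "log": encode_json(parse) }
```

Because this replaces the whole event, host, container_name and source_type are no longer top-level fields. The sink URI lists all three in _stream_fields. If VictoriaLogs should keep using them as stream fields, copy them into the new object alongside log.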

References

No response

madson7 commented 12 hours ago

I enabled debug logging by setting VECTOR_LOG in the Vector container's environment:

    environment:
      - VECTOR_LOG=debug

Debug Output

vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:02.116824Z DEBUG hyper::proto::h1::io: flushed 1849 bytes
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:02.118486Z DEBUG hyper::proto::h1::io: parsed 5 headers
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:02.118534Z DEBUG hyper::proto::h1::conn: incoming body is empty
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:02.118602Z DEBUG sink{component_kind="sink" component_id=http component_type=http}:request{request_id=491}:http: hyper::client::pool: pooling idle connection for ("http", 9.0.4.206:9428)
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:02.118643Z DEBUG sink{component_kind="sink" component_id=http component_type=http}:request{request_id=491}:http: vector::internal_events::http_client: HTTP response. status=200 OK version=HTTP/1.1 headers={"content-type": "application/json", "vary": "Accept-Encoding", "x-server-hostname": "victorialogs-\"manager02\"", "date": "Thu, 28 Nov 2024 04:23:02 GMT", "content-length": "0"} body=[empty]
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:02.511330Z DEBUG sink{component_kind="sink" component_id=http component_type=http}: vector::utilization: utilization=0.000028915188686934943
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:03.124428Z DEBUG sink{component_kind="sink" component_id=http component_type=http}:request{request_id=492}:http: vector::internal_events::http_client: Sending HTTP request. uri=http://9.0.4.206:9428/insert/jsonline?_stream_fields=source_type,host,container_name&_msg_field=log method=POST version=HTTP/1.1 headers={"content-type": "application/x-ndjson", "content-encoding": "gzip", "accountid": "0", "projectid": "0", "accept-encoding": "zstd,gzip,deflate,br", "user-agent": "Vector/0.42.0 (x86_64-unknown-linux-musl 3d16e34 2024-10-21 14:10:14.375255220)"} body=[1452 bytes]
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:03.124523Z DEBUG sink{component_kind="sink" component_id=http component_type=http}:request{request_id=492}:http: hyper::client::pool: reuse idle connection for ("http", 9.0.4.206:9428)
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:03.124818Z DEBUG hyper::proto::h1::io: flushed 1812 bytes
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:03.128297Z DEBUG hyper::proto::h1::io: parsed 5 headers
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:03.128495Z DEBUG hyper::proto::h1::conn: incoming body is empty
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:03.128596Z DEBUG sink{component_kind="sink" component_id=http component_type=http}:request{request_id=492}:http: hyper::client::pool: pooling idle connection for ("http", 9.0.4.206:9428)
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:03.128644Z DEBUG sink{component_kind="sink" component_id=http component_type=http}:request{request_id=492}:http: vector::internal_events::http_client: HTTP response. status=200 OK version=HTTP/1.1 headers={"content-type": "application/json", "vary": "Accept-Encoding", "x-server-hostname": "victorialogs-\"manager02\"", "date": "Thu, 28 Nov 2024 04:23:03 GMT", "content-length": "0"} body=[empty]
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:04.131990Z DEBUG sink{component_kind="sink" component_id=http component_type=http}:request{request_id=493}:http: vector::internal_events::http_client: Sending HTTP request. uri=http://9.0.4.206:9428/insert/jsonline?_stream_fields=source_type,host,container_name&_msg_field=log method=POST version=HTTP/1.1 headers={"content-type": "application/x-ndjson", "content-encoding": "gzip", "accountid": "0", "projectid": "0", "accept-encoding": "zstd,gzip,deflate,br", "user-agent": "Vector/0.42.0 (x86_64-unknown-linux-musl 3d16e34 2024-10-21 14:10:14.375255220)"} body=[1386 bytes]
vector_vector.0.yanc80sdyk5i@bull    | 2024-11-28T04:23:04.132093Z DEBUG sink{component_kind="sink" component_id=http component_type=http}:request{request_id=493}:http: hyper::client::pool: reuse idle connection for ("http", 9.0.4.206:9428)