char8 opened 2 years ago
Which opentelemetry-proto version is the Envoy extension using? There was a breaking change in opentelemetry-proto, and Vector is on the latest version, so this may be caused by a proto mismatch. The OpenTelemetry Collector itself doesn't use the deprecated field, which is why Vector can recognize its output.
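To make the failure mode concrete, here is a small, self-contained sketch using prost. It uses stand-in message definitions, not the real OTLP schema, and the field names and tag numbers are assumptions based on the rename; the point is only that proto3 decoders skip unknown field numbers, so data sent under a removed field vanishes without an error.

// Stand-in messages (assumptions, not the real OTLP logs.proto) showing why a
// receiver built against the newer proto silently drops the old field.
use prost::Message;

// "Old" shape: the sender (Envoy) still writes the deprecated field.
#[derive(Clone, PartialEq, Message)]
struct OldResourceLogs {
    #[prost(string, repeated, tag = "1000")]
    instrumentation_library_logs: Vec<String>,
}

// "New" shape: the receiver (Vector) only knows the replacement field.
#[derive(Clone, PartialEq, Message)]
struct NewResourceLogs {
    #[prost(string, repeated, tag = "2")]
    scope_logs: Vec<String>,
}

fn main() {
    let sent = OldResourceLogs {
        instrumentation_library_logs: vec!["a log record".to_string()],
    };
    let bytes = sent.encode_to_vec();

    // Decoding succeeds, but the payload lands in an unknown field and is
    // dropped: no error, no events, matching the symptom in the logs below.
    let received = NewResourceLogs::decode(bytes.as_slice()).expect("decodes fine");
    assert!(received.scope_logs.is_empty());
    println!("decoded without error; scope_logs = {:?}", received.scope_logs);
}

This matches the observed behaviour: the gRPC request is accepted and acknowledged, but no log events ever appear and nothing is reported as an error.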
Thanks! It does indeed look to be this. The patch for Envoy (https://github.com/envoyproxy/envoy/pull/22694) to stop using instrumentation_library_logs didn't make it into a release yet. I'm building Envoy off the main branch; I'll test, confirm that it works, and resolve this issue.
We need to document the supported proto version (I'll check whether there's an issue for that) and see what we can do to improve backward compatibility 👍
I tested against envoyproxy/envoy-dev:latest, same behaviour. I double-checked and it's using the 0.19 protos.
There's another oddity: sending SIGTERM to vector doesn't terminate the otel component immediately. I'm not sure what it's waiting for. That problem doesn't happen when we're writing from the otel collector.
The resulting vector debug logs are similar to before. Each envoy log event shows up in the vector log as:
2022-09-20T08:26:33.773745Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(1) }
log extract
2022-09-20T08:26:20.213913Z DEBUG source{component_kind="source" component_id=otel component_type=opentelemetry component_name=otel}: vector::topology::builder: Source pump starting.
2022-09-20T08:26:20.213977Z INFO vector: Vector has started. debug="false" version="0.25.0" arch="x86_64" build_id="a3de633 2022-09-20"
2022-09-20T08:26:20.213988Z INFO vector::internal_events::api: API server running. address=127.0.0.1:8686 playground=http://127.0.0.1:8686/playground
2022-09-20T08:26:20.213972Z INFO source{component_kind="source" component_id=otel component_type=opentelemetry component_name=otel}: vector::sources::util::grpc: Building gRPC server. address=0.0.0.0:4317
2022-09-20T08:26:20.214067Z DEBUG source{component_kind="source" component_id=internal_metrics component_type=internal_metrics component_name=internal_metrics}: vector::topology::builder: Source pump starting.
2022-09-20T08:26:20.214095Z INFO source{component_kind="source" component_id=otel component_type=opentelemetry component_name=otel}: vector::sources::opentelemetry::http: Building HTTP server. address=0.0.0.0:4318
2022-09-20T08:26:20.214863Z DEBUG sink{component_kind="sink" component_id=metrics component_type=prometheus_exporter component_name=metrics}: vector::utilization: utilization=0.04758076766565722
2022-09-20T08:26:20.214876Z DEBUG sink{component_kind="sink" component_id=console component_type=console component_name=console}: vector::utilization: utilization=0.0731003264962824
2022-09-20T08:26:25.215151Z DEBUG sink{component_kind="sink" component_id=metrics component_type=prometheus_exporter component_name=metrics}: vector::utilization: utilization=0.004824934719655026
2022-09-20T08:26:25.215155Z DEBUG sink{component_kind="sink" component_id=console component_type=console component_name=console}: vector::utilization: utilization=0.007310052448541947
2022-09-20T08:26:25.270990Z DEBUG h2::codec::framed_write: send frame=Settings { flags: (0x0), initial_window_size: 1048576, max_frame_size: 16384, max_header_list_size: 16777216 }
2022-09-20T08:26:25.271161Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Settings { flags: (0x0) }
2022-09-20T08:26:25.271173Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Settings { flags: (0x1: ACK) }
2022-09-20T08:26:25.271184Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Settings { flags: (0x1: ACK) }
2022-09-20T08:26:25.271193Z DEBUG Connection{peer=Server}: h2::proto::settings: received settings ACK; applying Settings { flags: (0x0), initial_window_size: 1048576, max_frame_size: 16384, max_header_list_size: 16777216 }
2022-09-20T08:26:25.271213Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=WindowUpdate { stream_id: StreamId(0), size_increment: 983041 }
2022-09-20T08:26:28.348123Z DEBUG h2::codec::framed_write: send frame=Settings { flags: (0x0), initial_window_size: 1048576, max_frame_size: 16384, max_header_list_size: 16777216 }
2022-09-20T08:26:28.348218Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Settings { flags: (0x0), header_table_size: 4096, enable_push: 0, max_concurrent_streams: 2147483647, initial_window_size: 268435456, enable_connect_protocol: 0 }
2022-09-20T08:26:28.348227Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Settings { flags: (0x1: ACK) }
2022-09-20T08:26:28.348233Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=WindowUpdate { stream_id: StreamId(0), size_increment: 268369921 }
2022-09-20T08:26:28.348278Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
2022-09-20T08:26:28.348300Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(1) }
2022-09-20T08:26:28.348308Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Settings { flags: (0x1: ACK) }
2022-09-20T08:26:28.348313Z DEBUG Connection{peer=Server}: h2::proto::settings: received settings ACK; applying Settings { flags: (0x0), initial_window_size: 1048576, max_frame_size: 16384, max_header_list_size: 16777216 }
2022-09-20T08:26:28.348319Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=WindowUpdate { stream_id: StreamId(0), size_increment: 983041 }
2022-09-20T08:26:30.215086Z DEBUG sink{component_kind="sink" component_id=metrics component_type=prometheus_exporter component_name=metrics}: vector::utilization: utilization=0.0005209354885902537
2022-09-20T08:26:30.215169Z DEBUG sink{component_kind="sink" component_id=console component_type=console component_name=console}: vector::utilization: utilization=0.0007310234247616739
2022-09-20T08:26:30.336839Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(1) }
2022-09-20T08:26:33.773745Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(1) }
2022-09-20T08:26:35.215909Z DEBUG sink{component_kind="sink" component_id=metrics component_type=prometheus_exporter component_name=metrics}: vector::utilization: utilization=0.0001327414976254911
2022-09-20T08:26:35.215980Z DEBUG sink{component_kind="sink" component_id=console component_type=console component_name=console}: vector::utilization: utilization=0.00007311853987349076
2022-09-20T08:26:35.391339Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(1) }
2022-09-20T08:26:36.846455Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Data { stream_id: StreamId(1) }
>> sent SIGINT to vector
2022-09-20T08:26:38.467370Z INFO vector: Vector has stopped.
2022-09-20T08:26:38.467464Z DEBUG source{component_kind="source" component_id=internal_metrics component_type=internal_metrics component_name=internal_metrics}: vector::topology::builder: Finished.
2022-09-20T08:26:38.467494Z DEBUG source{component_kind="source" component_id=internal_metrics component_type=internal_metrics component_name=internal_metrics}: vector::topology::builder: Source pump finished.
2022-09-20T08:26:38.467534Z DEBUG source{component_kind="source" component_id=otel component_type=opentelemetry component_name=otel}: hyper::server::shutdown: signal received, starting graceful shutdown
2022-09-20T08:26:38.467543Z DEBUG sink{component_kind="sink" component_id=metrics component_type=prometheus_exporter component_name=metrics}: vector::topology::builder: Finished.
2022-09-20T08:26:38.467582Z DEBUG sink{component_kind="sink" component_id=metrics component_type=prometheus_exporter component_name=metrics}: hyper::server::shutdown: signal received, starting graceful shutdown
2022-09-20T08:26:38.467593Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(2147483647) }
2022-09-20T08:26:38.467609Z DEBUG source{component_kind="source" component_id=otel component_type=opentelemetry component_name=otel}:Server::serve_incoming_with_graceful_shutdown: hyper::server::shutdown: signal received, starting graceful shutdown
2022-09-20T08:26:38.467613Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Ping { ack: false, payload: [11, 123, 162, 240, 139, 155, 254, 84] }
2022-09-20T08:26:38.467645Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(2147483647) }
2022-09-20T08:26:38.467659Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Ping { ack: false, payload: [11, 123, 162, 240, 139, 155, 254, 84] }
2022-09-20T08:26:38.467875Z DEBUG Connection{peer=Server}: h2::codec::framed_read: received frame=Ping { ack: true, payload: [11, 123, 162, 240, 139, 155, 254, 84] }
2022-09-20T08:26:38.467908Z DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(1) }
2022-09-20T08:26:38.469161Z INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="otel, console" time_remaining="59 seconds left"
2022-09-20T08:26:40.215146Z DEBUG sink{component_kind="sink" component_id=console component_type=console component_name=console}: vector::utilization: utilization=0.000007328056690628141
2022-09-20T08:26:43.469175Z INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="otel, console" time_remaining="54 seconds left"
2022-09-20T08:26:45.215520Z DEBUG sink{component_kind="sink" component_id=console component_type=console component_name=console}: vector::utilization: utilization=0.0000007473845769437384
2022-09-20T08:26:48.469190Z INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="otel, console" time_remaining="49 seconds left"
2022-09-20T08:26:50.215532Z DEBUG sink{component_kind="sink" component_id=console component_type=console component_name=console}: vector::utilization: utilization=0.00000009471841318445308
2022-09-20T08:26:53.469667Z INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="otel, console" time_remaining="44 seconds left"
2022-09-20T08:26:55.215562Z DEBUG sink{component_kind="sink" component_id=console component_type=console component_name=console}: vector::utilization: utilization=0.000000023871753381202232
2022-09-20T08:26:58.469196Z INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="otel, console" time_remaining="39 seconds left"
2022-09-20T08:27:00.215064Z DEBUG sink{component_kind="sink" component_id=console component_type=console component_name=console}: vector::utilization: utilization=0.000000018768807089225854
Just checked the gRPC requests with Wireshark in both cases. Nothing stands out, except that the otel collector exports gzipped logs and populates a few fields whereas envoy does not.
@char8 Vector accepts gzip when ingesting OTel logs (https://github.com/vectordotdev/vector/blob/master/src/sources/opentelemetry/mod.rs#L110); maybe try making Envoy gzip the logs?
Another user has run into this as well, and shared the following Envoy config to help us reproduce/troubleshoot.
EDIT: this config has some whitespace errors that need some resolving
!ignore parameters:
  - &access_log_address {address: vector, port_value: 8001}
static_resources:
  listeners:
    - name: my_endpoint
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 80
      filter_chains:
        - filter_chain_match:
            destination_port: 80
          filters:
            - name: envoy.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                access_log:
                  - name: envoy.access_loggers.open_telemetry
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.open_telemetry.v3.OpenTelemetryAccessLogConfig
                      common_config:
                        log_name: https_ingress_logs
                        grpc_service:
                          timeout: 5s
                          envoy_grpc:
                            cluster_name: access_log
                        transport_api_version: v3
                      body:
                        kvlist_value:
                          values:
                            - key: "bytes_read_count"
                              value:
                                string_value: "%DOWNSTREAM_WIRE_BYTES_SENT%"
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: my_local_server
                      require_tls: NONE
                      domains:
                        - localhost*
                        - localhost
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: my_local_http
                            timeout: 1s
  clusters:
    - name: access_log
      connect_timeout: 5s
      per_connection_buffer_limit_bytes: 32768
      type: STRICT_DNS
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          common_http_protocol_options:
            idle_timeout: 50s
          explicit_http_config:
            http2_protocol_options: {}  # lulzy but this instructs envoy to use http2 for grpc
      circuit_breakers: *circuit_breakers  # anchor not included in this snippet
      load_assignment:
        cluster_name: access_log
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: *access_log_address
    - name: my_local_http
      connect_timeout: 5s
      type: STRICT_DNS
      load_assignment:
        cluster_name: my_local_http
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: localhost
                      port_value: 80
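For reproduction it may help to pair that with a minimal Vector config. The sketch below is reconstructed from the components visible in the debug log above (an opentelemetry source named otel on 0.0.0.0:4317/4318, an internal_metrics source, a console sink, and a prometheus_exporter sink); the console codec and the Prometheus listen address are assumptions.

sources:
  otel:
    type: opentelemetry
    grpc:
      address: 0.0.0.0:4317
    http:
      address: 0.0.0.0:4318
  internal_metrics:
    type: internal_metrics

sinks:
  console:
    type: console
    inputs: [otel.logs]          # the opentelemetry source exposes a named `logs` output
    encoding:
      codec: json                # assumption; any codec works for eyeballing events
  metrics:
    type: prometheus_exporter
    inputs: [internal_metrics]
    address: 0.0.0.0:9598        # assumption; not visible in the debug log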
This shouldn't matter: tonic checks whether the request is actually compressed, and using accept_compressed more or less just means that it can be compressed, I believe.
That is also my understanding; I would be surprised if that was the cause.
So after troubleshooting a bit, it seems the request is making it to Vector and even gets far enough to be decompressed. When running through this code (https://github.com/vectordotdev/vector/blob/c59d71cd76efa18807f772da4520208b4b5898f4/src/sources/util/grpc/decompression.rs#L223-L229) it seems to call send and then never return (I'm not sure what's on the other end of the destination). From a TCP dump I can see Envoy sending the logs and Vector ACKing the request, but then nothing comes of it.
I suspect Tonic doesn't believe the request is finished and is waiting for more, but I have no proof of this (and also don't know where to look).
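To illustrate the kind of hang described above, here is a small, self-contained sketch (not Vector's code; the channel and payloads are made up) of how an awaited send on a bounded channel simply parks when the consumer on the other side never polls, which from the outside looks exactly like "send is called and never returns":

// Simulated hang: a bounded tokio channel whose receiver never polls.
use tokio::sync::mpsc;
use tokio::time::{timeout, Duration};

#[tokio::main]
async fn main() {
    // Capacity of one, and we never call `recv()` on the receiver.
    let (tx, _rx) = mpsc::channel::<Vec<u8>>(1);

    // The first send fills the only slot immediately.
    tx.send(b"first frame".to_vec()).await.unwrap();

    // The second send can only complete once the receiver frees a slot; with
    // an idle receiver it would wait forever, so we bound it with a timeout.
    match timeout(Duration::from_secs(1), tx.send(b"second frame".to_vec())).await {
        Ok(result) => println!("send completed: {:?}", result.is_ok()),
        Err(_) => println!("send still pending after 1s (this is the hang)"),
    }
}

If something similar is happening inside the source, it would also explain why shutdown waits on the otel component: the pending send never resolves.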
A note for the community
Problem
We're trying to ingest logs from the Envoy proxy envoy.access_loggers.open_telemetry extension into vector with the opentelemetry source. Configuring envoy to output logs directly to vector shows H2 connections and data frames in debug logs, but no log entries reach the sink and we see no errors. Metrics on the Envoy side show bytes written and no errors.
opentelemetry-collector can receive these logs using gRPC, and we can re-export them from there to vector's opentelemetry gRPC source, which works fine.
Configuration
Version
vector 0.25.0 (x86_64-unknown-linux-gnu d498040 2022-09-18)
Debug Output
Example Data
Currently sending a fixed body {"foo": "baz"} with each log event. This is what the event looks like when logged by opentelemetry-collector.
When vector ingests these from opentelemetry-collector, the vector event written to the sink looks like:
Additional Context
Ideally we'd like to avoid running opentelemetry-collector in front of vector. Are there any further steps we can take to debug why vector isn't ingesting logs / not showing any errors when Envoy is directly sending logs to it?
To get it to work, this is the opentelemetry-collector config we're using:
References
No response