Open jszwedko opened 2 years ago
I think it was something like https://stackoverflow.com/questions/60044046/how-to-setup-grpc-ingress-on-gke-w-nginx-ingress which I'm also running into while testing against nginx-ingress
The error I'm getting on the agent Vector instance while trying to send data to an aggregator behind an AWS ALB is "http2 error: connection error detected: frame with invalid size".
The only relevant result I got with this error message is: https://github.com/hyperium/hyper/issues/1574
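For readers unfamiliar with this error, here is a rough sketch of one way it can arise. It assumes (this thread does not confirm it) that the peer answers the HTTP/2 client with plain HTTP/1.1 text, for example a 400 response. An HTTP/2 client reads the first nine bytes of any reply as a frame header, so the ASCII bytes "HTT" become a 24-bit payload length far above the 16384-byte default maximum frame size. The names `FrameHeader`, `parse_frame_header`, and `exceeds_max_frame_size` are hypothetical helpers, not part of hyper or h2.

```rust
/// Fields of the fixed 9-byte HTTP/2 frame header.
#[derive(Debug, PartialEq)]
struct FrameHeader {
    length: u32,    // 24-bit payload length
    kind: u8,       // frame type
    flags: u8,      // frame flags
    stream_id: u32, // 31-bit stream id (reserved top bit masked off)
}

/// Default SETTINGS_MAX_FRAME_SIZE in HTTP/2.
const DEFAULT_MAX_FRAME_SIZE: u32 = 16_384;

/// Interpret the first nine bytes of `bytes` as an HTTP/2 frame header.
/// Returns None when fewer than nine bytes are available.
fn parse_frame_header(bytes: &[u8]) -> Option<FrameHeader> {
    if bytes.len() < 9 {
        return None;
    }
    let length = u32::from_be_bytes([0, bytes[0], bytes[1], bytes[2]]);
    let stream_id =
        u32::from_be_bytes([bytes[5], bytes[6], bytes[7], bytes[8]]) & 0x7FFF_FFFF;
    Some(FrameHeader {
        length,
        kind: bytes[3],
        flags: bytes[4],
        stream_id,
    })
}

/// True when the declared payload length is above the default limit.
fn exceeds_max_frame_size(header: &FrameHeader) -> bool {
    header.length > DEFAULT_MAX_FRAME_SIZE
}

fn main() {
    // Assumed reply: an HTTP/1.1 status line sent to a client that expects HTTP/2 frames.
    let reply = b"HTTP/1.1 400 Bad Request\r\n";
    let header = parse_frame_header(reply).expect("reply has at least nine bytes");
    println!("{:?}, too large: {}", header, exceeds_max_frame_size(&header));
}
```

If this is what happens, the fix lies in protocol negotiation (ALPN, scheme, backend protocol) rather than in any frame size setting.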
Was running into the same problem with nginx-ingress, trying to do TLS termination on the ingress.
Like the post @spencergilbert linked to, it has a problem with the initial "PRI * HTTP/2.0" packet. Nginx just returns 400. Strangely enough, nginx also says the scheme for this packet is http and not https, even if TLS is enabled.
After lots of testing I did manage to get it to work based on the post linked by @gbregar (thanks!)
Setting the connection to use only the h2 protocol in sinks\vector\v2\config.rs -> new_client:
https.set_callback(move |c, _uri| {
    if let Some(settings) = &settings {
        // ALPN wire format: one length byte (0x02) followed by the protocol name "h2".
        c.set_alpn_protos(b"\x02h2")?;
        settings.apply_connect_configuration(c);
    }
    Ok(())
});
It works! I don't know if this is the right fix, but at least it fixed the problem with nginx ingress.
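The byte string `b"\x02h2"` in the snippet above is the length-prefixed ALPN wire format: each protocol name is preceded by a single byte holding its length. As a minimal sketch, a list of protocol names could be turned into that format as follows. The helper name `alpn_wire_format` is hypothetical and is not part of Vector or OpenSSL.

```rust
/// Encode protocol names into the ALPN wire format:
/// for each name, one length byte followed by the name's bytes.
fn alpn_wire_format(protocols: &[&str]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    for protocol in protocols {
        let len = protocol.len();
        // A single length byte limits each name to 1..=255 bytes.
        if len == 0 || len > 255 {
            return Err(format!("invalid ALPN protocol name: {:?}", protocol));
        }
        out.push(len as u8);
        out.extend_from_slice(protocol.as_bytes());
    }
    Ok(out)
}

fn main() {
    // Offering only "h2" yields the same bytes as the literal used above.
    let only_h2 = alpn_wire_format(&["h2"]).expect("valid protocol name");
    println!("{:?}", only_h2);
}
```

Offering only `h2` means the server cannot negotiate down to HTTP/1.1, which is presumably why the workaround helps with a TLS-terminating proxy.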
Interesting, thank you for that note @tesharp ! We'll test this and integrate it.
Is there any more progress on this? I'd really appreciate some help in using ALBs for Vector sources. Since I am using version=2 (which is the gRPC version), I've set the protocol_version for target groups to be gRPC. However, I am unsure what the correct healthcheck config should be here:
Port 6000 is where vector source listens on.
This is what I have and it's failing:
Healthcheck path is incorrect. If you set the backend-protocol to GRPC then the path should be /grpc.health.v1.Health/Check
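The health check paths in this thread follow the usual gRPC convention, where the HTTP/2 request path is `/<package>.<Service>/<Method>`. A small sketch of how such a path is assembled, using a hypothetical helper named `grpc_path`:

```rust
/// Build the HTTP/2 request path for a gRPC method:
/// "/<package>.<Service>/<Method>", or "/<Service>/<Method>" without a package.
fn grpc_path(package: &str, service: &str, method: &str) -> String {
    if package.is_empty() {
        format!("/{}/{}", service, method)
    } else {
        format!("/{}.{}/{}", package, service, method)
    }
}

fn main() {
    // Standard gRPC health checking service.
    println!("{}", grpc_path("grpc.health.v1", "Health", "Check"));
    // Vector-specific health check reported later in this thread.
    println!("{}", grpc_path("vector", "Vector", "HealthCheck"));
}
```

Which of the two services a given Vector version exposes is not something this sketch can tell you; both paths come from comments in this thread.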
@gopiio Thanks for the help! The following config worked for healthcheck.
However, I am still unable to connect it for vector sink. Anything that I am missing here?
Verbose logging:
2022-10-15T06:38:36.072903Z DEBUG hyper::client::connect::http: connecting to 172.31.yyy.xxx:443
2022-10-15T06:38:36.112687Z DEBUG hyper::client::connect::http: connected to 172.31.yyy.xxx:443
2022-10-15T06:38:36.199539Z DEBUG h2::client: binding client connection
2022-10-15T06:38:36.199585Z DEBUG h2::client: client connection bound
2022-10-15T06:38:36.200495Z DEBUG h2::codec::framed_write: send frame=Settings { flags: (0x0), enable_push: 0, initial_window_size: 2097152, max_frame_size: 16384 }
2022-10-15T06:38:36.201429Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=WindowUpdate { stream_id: StreamId(0), size_increment: 5177345 }
2022-10-15T06:38:36.201483Z DEBUG hyper::client::pool: pooling idle connection for ("https", alb-address:443)
2022-10-15T06:38:36.202630Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
2022-10-15T06:38:36.202683Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(1) }
2022-10-15T06:38:36.202699Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }
2022-10-15T06:38:36.240701Z DEBUG Connection{peer=Client}: h2::proto::connection: Connection::poll; connection error error=GoAway(b"", FRAME_SIZE_ERROR, Library)
2022-10-15T06:38:36.240739Z DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=GoAway { error_code: FRAME_SIZE_ERROR, last_stream_id: StreamId(0) }
2022-10-15T06:38:36.240748Z DEBUG Connection{peer=Client}: h2::proto::connection: Connection::poll; connection error error=GoAway(b"", FRAME_SIZE_ERROR, Library)
2022-10-15T06:38:36.240785Z DEBUG hyper::client::client: client connection error: http2 error: connection error detected: frame with invalid size
2022-10-15T06:38:36.241060Z DEBUG hyper::proto::h2::client: connection error: connection error detected: frame with invalid size
2022-10-15T06:38:36.241367Z DEBUG hyper::proto::h2::client: client response error: connection error detected: frame with invalid size
2022-10-15T06:38:36.241426Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=Vector source unhealthy component_kind="sink" component_type="vector" component_id=vector_out component_name=vector_out
Sink config:
[sinks.vector_out]
type = "vector"
inputs = ["tag_events"]
address = "alb-address:443"
version = "2"
Source config:
[sources.vector_agents]
type = "vector"
address = "0.0.0.0:6000"
version = "2"
When cURLing the ALB, I see this:
curl -X POST --verbose -I --http2 https://alb-address
* Trying 172.31.xxx.yyy:443...
* Connected to alb-address (172.31.xxx.yyy) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=*.domain.tld
* start date: Dec 24 00:00:00 2021 GMT
* expire date: Jan 7 23:59:59 2023 GMT
* subjectAltName: host "alb-address" matched cert's "*.domain.tld"
* issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
* SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* Using Stream ID: 1 (easy handle 0x5642781d2e80)
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
> POST / HTTP/2
> Host: alb-address
> user-agent: curl/7.81.0
> accept: */*
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
< HTTP/2 502
HTTP/2 502
< server: awselb/2.0
server: awselb/2.0
< date: Sat, 15 Oct 2022 07:05:06 GMT
date: Sat, 15 Oct 2022 07:05:06 GMT
< content-type: text/html
content-type: text/html
< content-length: 122
content-length: 122
<
* Excess found: excess = 122 url = / (zero-length body)
* Connection #0 to host alb-address left intact
Hi @gopiio !
I think you need to set: tls.apln_protocols = "h2" on the vector sink. Or, at least, one other user had success with that and AWS ALBs. Hopefully this works for you too! Assuming it does, we should add this mention to the documentation.
@jszwedko just tried to glue two Vector topologies with vector sink and source in-between, and it failed.
- Topology 1 -> AWS ALB -> Topology 2
- AWS ALB has HTTPS listener, HTTP with gRPC in Target Group
Component has the following configuration:
tls.apln_protocols = "h2"
tls.enabled = true
And it fails with
{"timestamp":"2022-11-22T15:50:50.975966Z","level":"WARN","message":"Retrying after error.","error":"Request failed: status: Unknown, message: \"http2 error: connection error detected: frame with invalid size\", details: [], metadata: MetadataMap { headers: {} }","internal_log_rate_limit":true,"target":"vector::sinks::util::retries","span":{"component_id":"sink_name","component_kind":"sink","component_name":"sink_name","component_type":"vector","name":"sink"},"spans":[{"component_id":"sink_name","component_kind":"sink","component_name":"sink_name","component_type":"vector","name":"sink"}]}
Do you have any working examples or tips on making this work?
Thank you!
Hmm, that's odd. Another user had reported it working for them with that configuration. Which version of Vector? You set it on the sink, correct?
Yes, it is configured on the sink, Vector 0.25.1
Gotcha, I'm not sure why it isn't working for you then unfortunately. Someone will probably have to dig back into this issue. It is in our backlog.
I got some more time to investigate this issue and found some rough edges, but in the end I successfully launched a working setup on the AWS stack.
Working configuration file:
[sources.test]
type = "demo_logs"
format = "shuffle"
lines = ["hello, world"]
interval = 2
[sinks.vector]
type = "vector"
tls.enabled = true
tls.alpn_protocols = ["h2"] # Note: this should be a list, not a string. Also, do not mistype the field name: Vector won't notice and will just run with the default value (None).
inputs = ["test"]
address = "https://fqdn:443" # Note: do not forget https! It is important. Otherwise the code does some magic and provisions default values.
And source:
[sources.vector]
type = "vector"
address = "0.0.0.0:9000"
version = "2"
- Application Load Balancer with HTTPS listener (do not forget to attach a proper certificate)
- Target Group with HTTP protocol (since TLS is offloaded on the ALB) and the configured port (9000 in my example). Protocol version should be gRPC and health-check path /grpc.health.v1.Health/Check.
Why it did not work before:
I think you need to set: tls.apln_protocols = "h2" on the vector sink
- Proper field name is alpn_protocols, not apln_protocols
- Proper value is a list, containing "h2"
- Apparently the code does some magic hand waving and applies http for the protocol if it is not specified in the address field
I think that in general there are 2 issues:
- "h2" as a str passes for the value in the Vec field
I'm not sure if the third one (some non-intuitive default values logic) is a design problem, a lack of documentation, or a problem at all.
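To make the scheme point concrete, here is a rough sketch of the defaulting behavior as described in this comment. It assumes that an address without an explicit scheme is treated as plain http. This is not Vector's actual code, and the helper names `effective_address` and `uses_tls` are hypothetical.

```rust
/// Return the address with an explicit scheme.
/// ASSUMPTION (from the comment above, not verified against Vector's source):
/// an address without a scheme is treated as plain "http".
fn effective_address(address: &str) -> String {
    if address.starts_with("http://") || address.starts_with("https://") {
        address.to_string()
    } else {
        format!("http://{}", address)
    }
}

/// True when the effective address would be contacted over TLS.
fn uses_tls(address: &str) -> bool {
    effective_address(address).starts_with("https://")
}

fn main() {
    // The failing sink config earlier in the thread used a bare host:port.
    println!("{}", effective_address("alb-address:443"));
    // The working config spelled out the scheme.
    println!("{}", effective_address("https://fqdn:443"));
}
```

Under that assumption, a bare `alb-address:443` would be contacted without TLS even though port 443 is used, which matches the advice to always write `https://` in the address.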
So vector doesn't fail with typos in configuration keys? (apln_protocols vs alpn_protocols). And also accepts garbage values and invalid types in configuration too.
I can see now why I couldn't get this working when I tried 😅
Oh, and for security reasons, probably better to default to https and not http!
For the keys, unfortunately that's on a per component basis right now, and we've not implemented that consistently - I have marked it as a place to improve the UX.
On the value side things are generally string options, rather than enums - which is another place we can improve if we know there is a known set of possible options.
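As an illustration of the stricter key checking discussed here, the sketch below rejects unknown keys and suggests a known key made of the same letters, which is enough to catch a transposition such as apln_protocols. It is a toy example, not how Vector validates configuration. `KNOWN_TLS_KEYS` is an illustrative subset, and `unknown_keys` and `suggest` are hypothetical helpers.

```rust
/// Illustrative subset of accepted keys (assumption, not Vector's full list).
const KNOWN_TLS_KEYS: &[&str] = &["enabled", "alpn_protocols"];

/// Return every supplied key that is not in the known list.
fn unknown_keys<'a>(supplied: &[&'a str], known: &[&str]) -> Vec<&'a str> {
    supplied
        .iter()
        .copied()
        .filter(|key| !known.iter().any(|candidate| *candidate == *key))
        .collect()
}

/// Characters of `s` in sorted order, used to compare keys as multisets of letters.
fn sorted_chars(s: &str) -> Vec<char> {
    let mut chars: Vec<char> = s.chars().collect();
    chars.sort_unstable();
    chars
}

/// Suggest a known key that uses exactly the same letters as `unknown`,
/// which catches transposition typos.
fn suggest<'k>(unknown: &str, known: &[&'k str]) -> Option<&'k str> {
    let target = sorted_chars(unknown);
    known
        .iter()
        .copied()
        .find(|candidate| sorted_chars(candidate) == target)
}

fn main() {
    let supplied = ["enabled", "apln_protocols"];
    for key in unknown_keys(&supplied, KNOWN_TLS_KEYS) {
        match suggest(key, KNOWN_TLS_KEYS) {
            Some(hint) => println!("unknown key {:?}, did you mean {:?}?", key, hint),
            None => println!("unknown key {:?}", key),
        }
    }
}
```

Failing at startup on an unknown key would have surfaced the typo immediately instead of silently falling back to the default value.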
We also faced issues getting an ALB working with Vector. After some trial and error we got it working with our ALB + EKS using the AWS ALB Controller. In the version of Vector we're using, the health check endpoint seems to be /vector.Vector/HealthCheck.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/certificate-arn: <CERT ARN>
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/backend-protocol-version: "GRPC"
    alb.ingress.kubernetes.io/backend-protocol: "HTTP"
    alb.ingress.kubernetes.io/healthcheck-path: /vector.Vector/HealthCheck
    alb.ingress.kubernetes.io/healthcheck-port: "9000"
    alb.ingress.kubernetes.io/success-codes: "0"
  name: ingress-test
spec:
  ingressClassName: alb
  rules:
    - host: vector-intake.example.com
      http:
        paths:
          - backend:
              service:
                name: vector
                port:
                  number: 9000 # numeric ports go under "number"; "name" expects a named Service port
            pathType: ImplementationSpecific
Vector config:
sinks:
  vector:
    type: vector
    address: https://vector-intake.example.com
    version: "2"
    tls:
      enabled: true
      alpn_protocols: ["h2"]
sources:
  vector_in:
    type: vector
    address: 0.0.0.0:9000
    version: "2"
We tried to get it working with the nginx ingress controller but kept getting a 464 HTTP error.
Hope this is helpful for anyone.
We should set up a demo in https://github.com/timberio/vector-demos using an AWS ALB. If I remember right, last time we tried this, we ran into a gRPC protocol error that we haven't resolved yet.