Open ryanartecona opened 1 month ago
Thanks for this detailed feature request @ryanartecona !
Given you say that you'd be happy if Vector treated the incoming data as opaque, I'm wondering what you plan to use Vector to do with the data? Are you intending to just "proxy" the requests?
> On the source side:
> - a `decode_multipart_form_data()` VRL function would be hugely helpful. It's not a hard blocker, as I was able to roll my own crude parser in VRL, but I'd love to be able to delete that code and use something built into VRL.

This seems like a reasonable addition. I could also see enhancing the `http_server` source to be able to handle multi-part data as a first-class concept (though I'm not sure exactly what this would look like).
> - a specific `type: pyroscope_grpc` source might have been nice, but not a huge deal as a `type: http` source with a protobuf encoding seems to work

Agreed. I could see it being useful to add for discoverability, but it seemingly could be a simple wrapper around the `http_server` source.
> On the sink side:
> - Adding a `type: grpc` sink would be ideal. If Vector had a generic gRPC sink, I could use that for both source types and just restructure the payloads to fit the schema.

Agreed. I'm not sure if it is possible to create a dynamic gRPC sink in Rust though. The existing sinks that use gRPC use code generation. It seems like something should be doable using `prost_reflect` though.
> - If a gRPC sink can't be added or would take longer, supporting a dynamic `uri:` field and/or a `query_parameters:` field with dynamic values in the HTTP sink would suffice.

Agreed, these would be useful in their own right. Related issues:
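
As a rough sketch of what "dynamic values" would mean for the HTTP sink, the following Python (not Vector config) builds a per-event URL from event fields. The `name` query parameter, the `service{key=value}` label syntax, the host, and the helper `build_ingest_url` are all assumptions made for illustration and should be checked against the Pyroscope ingest API before relying on them.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_ingest_url(base_url: str, event: dict) -> str:
    """Build a per-event ingest URL (hypothetical helper, for illustration only)."""
    # Assumed label syntax: service{key=value,key=value}, with keys sorted
    # so the same event always produces the same URL.
    labels = ",".join(f"{k}={v}" for k, v in sorted(event["labels"].items()))
    name = event["service"] + "{" + labels + "}"

    # urlencode percent-encodes the braces, commas, and equals signs.
    query = urlencode({"name": name})

    parts = urlsplit(base_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


# Hypothetical event, shaped like a Log event with a service name and labels.
event = {"service": "my-app", "labels": {"region": "us-east", "env": "prod"}}
url = build_ingest_url("http://pyroscope.example:4040/ingest", event)
print(url)
```

The point is only that the query string differs per event, which is why a static `uri:` cannot express it.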
> Given you say that you'd be happy if Vector treated the incoming data as opaque, I'm wondering what you plan to use Vector to do with the data? Are you intending to just "proxy" the requests?

Mostly yes. We also have Vector doing some extra things like tag insertion, which are convenient to do in VRL for these profiles.
> I could also see enhancing the `http_server` source to be able to handle multi-part data as a first-class concept (though I'm not sure exactly what this would look like).

Even better! I like it.
> Agreed. I'm not sure if it is possible to create a dynamic gRPC sink in Rust though. The existing sinks that use gRPC use code generation. It seems like something should be doable using `prost_reflect` though.

Ohh, that's unfortunate. I was hoping it would be an easier addition from existing pieces, since I knew the vector source/sink components existed, but I forgot about the gRPC codegen part.
Thanks for linking those other issues. I had seen #201 but not #6759. Upvoted.
Should I file other issues for any of those specific pieces?
> Should I file other issues for any of those specific pieces?

I think it'd be reasonable to open separate issues for:
- `http_server` source
- `grpc` sink
Use Cases
I'm trying to get Pyroscope data flowing through Vector (well, a vector-to-vector pair of a Vector agent in a source cluster to an "aggregator" in the destination cluster). Pyroscope supports 2 methods of ingest from its language-specific SDKs: an HTTP POST API which supports `multipart/form-data` uploads, and a gRPC Push service.
I have a client of each type.
In this case I don't need Vector to have an internal data model for profiles. I'd be happy if they were treated as Log events, with the contents being mostly an opaque binary payload (a gzipped `pprof` message, which is itself a protobuf) and a set of label names/values with a certain structure.
Attempted Solutions
With some creative config, I could get both a gRPC source and an HTTP multipart upload source working. I was unable to get either a gRPC sink or an HTTP upload sink working, though, which is what I'm blocked the hardest on.
gRPC source
Somewhat surprisingly, I was able to get Vector to receive the gRPC Push messages by using a `type: http` source, like below.
Using a proto desc file from running this in the pyroscope repo:
protoc -Iapi -o pyroscope_push_v1.desc api/push/v1/push.proto --include_imports
HTTP multipart source
I struggled to get this working, but I was eventually able to with some hacks. I could use a separate `type: http` source (below) which captures the `Content-Type` header containing the multipart boundary token (e.g. `Content-Type: multipart/form-data; boundary=---------abcd1234`). I could then write some hacky VRL which does some crude multipart upload parsing and pulls out the binary profile payload (a gzipped protobuf). The main friction is that some of the string manipulation methods in VRL, namely `split()`, will force a lossy UTF-8 encoding under the hood, which corrupts the gzip payload. The workaround makes the multipart upload parser even cruder, but it's at least possible by using `find()` and `slice()` instead of `split()`.
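
To illustrate why offset-based parsing preserves the payload while a lossy string conversion does not, here is a minimal sketch in Python rather than VRL. The helper name `extract_part_body`, the sample form field names, and the fake payload are all hypothetical; it handles only the first part of the upload and is not the actual VRL used in this setup.

```python
import gzip


def extract_part_body(body: bytes, content_type: str) -> bytes:
    """Return the raw bytes of the first part in a multipart/form-data body."""
    # Pull the boundary token out of the Content-Type header value.
    marker = "boundary="
    idx = content_type.find(marker)
    if idx == -1:
        raise ValueError("no boundary in Content-Type")
    boundary = content_type[idx + len(marker):].split(";")[0].strip().strip('"')
    delimiter = b"--" + boundary.encode("ascii")

    # Find the first delimiter, then the blank line that ends the part headers.
    start = body.find(delimiter)
    if start == -1:
        raise ValueError("boundary not found in body")
    headers_end = body.find(b"\r\n\r\n", start)
    if headers_end == -1:
        raise ValueError("part headers not terminated")
    payload_start = headers_end + 4

    # The payload runs up to the CRLF that precedes the next delimiter.
    payload_end = body.find(b"\r\n" + delimiter, payload_start)
    if payload_end == -1:
        raise ValueError("closing boundary not found")
    return body[payload_start:payload_end]


# Build a sample upload around a gzipped stand-in for a pprof message.
profile = gzip.compress(b"fake pprof bytes")
boundary = b"---------abcd1234"
content_type = "multipart/form-data; boundary=" + boundary.decode("ascii")
body = b"".join([
    b"--" + boundary + b"\r\n",
    b'Content-Disposition: form-data; name="profile"; filename="profile.pprof"\r\n',
    b"Content-Type: application/octet-stream\r\n",
    b"\r\n",
    profile,
    b"\r\n--" + boundary + b"--\r\n",
])

extracted = extract_part_body(body, content_type)

# A lossy UTF-8 round trip replaces invalid bytes, so the gzip data changes.
lossy = body.decode("utf-8", errors="replace").encode("utf-8")
```

Working on byte offsets keeps the gzip stream intact, whereas the lossy round trip rewrites every byte sequence that is not valid UTF-8 (the gzip magic bytes already qualify).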
gRPC sink
I couldn't get a gRPC sink working at all. I can successfully re-encode a gRPC Push message using `encode_proto()`, but a `type: http` sink uses HTTP/1.1 and the Pyroscope gRPC server doesn't accept it.
HTTP upload sink
The Pyroscope HTTP Ingest API will accept either a `multipart/form-data` upload, like the nodejs SDK sends, or just a simple POST with the pprof profile as the request body. However, in both cases, it expects metadata including service name and labels in the form of URL query params, which means those have to be dynamically generated per Log event from Vector's perspective. Vector currently doesn't support dynamic values in the `uri:` field of the HTTP sink, and there's no way to specify query params separately (like `headers:`).
Proposal
On the source side:
- a `decode_multipart_form_data()` VRL function would be hugely helpful. It's not a hard blocker, as I was able to roll my own crude parser in VRL, but I'd love to be able to delete that code and use something built into VRL.
- a specific `type: pyroscope_grpc` source might have been nice, but not a huge deal as a `type: http` source with a protobuf encoding seems to work
On the sink side:
- Adding a `type: grpc` sink would be ideal. If Vector had a generic gRPC sink, I could use that for both source types and just restructure the payloads to fit the schema.
- If a gRPC sink can't be added or would take longer, supporting a dynamic `uri:` field and/or a `query_parameters:` field with dynamic values in the HTTP sink would suffice.
References
No response
Version
0.40.0