spotify / ffwd

a flexible metric forwarding agent
https://spotify.github.io/ffwd/
Apache License 2.0

[WIP] Add a New Metric Data Model and Update SerializerProto100 #215

Closed: ao2017 closed this 3 years ago

ao2017 commented 3 years ago

Background: Please refer to this issue.

Requirements: The FFWD agent supports two protocols, TCP and UDP. UDP metrics are serialized using Google Protocol Buffers, and TCP metrics are mostly handled through the spotify100 serializer. The requirement for distribution support is to preserve the current core functionality:

  1. Ensure the new version of the FFWD agent can handle both new and old metrics.
  2. Ensure that metrics from both new and old versions of the FFWD agent can be handled upstream using the same Pub/Sub topic.

What's New

  1. New JSON metric data model
  2. Update protoSerializer100
  3. Update various interfaces to support the new metric format.
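As a rough illustration, assuming the main change in the new model is that a metric's value can be either a plain number or a distribution, the data model could be sketched in Java as follows. Every type name here (`Value`, `DoubleValue`, `DistributionValue`, `Metric`) is hypothetical, and modeling a distribution as raw samples is an assumption made only to keep the sketch short.

```java
import java.util.List;
import java.util.Map;

// Hypothetical data model: these names are illustrative, not FFWD's actual classes.

/** A metric value: either a legacy point value or a new distribution. */
interface Value {}

/** Legacy point value, equivalent to the bare double carried by old metrics. */
record DoubleValue(double value) implements Value {}

/** New distribution value, modeled here as raw samples (an assumption; a real model may use a sketch). */
record DistributionValue(List<Double> samples) implements Value {
    DistributionValue {
        samples = List.copyOf(samples); // immutable defensive copy
    }

    /** Arithmetic mean of the samples, or NaN when there are none. */
    double mean() {
        return samples.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }
}

/** One metric shape for both versions: it holds a Value instead of a double. */
record Metric(String key, Map<String, String> tags, long timestampMillis, Value value) {
    Metric {
        tags = Map.copyOf(tags); // immutable defensive copy
    }
}
```

Keeping a single `Metric` type with a polymorphic `Value` lets most of the pipeline pass metrics around unchanged and only branch where the value is actually serialized.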
ao2017 commented 3 years ago

I think we should not introduce new objects/versions if we can avoid it. The current ways to receive metrics in FFWD are:

  1. UDP/Proto: adding the ability to decode either a v0 or a v1 message should be sufficient. Internally, using the same Metric and Batch classes with a Value instead of a double should work.
  2. TCP/JSON: the same approach, but with JSON. Internally, using the same Metric and Batch classes with a Value instead of a double should work.
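A minimal Java sketch of the "decode v0 or v1 into the same Metric" idea, assuming the message has already been parsed into a map (from JSON or protobuf) and that it exposes hypothetical `version`, `key` and `value` fields; these field names and all type names are illustrative, not FFWD's actual wire format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: field and type names are assumptions, not FFWD's real format.
interface Value {}
record DoubleValue(double value) implements Value {}
record DistributionValue(List<Double> samples) implements Value {}
record Metric(String key, Value value) {}

final class VersionedDecoder {
    private VersionedDecoder() {}

    /**
     * Decodes an already-parsed message into the shared Metric type.
     * A message without a version field is treated as v0.
     */
    static Metric decode(Map<String, Object> message) {
        Object version = message.getOrDefault("version", 0);
        String key = (String) message.get("key");
        Object raw = message.get("value");

        if (Integer.valueOf(0).equals(version)) {
            // v0: the value is always a bare double.
            return new Metric(key, new DoubleValue(((Number) raw).doubleValue()));
        }
        if (Integer.valueOf(1).equals(version)) {
            // v1: the value is either a number or a list of samples.
            if (raw instanceof Number) {
                return new Metric(key, new DoubleValue(((Number) raw).doubleValue()));
            }
            if (raw instanceof List<?>) {
                List<Double> samples = new ArrayList<>();
                for (Object sample : (List<?>) raw) {
                    samples.add(((Number) sample).doubleValue());
                }
                return new Metric(key, new DistributionValue(samples));
            }
            throw new IllegalArgumentException("unsupported v1 value: " + raw);
        }
        throw new IllegalArgumentException("unknown version: " + version);
    }
}
```

Because both versions land in the same `Metric` type, the TCP and UDP inputs can share one decoder and everything downstream of it stays version-agnostic.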

Sending to Pub/Sub with an updated spotify_100.proto that only adds the new Value field should work. I also believe the Heroic consumers should be updated to this new schema first (after rigorous testing) so that any new messages published to Pub/Sub are handled correctly.
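A small Java sketch of why updating the consumers first matters, assuming the updated message keeps the legacy double field and adds one optional field for the new value. The names `Payload`, `Stored` and `Consumers` are hypothetical and do not come from spotify_100.proto. An upgraded consumer prefers the new field and falls back to the double; a legacy consumer only ever reads the double, which for a distribution-only message would be a meaningless default.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: names are illustrative, not taken from spotify_100.proto.

/** Wire payload: the legacy double is kept; the new distribution field is optional. */
record Payload(String key, double value, Optional<List<Double>> distribution) {}

/** What a consumer ends up storing. */
record Stored(String key, String kind, double pointValue, List<Double> samples) {}

final class Consumers {
    private Consumers() {}

    /** A legacy consumer only knows the double field and ignores anything new. */
    static Stored legacy(Payload p) {
        return new Stored(p.key(), "point", p.value(), List.of());
    }

    /** An upgraded consumer prefers the new field when present, else falls back to the double. */
    static Stored upgraded(Payload p) {
        return p.distribution()
                .map(samples -> new Stored(p.key(), "distribution", Double.NaN, samples))
                .orElseGet(() -> legacy(p));
    }
}
```

Old messages keep working on upgraded consumers, while new messages on legacy consumers are silently misread, which is the argument for rolling out the consumer side before any producer starts emitting the new field.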

Regarding cadence, I think this change shouldn't be deployed first: we need to keep the ability to upgrade FFWD, and once this change is merged we won't be able to do so until we have tested all pipeline components. Let's come up with a plan for testing these changes without merging them into master. We should be 100% sure these changes don't introduce anything funky into metrics transport.

@malish8632 as discussed offline in last week's meeting and yesterday morning, we need to introduce a second version of the metrics to ensure compatibility. This is common practice when you maintain an API. We will support both new and old metrics. As I stated during our monthly meeting on distribution support in Heroic, it is an opt-in feature: Heroic users don't have to update any code unless they want to use distributions.

Regarding deployment, my goal is to introduce this feature gradually because of its scope. If anyone tries to send a new version of the metric, the FFWD agent will drop it for now. We will start sending distribution metrics to Heroic once Heroic is able to handle them.
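The "drop it for now" behavior amounts to a version gate in the agent. A minimal Java sketch, assuming each incoming metric carries a version number and that a hypothetical opt-in switch (`acceptV1`) decides whether v1 metrics are forwarded. Counting dropped metrics is an added assumption, not something the PR states; it would make early senders of new-version metrics visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: the class names and the acceptV1 switch are illustrative, not FFWD configuration.
record IncomingMetric(String key, int version) {}

final class VersionGate {
    private final boolean acceptV1;                  // opt-in switch, off until Heroic can handle distributions
    private final AtomicLong dropped = new AtomicLong();

    VersionGate(boolean acceptV1) {
        this.acceptV1 = acceptV1;
    }

    /** Returns the metrics to forward; v1 metrics are dropped (and counted) unless opted in. */
    List<IncomingMetric> filter(List<IncomingMetric> batch) {
        List<IncomingMetric> out = new ArrayList<>();
        for (IncomingMetric m : batch) {
            if (m.version() >= 1 && !acceptV1) {
                dropped.incrementAndGet();
                continue;
            }
            out.add(m);
        }
        return out;
    }

    /** Number of metrics dropped so far by this gate. */
    long droppedCount() {
        return dropped.get();
    }
}
```

Flipping the switch once the consumers are ready turns the feature on without another code change in the agent.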

For testing, we will follow well-known best practices: good unit test coverage and integration testing in a staging environment. So far, I have added distribution support to two projects, and in both cases I improved the code coverage. We are going to use the same approach for FFWD.

I hope this settles the change request.