graphite-project / carbon

Carbon is one of the components of Graphite, and is responsible for receiving metrics over the network and writing them down to disk using a storage backend.
http://graphite.readthedocs.org/
Apache License 2.0
1.5k stars 490 forks source link

Add protobuf protocol support #275

Closed mattrobenolt closed 4 years ago

mattrobenolt commented 10 years ago

This would ultimately replace the pickle transport and favor using protobuf moving forward. Obviously over a long deprecation period. :)

Working on a bunch of projects that send metrics to graphite/carbon, we end up resorting to plain ascii protocol when a binary protocol would be more efficient and lighter.

The problem is that the only supported binary protocol today is pickle. Beyond the fact that pickles are inherently insecure, it's completely difficult to work with in anything outside python. See: https://blog.nelhage.com/2011/03/exploiting-pickle/ and many other references on the web as well as a good talk by @alex at PyCon.

So my proposal is to add in a new protobuf transport that works almost identical to pickle. header+payload style.

The protobuf definition looks like:

message Point {
  required string path = 1;
  required double value = 2;
  required uint32 timestamp = 3;
}

I would be happy to contribute this transport into carbon as well as adding support into the tools that we use, diamond and statsd.

Also worth noting, in my quick test cases, the protobuf encoded message was smaller in terms of bytes than pickle as well.

I'm open to opinions on this. :+1:

SEJeff commented 10 years ago

Please send a pull request :)

I'm quite a big fan in favor of making the transport pluggable.

mattrobenolt commented 10 years ago

Awesome, I'll get something over hopefully this week to start discussing.

trbs commented 10 years ago

@mattrobenolt I would like to offer assistance for creating a pluggable transport. Instead of protobuffers I would like to see something like msgpack and integrate is with https://github.com/trbs/bucky as well.

I use protobuffers regularly but it's kind of a pain to work with (read install) when wanting high performance in Python. (Compiling c++ protobuf python wrappers etc)

To me it seems something like msgpack is both fast and easy to implement in any language.

Fluxx commented 10 years ago

I use protobuffers regularly but it's kind of a pain to work with (read install) when wanting high performance in Python. (Compiling c++ protobuf python wrappers etc)

@trbs Could you elaborate on this? I've used protobuffs many times and never really had any issues installing or compiling the Protobuff interfaces for Python.

To me it seems something like msgpack is both fast and easy to implement in any language.

Messagepack is nice, the main downside being that there isn't any type checking or message structure enforcement - something Protobuffs does quite nicely. Type checking and enforcement can be a pro/con, depending on your use case for message serialization. For use cases like this one, where you're defining a protocol that needs to work cross-langage and cross-component, more rules and structure is better than less, IMO. It reduces the amount of silly errors, increases safely and allows for easier use with strongly typed languages. Thus here, I think Protobuffs is a better choice.

drawks commented 10 years ago

Making the listener bit plugable would be rad. I think we can all argue in circles about different transports and serializations that we would like to see (xdr over udp) but I really don't think it makes any sense to just glom on a bunch of various implementations especially ones that aren't part of the python standard library and incur additional complexity in building/installing which won't provide any benefit to the majority of users who will never flip the bits to enable them.

A typical user can be expected to use the defaults in almost every case; that said IMHO the default inbuilt options should be udp and tcp listeners that support the line based string protocol and a binary protocol that has wide adoption in other languages/platforms and is available in the python standard library. pickle is a strike out on the cross platform/language support but we already have it so it should be supported for at least one more major version to give people time switch over to a replacement. I'd vote for xdr as the default since it is lightweight, ancient(already available in many languages), cross platform and already part of python stdlib.

so... I guess you could tldr this as: :-1: - add protobuff to carbon :+1: - add plugable listeners to carbon :+1: - use protobuff as an example implementation of the new plugable interface :+1: - ditch pickle (eventually) :+1: - adopt xdr as the default binary serialization

trbs commented 10 years ago

@Fluxx I can surely see the value of having type checking and enforced structure. So please don't understand that I am against Protobuffs. Although I do not want to see it mandatory, that goes for other variants like msgpack as well.

I read @mattrobenolt comment as stating that we would like to add a plugable mechanism to graphite for dealing with transports. At that point one could have many different transport options, sanction whatever set of transports are the current hot/best/etc onces and people would be able to (relatively easily) add one if it does not exist in graphite mainline.

If we would want to go that route (writing a plugable transport mechanism) than I would like to help with that where I can.

@drawks Agreed, I forgot about XDR :)

mattrobenolt commented 10 years ago

@drawks I agree that bailing on pickle would be a longer term issue.

But I do think there has to be a strong opinion on choosing some other option that carbon supports. Whether it's protobuf or msgpack or something else, I don't have a super strong opinion. (And that's the problem). Most people don't have a strong opinion on which one, so I feel like it needs to be dictated (for lack of better word) by the server itself.

Having pluggable listeners is great overall. To raise adoption of this formation throughout other services, there has to be a standard that carbon supports. If this were protobuf, we could easily go through the list of major services that talk to carbon and update them accordingly. Like I mentioned, statsd for example.

Graphite could easily publish the accepted point.proto or whatever it's called, then anyone adopting it will just work. Exactly like we do for the ascii protocol today.

My tl;dr is that I feel like a decision needs to be made and a format should be supported by carbon at the same capacity as pickle. But totally having pluggable transports would be awesome for every other use case and flexibility. I would just like to see this get adopted by other software as well instead of everything using ascii since it's the only true cross platform transport. If we don't commit to something in core, ascii will stay as the only true cross platform transport as well.

SEJeff commented 10 years ago

So overall, I think we are pretty much in agreement. My suggestion to you (as a graphite co-maintainer who doesn't spend as much time on the project as he'd actually like), is to do this:

  1. Make the input in carbon transport pluggable
  2. Port the existing pickle code over to the pluggable code
  3. Add protobuf support via the same mechanism as number 2
  4. Enable protobuf inputs on a different port than pickle or the line protocol by default

Of course it depends on the thoughts of the other graphite maintainers, but I would be ok with merging all 4 of those steps assuming the code is reasonable?

cc: @brutasse as he did a really nice pluggable storage backend implementation for graphite-web which has long since been merged.

I also agree that language agnosticism is very important, but don't care if we don't support some obscure language such as ADA or Fortran. I think some of the more "common" serialization libs such as captn proto, protocol buffers, msgpack, etc all have sufficient market penetration that this would be great.

@mattrobenolt Do you want to take a shot at implementing this? Bonus points, but not required if you make a pre-protocol bit that allows us to wrap any of the inputs with TLS / SSL on an optional basis.

drawks commented 10 years ago

I agree with your summation of the problem, I just don't particularly like protobuff and I'm generally opposed to adding more external dependencies, especially ones that require an additional third party tool and mechanism to bootstrap a build.

http://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats#Overview

To me XDR (eXternal Data Representation) has all the green boxes we want including wide language support, maintenance by the IETF, a strong standard with an accepted RFC that hasn't had any changes since 2006. And, IMHO, most importantly doesn't require users to download anything or introduce any additional steps to bootstrap a build.

mattrobenolt commented 10 years ago

@drawks Protobuf does not require anyone to download anything extra to run or compile carbon.

I think you're referring to protoc which just takes the .proto and converts it to a .py. Convention has it that you'd commit the generated .py alongside the source .proto. So nobody else needs to actually do the conversion. It's similar to work with regel if you've done that. You always commit the compiled source so nobody has to deal with it again unless for some reason that struct changes, and in this case... I assume highly unlikely that it will.

SEJeff commented 10 years ago

cc: @graphite-project/committers for thoughts? I think the only thing we are trying to figure out is if we should ship and enable the protobuf support by default.

I support some protobuf using software in production and as a result am comfortable with it. Hilariously enough, I use collectd and collectd proxies for shipping most graphite data over the wan to carbon because collectd's network protocol is binary and much more compact.

Anything or anyone wanting to make the carbon input mechanisms suck less I'm generally +1 on, but I can be convinced otherwise. I've never heard of XDR as it isn't hip and trendy. That being said, I've used the shit out of msgpack as well, which is more compact than XDR, but slightly slower to unmarshal per a few BS benchmarks I wrote up in a few minutes.

TL;DNR: @mattrobenolt do you mind doing steps 1-3 and let us decide on if we want to implement 4, or implement XDR as the default? I know many users will have never heard of XDR who will have heard of protobuf, but that is not a good reason to make one the default over the other. Then we can work on extending the other tools to support a better serialization mechanism into carbon together.

Sound about right @drawks?

obfuscurity commented 10 years ago

My opinion falls largely in line with @drawks and @mattrobenolt. I'd like to see us move towards a messaging protocol that's supported by other languages, but with as few external dependencies as possible (preferably zero). I have no opinion on the specific implementation as long as it meets those criteria and isn't a hassle to maintain.

mattrobenolt commented 10 years ago

@SEJeff I can work on that. I'll split this up into appropriate issues when I dig in later this week.

mattrobenolt commented 10 years ago

@obfuscurity I'm not sure if any protocols that fit those requirements, tbh. :) We could likely vendorize in something to prevent relying on an external dependency though. I'm not sure of a better way except pickle but, well... that's why we're here in the first place.

obfuscurity commented 10 years ago

@mattrobenolt Zero external dependencies is not a requirement, just a guiding principle. :wink:

esc commented 9 years ago

I think pluggable receivers with a sane interface should be the the first priority, the decision (and flame-war :grin: ) on which format to supported can be delayed until later

iksaif commented 7 years ago

Any news on that ? I'd love to see protobuf support. last time I checked pickle was using a good chunk of our cpu...

deniszh commented 7 years ago

Do we have pluggable carbon receivers yet ? IIRC is was a prerequisite...

iksaif commented 7 years ago

We don't, but I was wondering if anybody had been working on any of this.

iksaif commented 7 years ago

Now there is a PR :)

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.