twisted / twisted

Event-driven networking engine written in Python.
https://twisted.org
Other
5.59k stars 1.17k forks source link

Add PROXY protocol support #8203

Closed twisted-trac closed 8 years ago

twisted-trac commented 8 years ago
kconwayatlassian's avatar kconwayatlassian reported
Trac ID trac#8203
Type enhancement
Created 2016-02-12 20:52:50Z
Branch https://github.com/twisted/twisted/tree/haproxy-endpoint-8203-2

HAProxy annotates the beginning of every TCP stream with connection data. The header format and expected service behaviour are defined as the PROXY protocol (http://www.haproxy.org/download/1.5/doc/proxy-protocol.txt).

Add support for handling the PROXY protocol in all Protocol implementations.

Attachments:

Searchable metadata ``` trac-id__8203 8203 type__enhancement enhancement reporter__kconwayatlassian kconwayatlassian priority__normal normal milestone__None None branch__branches_haproxy_endpoint_8203_2 branches/haproxy-endpoint-8203-2 branch_author__adiroiban adiroiban status__closed closed resolution__fixed fixed component__core core keywords__None None time__1455310370390963 1455310370390963 changetime__1459991483592152 1459991483592152 version__None None owner__glyph glyph ```
twisted-trac commented 8 years ago
glyph's avatar @glyph set owner to kconwayatlassian

In combination with the architectural discussion we had on IRC (in short: this should be a wrapper endpoint like wrapClientTLS) and the comments adiroiban has left here I think we have a full review, so I'll remove the keyword for now.

twisted-trac commented 8 years ago
adiroiban's avatar @adiroiban removed owner

Thanks for the updated patch.


I don't have a good understand of how endpoints should be designed and also I have never worked with HAProxy or any other proxy, so I am doing just a high level review here.

Hope it helps.

Feel free to ignore it if it does not make sense.


This patch adds a nice functionality for endpoints, but from my point of view, the endpoint functionality is poorly advertised.

I think that is is very important to include a few notes about the new functionality in the narrative documentation dedicated to endpoints:

https://twistedmatrix.com/documents/current/core/howto/endpoints.html#endpoint-types-included-with-twisted

An example would also be great :)

I would say that the release note fragment should also advertise that HAProxy is available as an endpoint.


For my comments from the previous review, I would prefer if those questions were answered with: see documentation / see apidoc :)

I tried to make those comments from the point of view of a normal/simple Twisted user and I think that they are important questions which should be answered in the documentation / apidocs.

We should not expect each Twisted user to read each comment from each review request :)


I have sent your patch to our buildbot.

New patches should pass on all current supported builders.

You can check the results here https://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/haproxy-endpoint-8203

You can check the generated API docs here https://buildbot.twistedmatrix.com/builds/apidocs/apidocs-7117ae33aa236f2217e9d6acf9ed0cca5fa8c39c.tar.bz2

You can also check the coverage - https://buildbot.twistedmatrix.com/builds/twisted-coverage.py/twisted-fedora23py2.7-coverage.py-r7117ae33aa236f2217e9d6acf9ed0cca5fa8c39c/


I am not sure if new patches should be accepted if they break Py3. I would say that all new patches should also consider working on py3... unless they depend on something which is not yet ported.


I am not sure what is the coverage requirement for new patches, but I would say that new patches should have 100% coverage. I am not saying that 100% coverage will solve all problems, but I think that 100% is a good start.


New code should have 100% apidocs. Even for private methods.

See some missing methods: https://buildbot.twistedmatrix.com/builders/twistedchecker/builds/3853/steps/run-twistedchecker/logs/new%20twistedchecker%20errors


I would say that the new code can have better documentation :)

For example in HAProxyProtocol what is the purpose of self.info ? The name is very generic. It is very important to describe the purpose of that variable so that a future refactoring would know how to handle it and what was the initial design.

Based on current usage it reads like a flag, so a better name would be something like 'wasProxyHeaderParsed` ... if the proxy header is imporant, I think that holding it in a varialbe named 'haProxyHeader' is better than this generic 'info'.

I can only see it referenced as self.info. Why do we need it as a public attribute?


Do we need to export the parser as public objects? Why do we need the new public interfaces? How are they supposed to be used by Twisted users?


I am leaving this for review as I only did a superficial review.

I hope that having a branch for the patch and having the test executed for this branch helps :)

Thanks again for your contribution.

twisted-trac commented 8 years ago
glyph's avatar @glyph set owner to kconwayatlassian

I wanted to put this change up early to get some arch feedback from Glyph (or others who are opinionated about Endpoint). He strongly suggested this needed to be an Endpoint instead of the mixin that existed in a previous iteration of the patchset.

Thanks for doing this. If you need specific review feedback from me, please feel free to send me an email to me, or to the list, to point me at a ticket, or I'll stochastically get around to it at some point (as illustrated here). Generally though, any reviewer should feel empowered to make changes, and not everything needs my personal sign-off.

Mostly I'm looking to get approval/disapproval for

OK, let me go through these one at a time:

the _WrapperServerEndpoint addition that is modelled after _WrapperEndpoint

This looks great. We may want to refactor these a little internally first, but they should probably both be exposed publicly after some re-naming.

the syntax of adding via compounding ports as haproxy:tcp:80

and the package organisation as a series of private modules exposed via the package init which is modelled after the t.logger.

💯

Given the way that Python accidentally re-exports every single !@#% symbol that is imported into your code, this is absolutely the way that I would like to see all Twisted code written going forward. Thank you for doing this.

However! I think we should be doing even more hiding of implementation. Every exposed bit of API surface is potential maintenance cost in the future, and we don't need a lot of it for the desired feature here. There's no particular reason I can think of to expose internal implementation details like V1Parser or V2Parser, or even the IProxyParser interfaces; the information that applications need to retrieve should be coming back from existing transport interfaces like getPeer. Honestly the only thing that it seems to me should be exported from the entire package is the endpoint, and possibly some exceptions if there is any case (which we'll only really know from the narrative docs) where an application needs to catch it.

If those things are agreed on I can quickly follow up with docs and additional tests before going for a merge.

I'm going to hold you to "quickly" :).

In the future, though, I'd recommend not blocking on a review like this, especially on the narrative docs. I've found that many of the design issues spotted by review are things that I can find myself if I just start working on the docs, because the docs sound stilted; not to mention that the final reviewer will probably start by looking at the docs, which may uncover new issues, so if there's an issue there you'll have to re-submit and do a bunch of work anyway :-).

I have a few other minor administrative concerns, which I apologize if I'm duplicating from Adi's review, but I didn't see them there as I skimmed:

  1. Please don't re-type string literals; AF_UNSPEC, LOCAL, et. al. should be constants (possibly just imported from socket?)
  2. Since this is new code, it should pass tests on python 3.
  3. In support of point 2, please don't use zope.interface.implements; it is deprecated. Instead, use the zope.interface.implementer class decorator.
  4. While you're at it, the right import style for Twisted would be from zope.interface import implementer; @implementer ... and not @zope.interface.implementer.

Generally this looks like great quality code and I'm enthusiastically anticipating its integration. Thank you for submitting it.

Please address adi's comments and mine to your satisfaction and re-submit for review.

twisted-trac commented 8 years ago
glyph's avatar @glyph set owner to @glyph

Thanks for resubmitting. A few points of light feedback:

  1. There are lots of coding-standard-compliance issues. The buildbot spotted lots of issues (although there is some noise right now due to bugs revealed by the recent migration to 'tox'). However, in thise case, since it's mostly new code, you can quash a lot of the noise by ensuring that twistedchecker twisted/protocols/haproxy emits no results.
  2. I mentioned that the API should be just the endpoint; and I really meant that, just the endpoint. Not the factory and the protocol; there's no value to exposing those to applications, and there is compatibility surface to maintain. Something I didn't bring up in a previous review is that we should avoid exposing types when we can, because if you expose a type publicly, inheritance becomes a potential compatibility surface for that type. In The Talk, one of the patterns they mention is exposing only a top-level function which creates a (private) concrete implementation of a (public) interface. We should do that here. Further, I think that twisted.internet.endpoints should not really contain any of this code; if we need it there to integrate with the string parser, we should have a local import in the parse function to avoid 7 additional (infrequently used) additional modules when loading the (almost always used) twisted.internet.endpoints.
  3. The Windows builders have a problem with the twisted.protocols.haproxy._exc module. I think this might just be a conflict with a Windows-specific module named "_exc"; there doesn't look like there's any hairy platform-specific code in there. Whatever it is, it needs to be addressed before landing though.
  4. The Python 3 changes seem to work, but are not properly tracked in dist3.py, so some of the tox builders fail.

If you were a Twisted committer, I'd say "address these to your satisfaction and merge", so, the rule is, I will address them to my satisfaction and then merge. In terms of change size the difference will probably be pretty big, but most of it will be futzing around with whitespace for coding-standard and doc-standard compliance. (If I end up having to make more substantial changes, I'll resubmit.)

Thanks again for your contribution.

twisted-trac commented 8 years ago
glyph's avatar @glyph set status to closed

(In [47179]) Merge haproxy-endpoint-8203-2: PROXY endpoint.

Author: kconwayatlassian

Reviewer: glyph

Fixes: #8203

twisted.protocols.haproxy.proxyEndpoint provides an endpoint that wraps any other stream server endpoint with the PROXY protocol that retains information about the original client connection handled by the proxy; this wrapper is also exposed via the string description prefix 'haproxy'; for example 'twistd web --port haproxy:tcp:8765'.

twisted-trac commented 8 years ago
adiroiban's avatar @adiroiban commented

Hi and thanks for your contribution.

I am doing just a high level review here.


Since HAProxyProtocolMixin has an init method I would argue that it is not a mixin, but a full featured class.

Why do we need it as a mixin?

The current usage looks prone to errors.. as you might forget to call

Can we have it as a protocol helper?

Maybe like this:


        from twisted.words.protocols.jabber import xmlstream
        class XmlStream(xmlstream.XmlStream):

            def __init__(self, *args, **kwargs):
                xmlstream.XmlStream.__init__(self, *args, **kwargs)
                self._haProxy = HAProxyHelper()

            def dataReceived(self, data):
                if self._haProxy.proxyParsed:
                    return xmlstream.XmlStream.dataReceived(self, data)

                self._haProxy.dataReceived(data)

                # If the header was parsed, continue with the remaining bytes.
                if self._haProxy.proxyParsed and self._haProxy.overflowBuffer:
                    xmlstream.XmlStream.dataReceived(self, self.overflowBuffer)

I think that a protocol wrapper would be best so we will end up something like this.

I am just using lazr.delegates for simplicity

Maybe my example will not work... I am just giving some possible unproven solutions here.

Also, I don't know much about how you would want to use the parsed header in real life... so my suggestion could be completely wrong.

HAProxyProtocolWrapper can we very similar the current implementation, but it will have a state machine which will call the wrapped protocol once the haproxy headers were parsed.


@delegate_to(IProtocolFactory, context='_wrapperdFactory')
class HAProxyWrapperFactory(Factory):
    IProtocolFactory
    def __init__(self, wrappedFactory):
        self._wrapperdFactory = wrappedFactory

    def buildProtocol(self, addr):
        wrappedProtocol = self._wrapperdFactory.buildProtocol(addr)
        protocol = HAProxyProtocolWrapper(wrappedProtocol)
        return protocol

Does this code support both version 1 and version 2 of the PROXY protocol ?

I think that the docstring should describe the supported protocol version and provide a link to the protocol specification.

From the code I can only see version 1 handling.


The HAProxy protocol is a bit strange. How do you advertise the supported versions? I did not take the time to read the full linked documentation. It looks like version 1 is text mode while version 2 has both text and binary mode.

What would happen with this code if HAProxy add version 3?... I guess that it will just raise a generic InvalidProxyHeader ...


I think that the test coverage could be improved. For example I don't see any tests for the new public InvalidProxyHeader exception.


What is the value of having this in the core Twisted package?

Why not have it as a txhaproxy independent package?

Just asking :)


I am leaving this for review as other can send their feedback.

Many thanks for your contribution.

twisted-trac commented 8 years ago
kconwayatlassian's avatar kconwayatlassian commented

Stepping through the review from adiroiban and glyph:

Why do we need it as a mixin?

The implementation of this as a mixin class was something that came up very quickly in IRC while talking to glyph. After talking about it for a bit it was clear that the mixin was not the right solution and the Endpoint interfaces should be leveraged instead. The latest patch removes the mixin business and works as an Endpoint/Factory/Protocol wrapper.

Does this code support both version 1 and version 2 of the PROXY protocol ?

The original patch only implemented version 1. This latest patch also supports version 2. Version 2 is a bit more complex since it is a binary format and contains a richer data set. I had originally intended to follow up with a second patch to add support for it, but I found the time to add it now.

The HAProxy protocol is a bit strange. How do you advertise the supported versions?

The protocol specification uses connection closing as the primary mechanism for advertising version support. As in, if you don't support the version you close the connection. There are currently only two versions of the protocol and the specification details the rules by which a receiver that implements both determines which parser to use. This part of the spec is implemented in this patch so services would support both versions simultaneously.

What would happen with this code if HAProxy add version 3?... I guess that it will just raise a generic InvalidProxyHeader

Currently any version higher than 2 will result in a closed connection as required by the spec.

I think that the test coverage could be improved.

Take another look if you would. I've added additional tests for the parsers and the protocol.

What is the value of having this in the core Twisted package?

We felt the PROXY protocol was wide-spread enough that Twisted could benefit from having support in core. There's a list of implementations in the PROXY spec and it includes some of the most widely used proxy/load balancing tools. Currently, running any non-HTTP Twisted service behind a load balancer means the true peer and host data are lost. Having this in core means Twisted developers can, without pulling in other deps, run any service behind a load balancer without sacrificing that information.

twisted-trac commented 8 years ago
adiroiban's avatar @adiroiban commented

(In [46857]) Branching to haproxy-endpoint-8203.

twisted-trac commented 8 years ago
kconwayatlassian's avatar kconwayatlassian commented

@adiroiban

I appreciate the branch and the review. I'll take a look at the test coverage gaps and see what's missing. As far as docs go, I completely agree with your feedback. Adding more API and narrative docs is on my todo list before finalising the patchset. I wanted to put this change up early to get some arch feedback from Glyph (or others who are opinionated about Endpoint). He strongly suggested this needed to be an Endpoint instead of the mixin that existed in a previous iteration of the patchset.

Mostly I'm looking to get approval/disapproval for the _WrapperServerEndpoint addition that is modelled after _WrapperEndpoint, the syntax of adding via compounding ports as haproxy:tcp:80, and the package organisation as a series of private modules exposed via the package init which is modelled after the t.logger.

If those things are agreed on I can quickly follow up with docs and additional tests before going for a merge.

twisted-trac commented 8 years ago
adiroiban's avatar @adiroiban commented

I think that the narrative docs will help here, as we can see how this will be handled from an end-user point of view and assert if it make sense and is easy to use.

If it is easy to describe/document in a accurate way, then it might be also be easy to use and might a good design :)

Please consider breaking this patch as it should make it easier for someone to review it... it harder to do after hours review for big diffs

Thanks!

twisted-trac commented 8 years ago
glyph's avatar @glyph commented

Did you intend to put this back into review with the new patch?

twisted-trac commented 8 years ago
glyph's avatar @glyph commented

Kicked off a new build just to be sure, but modulo fixing the topfile to refer to the endpoint and not the protocol, this should be ready to land.

twisted-trac commented 8 years ago
adiroiban's avatar @adiroiban commented

Thanks for the patch.

Latest patch applied and sent to buildbot.

The narrative docs look ok. I would say that maybe it would help to be explicit and inform that both protocol v1 and v2 are supported.

Leaving it for review as this is a big patch and I don't have time to review it.

twisted-trac commented 8 years ago
glyph's avatar @glyph commented

The Windows issue is simply that AF_UNIX doesn't exist on Windows. I feel kind of silly for not seeing it.

I'll address it by just removing the NETFAMILIES and NETPROTOCOLS dictionaries; the lookup is unnecessary, because "48" is already a protocol-defined integer, it just needs a human-readable name. (I might use ValueConstant but since this is not a public API it shouldn't matter much.)

twisted-trac commented 8 years ago
adiroiban's avatar @adiroiban commented

(In [47131]) Branching to haproxy-endpoint-8203-2.

twisted-trac commented 8 years ago
glyph's avatar @glyph commented

One final thing: I'm going to move some of the class-level attributes in _wrapper out, in order to avoid exposing them publicly by accident; since policies.ProtocolWrapper shows up as self.transport to application code, it's important not to put any more public attributes on it.