Closed twisted-trac closed 13 years ago
@exarkun removed owner @exarkun set status to new |
---|
Okay. Low-level protocol implementation done (ha ha).
It's all private still. The plan is to implement a higher-level API based on this as soon as this is OKed, then make whichever low-level APIs are suitable for end users public. This is so that we prove that the low-level APIs are good enough for the high-level APIs we want to provide before offering them for public consumption.
@glyph set owner to @exarkun @glyph set status to new |
---|
OK. Awesome. Really glad to see this is finally in review!
Please note that the implementation itself really looks great, I like the idiom of create a request / issue the request / build a response. I think it's a solid low level interface. Most of my concerns are about process, documentation, and naming. So this might look like a lot of text, but I don't think it will actually be a lot of work.
Feedback actually relevant to this branch:
twisted.web._newclient._LenthEnforcingConsumer._length
? Sure you've got enough underscores on that? Every double-private top-level should drop its underscore. (I can understand the desire for private attributes of private classes though, so they can implement an interface and not "leak" anything else into the public API indirectly.)twisted.web.error
.Request
and Response
should be moved to twisted.web.client
.HTTP11ClientProtocol
should be moved to twisted.web.client
. I'm inclined to suggest that it should have a better name, but I don't think that's strictly necessary, given that it is really supposed to be the lower level of something else.twisted.internet.abstract
shouldn't be merged, for obvious reasons._makeFunction
will create functions that don't appear in the API documentation. Also: worst name ever. May I suggest something more along the lines of (spelled python2.3-compatibly, of course):
#!py
@stateful
def connectionLost(self, reason):
"""
The underlying transport went away. If appropriate, notify the parser
object.
"""
I believe this preserves the docstrings?
IStreamingProducer
is a name that seems designed to be confusing. I don't know why IPushProducer
wasn't named IStreamingProducer
in the first place, given that the flag's name to indicate it is "streaming
". Not that I think that is a great name either, but as usual consistency is more important than meaningfulness. A suggestion for this new interface: IEntityProducer
. Maybe even IHTTPEntityBodyProducer
since the length
is kind of an HTTP feature?UNKNOWN_LENGTH
as a distinct value? Wouldn't None
serve this purpose adequately? Or having the attribute be optional? You still have to explicitly check, after all. It seems to bloat the code without providing any real benefits. If we keep it, UNKNOWN_LENGTH
should have a @var in the module docstring.newclient.py
, i.e. Request._connectionLost_CONNECTION_LOST
, appears to be uncovered by any test. Seems like possibly it should just be deleted, since that error condition is wacky enough that getting a "no such method" exception would be okay.Proxy-Connection
should be documented a bit more thoroughly than "undocumented HTTP 1.0 abomination". At least put in a link to someone else who implements it, or explain briefly what they use it for.HTTPParser.isConnectionControlHeader
should just include the comment in its docstring.Request._writeToContentLength
should be broken up to put each bullet point closer to the responsible code, and converted into full sentences, or possibly just removed.Response(None, None, None, None, transport)
a lot. If that's a valid thing to say, perhaps it should be more convenient? Response(transport=transport)
? If it's a test-only thing, a function in the test module would be good (with a docstring explaining why it's OK to leave all those values as None
, since I'm not sure I understand that bit)._blah_BLAH
state-machine methods are documented. Which I actually think I could be OK with, except... none of the BLAH
states are actually documented anywhere either. For now, let's just stick to our existing standard and document all these methods. Notwithstanding the difficulty with constants mentioned below, I'd really like it if TRANSMITTING
, for example, had a docstring of its own.HTTP11ClientProtocol._finish
is also undocumented, and the antecedent in its "XXX" comment is ambiguous. (Thanks to the link for #3420 though.)Idle musings / reminders of persistent problems:
reactor.run()
, I feel it necessary to point out the thing that I pointed out on [http://twistedmatrix.com/trac/ticket/1490#comment:39 my comment to #11551]. Nothing this branch can do, I guess, just a reminder.UNKNOWN_LENGTH
, object()
is an opaque way to make a constant value. I really wish we had some convenient idiom for making constants that would repr()
(and ideally qual()
as well) to their FQPN. Inspecting UNKNOWN_LENGTH
in a debugger, or a failed test, like inspecting any other Python constant, is going to be mysterious and unpleasant. How would you feel about filing a ticket for a good NamedConstant
class? It seems the more protocols the implement, the more constants like this that we have. Anyway, as a more practical solution I suggest just making it a string ("twisted.web.iweb.UNKNOWN_LENGTH") for the time being.pyflakes
warning from any changed file is IUsernameDigestHash
. Not your fault, I know, but could you perhaps just add a blank line that says "IUsernameDigestHash # export in __all__ below
" and take it to zero?vertex.statemachine
and epsilon.modal
before we settle down and decide on one that works?@exarkun set owner to @glyph |
---|
Replying to glyph:
OK. Awesome. Really glad to see this is finally in review!
Please note that the implementation itself really looks great, I like the idiom of create a request / issue the request / build a response. I think it's a solid low level interface. Most of my concerns are about process, documentation, and naming. So this might look like a lot of text, but I don't think it will actually be a lot of work.
Feedback actually relevant to this branch:
- I know that the ticket description says "low level", but I like to think that when a ticket is filed against Twisted for some library functionality, it's implicit that some level of that functionality needs to be public before the ticket is closed. While I like new private implementation guts, I don't like that this change is entirely a new private API. The fact that lots of attributes and classes in here are double-private and triple-private suggests that you've given some thought to what should be exposed and what shouldn't.
I agree with you in general. Some functionality needs to be public. I mentioned the specific motivation behind what is happening in this branch in the comment where I requested a review, though. Can you argue against that plan in particular?
1. I mean, `twisted.web._newclient._LenthEnforcingConsumer._length`? Sure you've got enough underscores on that? Every double-private top-level should drop its underscore. (I can understand the desire for private attributes of private classes though, so they can implement an interface and not "leak" anything else into the public API indirectly.)
You're right. The middle underscore isn't adding anything. I fixed most (not _WrappedException
(yet?) since I may end up moving its definition to another module, depending on what happens for some other points) instances of this pattern in r25999, r26000, and r26001.
1. As a more concrete suggestion, I think the following things should be public (although I may have missed some necessary piece): 1. All the exceptions should be in `twisted.web.error`. 1. `Request` and `Response` should be moved to `twisted.web.client`. 1. `HTTP11ClientProtocol` should be moved to `twisted.web.client`. I'm inclined to suggest that it should have a better name, but I don't think that's strictly necessary, given that it is really supposed to be the lower level of something else. 1. Once these are public the example shouldn't be using any private APIs. Which is good, because we should really never have any examples that use private APIs. They are, almost by definintion, bad examples.
Deferring dealing with any of these points until you respond to my question about the first point.
- The changes in
twisted.internet.abstract
shouldn't be merged, for obvious reasons.
Oh, indeed. Fixed in r26002.
_makeFunction
will create functions that don't appear in the API documentation. Also: worst name ever. May I suggest something more along the lines of (spelled python2.3-compatibly, of course):#!py @stateful def connectionLost(self, reason): """ The underlying transport went away. If appropriate, notify the parser object. """
I believe this preserves the docstrings?
Fixed in r26003.
IStreamingProducer
is a name that seems designed to be confusing. I don't know whyIPushProducer
wasn't namedIStreamingProducer
in the first place, given that the flag's name to indicate it is "streaming
". Not that I think that is a great name either, but as usual consistency is more important than meaningfulness. A suggestion for this new interface:IEntityProducer
. Maybe evenIHTTPEntityBodyProducer
since thelength
is kind of an HTTP feature?
I renamed it to IEntityBodyProducer
in r26004. I think HTTP is already implied by its location (the other interfaces in iweb.py
don't have HTTP in their names).
- Why bother having
UNKNOWN_LENGTH
as a distinct value? Wouldn'tNone
serve this purpose adequately? Or having the attribute be optional? You still have to explicitly check, after all. It seems to bloat the code without providing any real benefits. If we keep it,UNKNOWN_LENGTH
should have a @var in the module docstring.
Added a @var
in r26005. We chose to use UNKNOWN_LENGTH
rather than None
in order to attempt to prevent bugs in applications of the form if response.length: ...
(note that UNKNOWN_LENGTH
is used both as a possible value for Response.length
and for IEntityBodyProducer.length
). A response with length of 0 is of course significantly different from a response with an unknown length.
This rather directly raises a question which I've been pondering all morning though - why is a request body produced with an IEntityBodyProducer
but a response body is made available through the Response.deliverBody
mechanism; it seems as though Response
could have an attribute which is an IEntityBodyProducer
provider. Maybe I'll remember the answer by the time I finish responding to review feedback.^1^
- The last line of
newclient.py
, i.e.Request._connectionLost_CONNECTION_LOST
, appears to be uncovered by any test. Seems like possibly it should just be deleted, since that error condition is wacky enough that getting a "no such method" exception would be okay.
I didn't want to just have an AttributeError
(particularly one exposing an irrelevant implementation detail like how connectionLost
is made to do the right thing depending on what state the protocol is in), but your suggestion made me realize I could do better error handling in makeStateDispatcher
(previously _makeFunction
). So I added that and deleted the untested _connectionLost_CONNECTION_LOST
in r26006.
- This code is a bit more comment-heavy than I'm used to, which I was surprised to discover is pretty good. HTTP sucks a lot, and it's nice to have a guide to some of the more absurd pieces of the implementation inline. However:
Proxy-Connection
should be documented a bit more thoroughly than "undocumented HTTP 1.0 abomination". At least put in a link to someone else who implements it, or explain briefly what they use it for.HTTPParser.isConnectionControlHeader
should just include the comment in its docstring.- The giant enumerated outline in the middle of
Request._writeToContentLength
should be broken up to put each bullet point closer to the responsible code, and converted into full sentences, or possibly just removed.
Fixed these issues in r26007.
- The tests sure do say
Response(None, None, None, None, transport)
a lot. If that's a valid thing to say, perhaps it should be more convenient?Response(transport=transport)
? If it's a test-only thing, a function in the test module would be good (with a docstring explaining why it's OK to leave all those values asNone
, since I'm not sure I understand that bit).
Did something in r26008, let me know if that's sufficient.
- None of the
_blah_BLAH
state-machine methods are documented. Which I actually think I could be OK with, except... none of theBLAH
states are actually documented anywhere either. For now, let's just stick to our existing standard and document all these methods. Notwithstanding the difficulty with constants mentioned below, I'd really like it ifTRANSMITTING
, for example, had a docstring of its own.
I documented the states that Response
and HTTP11ClientProtocol
can be in and I added docstrings for all the state methods in r26009. In doing this, I also noticed that none of HTTP11ClientProtocol
's attributes were documented, so I documented them in r26010.
HTTP11ClientProtocol._finish
is also undocumented, and the antecedent in its "XXX" comment is ambiguous. (Thanks to the link for #3420 though.)
Done in r26011 and r26012.
Idle musings / reminders of persistent problems:
- New examples are great. But every time I see a new example that uses
reactor.run()
, I feel it necessary to point out the thing that I pointed out on [http://twistedmatrix.com/trac/ticket/1490#comment:39 my comment to #11551]. Nothing this branch can do, I guess, just a reminder.
Yea. #3270 is also somewhat related. Do you want to have a fight with Chris about it? react.py is as far as I could push that issue. Hm, actually, I see that Chris didn't voice his objections on the ticket, he must have done it in meatspace, which means I can pretend he didn't, and maybe push the react
solution forward.
- If we're keeping
UNKNOWN_LENGTH
,object()
is an opaque way to make a constant value. I really wish we had some convenient idiom for making constants that wouldrepr()
(and ideallyqual()
as well) to their FQPN. InspectingUNKNOWN_LENGTH
in a debugger, or a failed test, like inspecting any other Python constant, is going to be mysterious and unpleasant. How would you feel about filing a ticket for a goodNamedConstant
class? It seems the more protocols the implement, the more constants like this that we have. Anyway, as a more practical solution I suggest just making it a string ("twisted.web.iweb.UNKNOWN_LENGTH") for the time being.
I don't feel very good about adding this kind of thing to Twisted. Perhaps it will come to that (we probably should have hashed this out long ago, oh well), but it's such a simple thing, and has basically nothing to do with Twisted... Why isn't it in the stdlib, or why isn't there a widespread third-party package that provides the functionality?
Switching to a string (well, I'm going to use unicode actually, ha ha) for now seems like a good idea (r26013).
- The only
pyflakes
warning from any changed file isIUsernameDigestHash
. Not your fault, I know, but could you perhaps just add a blank line that says "IUsernameDigestHash # export in __all__ below
" and take it to zero?
I really dislike hacking around pyflakes shortcomings like that. I'd rather add __all__
support to pyflakes.
- We should really have one good, public implementation of a state-machine helper somewhere. How many more times are we going to re-implement
vertex.statemachine
andepsilon.modal
before we settle down and decide on one that works?
Good question. I'm getting close to the point where I want to write one and put it in Twisted. I'm not quite sure what exact features it'll have yet, though. Some stuff from statemachine
is nice, some stuff from modal
is nice, there's probably useful features that neither of them provides...
I'm re-assigning this to you so you can respond to my response to your point one. After that's dealt with, I wouldn't mind if it went back to the general review pool.
^1^: No, I didn't remember.
@glyph removed owner |
---|
Replying to exarkun:
I agree with you in general. Some functionality needs to be public. I mentioned the specific motivation behind what is happening in this branch in the comment where I requested a review, though. Can you argue against that plan in particular?
So, I have a bunch of reasons why it's a bad idea. I'll list them here, but I've considered it and I think that (aside from removing the example from doc/examples, and putting it somewhere private itself) I am willing to get this merged as all private. If you find yourself persuaded by my arguments, then we can change the plan, but if not, I also think it would be valuable to try experimenting with this one-layer-at-a-time merging strategy. I'm not entirely convinced by my own arguments.
Trunk should always be in a releasable state. So I'm assuming that a release may happen at any time; specifically, after this branch is merged but before the high-level API is merged.
If we release private functionality which is better than the public functionality, (especially if it is replete with examples of how to use the private functionality), it strikes me as likely that users will disregard the underscore and start using it. And who could blame them? When python-core releases a non-functional public API with some juicy private bits that we want to use, we use them anyway and are upset when they break them. (Although granted, it would be nicer if they had a better definition of "private", so we could tell in advance.) I think that having a compatibility policy that works depends on having the kind of credibility with users that comes from not prevaricating too much.
In other words, let's not just replace making too much stuff public with making too much stuff private.
Your motivation for the low-level-then-high-level plan seems to be keeping the branches (and therefore the reviews) smaller, by breaking them into layers. All things being equal, that would certainly be better. But, all things are rarely equal: by splitting the feature up, the nature of the review changes. Large as this branch was, I would personally have preferred it be even larger but included a top-level starting point more polished than the example. When I do reviews of new features I like to start reading at the "top".
My perspective on the structure and necessity of the implementation details is directly influenced by the way in which they are used by the high-level API. For example, in its current form the behavior of HTTP11ClientProtocol.request
when not in the QUIESCENT
state makes sense. But my opinion might be influenced by seeing what the request-queuing implementation implemented above that might look like.
This might also be an argument for keeping the present structure private (rather than making HTTP11ClientProtocol
public) when merging this branch, but it seems to make the shorter reviews a false economy. If I'm going to have to re-review these implementation details again later once I've seen them in more serious use, why bother having a separate branch?
Clearly this branch needed a high-level design to go with it, hence the structure in the example, but the high-level design is only partially included. The example is cheating. It has no tests, it has no documentation, and it's not clear how features that it leaves out (like handling redirects, or translating "error" status codes into errors) will be implemented.
Again, all that said, this is just a suggestion.
I think I might have some time later for a re-review, but I'll put this back into the general pool to avoid blocking it further.
@glyph set owner to @exarkun @glyph set status to new |
---|
Replying to exarkun:
Since this is a pretty long comment, which is turning into a meandering discussion, I'm putting all the mandatory review stuff up here at the front. If you get bored you are free to ignore the rest.
These are all docstring or minor issues, so you can merge once they're dealt with.
twisted.web._newclient.HTTPParser.statusReceived
isn't valid epytext. I believe this is because of the "\r\n"; I've run into this problem before, I'd recommend spelling it "CRLF" in docstrings._newclient
:
IProtocol
IPushProducer
(I think you mean IPushProducer
)IResponse
iweb
:
IConsumer
Deferred
IProducer
Failure
_WrapperException
needs a docstring, whether or not it remains double-private.makeStateDispatcher
should really have a docstring.test_newclient
:
AccumulatingProtocol
(isn't there one of these somewhere that you can use? let's not make #3604 harder to fix.)SlowRequest
SimpleRequest
StringProducer
RequestTests
mumble
". (Perhaps "lol" is not a useful comment, either.)assertResponseFailed
assertRequestGenerationFailed
assertRequestTransmissionFailed
HTTPClientParserTests._noBodyTest
That's it for the real meat of the review. Now for the optional stuff and further rumination on loosely-related ideas:
Replying to glyph:
OK. Awesome. Really glad to see this is finally in review! I agree with you in general. Some functionality needs to be public. I mentioned the specific motivation behind what is happening in this branch in the comment where I requested a review, though. Can you argue against that plan in particular?
The abovementioned comment on IRC was enough to convince me, at least, that the best course of action at this point is to complete the review cycle as quickly as possible, in whatever form.
I still stand by my earlier suggestion, in point 1.2 above, that some elements of this should be public — but this is not out of any procedural concern. I just have a fairly high level of confidence in the API as we've discussed it, and the implementation seems high-quality. The sketch adequately demonstrates to me that this module, or at least the parts that I propose become public, are suitable for a real substrate.
So feel free to make them public, or not.
You're right. The middle underscore isn't adding anything. I fixed most (not
_WrappedException
(yet?) since I may end up moving its definition to another module, depending on what happens for some other points) instances of this pattern in r25999, r26000, and r26001.
This begs an interesting question, which isn't really relevant to this review: if _WrappedException
is private, then there's no documentation nor any supported way to create a ResponseFailed
, RequestTransmissionFailed
, etc. The duplicate documentation of @ivar reasons
is a nice bit of attention to detail, but somewhat redundant.
_makeFunction
will create functions that don't appear in the API documentation. Also: worst name ever. May I suggest something more along the lines of (spelled python2.3-compatibly, of course):Fixed in r26003.
Better. I have some comments for using this more broadly, but these should be considered as comments on a potential public implementation of this; the ad-hoc version here can remain as-is:
def _connectionLost_QUIESCENT
' my impulse was to grep for 'def _connectionLost
', after having seen 'def _bodyDataReceived_INITIAL
' and 'def _bodyDataReceived
'.@foo def bar
' and 'def bar ... bar = foo(bar)
'; adding extra arguments makes it even harder to read.epsilon.modal
addresses all of these issues; the implementations don't need to be private, the names are less ugly, it has a really convenient syntax without being a decorator, etc. It bothers me that it's apparently easier to do something crappy over and over and over again than to roll in code that we've already written from somewhere else.
- Why bother having
UNKNOWN_LENGTH
as a distinct value? Wouldn'tNone
serve this purpose adequately? Or having the attribute be optional? You still have to explicitly check, after all. It seems to bloat the code without providing any real benefits. We chose to useUNKNOWN_LENGTH
rather thanNone
in order to attempt to prevent bugs in applications of the formif response.length: ...
I am not sure what kind of bug you mean. I mean, I can come up with a class of bugs in applications like that, but in order to prevent them you'd need a __nonzero__
method that raises an exception. (Although, actually, that's kind of a cool idea.) As it is, it just flips the polarity from silently failing your test to silently succeeding it. UNKNOWN_LENGTH
can be 0-length too, after all, if all you get is one chunk.
This rather directly raises a question which I've been pondering all morning though - why is a request body produced with an
IEntityBodyProducer
but a response body is made available through theResponse.deliverBody
mechanism; it seems as thoughResponse
could have an attribute which is anIEntityBodyProducer
provider.
Indeed, it would be good if these were consistent. Since you can't remember why it's this way, can you fix it?
- The tests sure do say
Response(None, None, None, None, transport)
a lot.Did something in r26008, let me know if that's sufficient.
Better than sufficient. Reading the tests with that change I feel like I understand this code better :).
Idle musings / reminders of persistent problems:
- New examples are great. But every time I see a new example that uses
reactor.run()
, I feel it necessary to point out the thing that I pointed out on [http://twistedmatrix.com/trac/ticket/1490#comment:39 my comment to #11551]. Nothing this branch can do, I guess, just a reminder.Yea. #3270 is also somewhat related. Do you want to have a fight with Chris about it? react.py is as far as I could push that issue. Hm, actually, I see that Chris didn't voice his objections on the ticket, he must have done it in meatspace, which means I can pretend he didn't, and maybe push the
react
solution forward.
I'd prefer twistd run
to react
for examples, since react
will still produce misleading behavior where newbies think they need to do something special to call listenTCP
twice. I'll probably take your side in a fight against Chris over that though, react
is an improvement. (react()
will probably be more useful for simple command-line tools or client applications, too).
... I really wish we had some convenient idiom for making constants that would
repr()
(and ideallyqual()
as well) to their FQPN. ...I don't feel very good about adding this kind of thing to Twisted.
Why not? Twisted has lots of utilities in twisted.python that just do stdlib-type stuff better than the stdlib, and for the most part I love them. Sure, some of them were ill-advised, but they haven't really hurt us. (i.e. how much time have you spent maintaining twisted.python.hook
this year?)
Perhaps it will come to that (we probably should have hashed this out long ago, oh well), but it's such a simple thing, and has basically nothing to do with Twisted... Why isn't it in the stdlib, or why isn't there a widespread third-party package that provides the functionality?
I'd certainly rather add a dozen more little utilities like this to Twisted than try to popularize a dozen widespread third-party packages, or live with the quality-control insanity of trying to integrate a dozen versions of a dozen third-party packages into our already onerous buildbot setup. (Not to mention that then we'd need to write a working, portable package manager for Python so that users can install Twisted's two-dozen dependencies without easy_install
deleting their root partition or whatever.)
Switching to a string (well, I'm going to use unicode actually, ha ha) for now seems like a good idea (r26013).
Speaking of "ha ha", sure you don't want to switch to something that raises an exception on __nonzero__
?
- The only
pyflakes
warning from any changed file isIUsernameDigestHash
. Not your fault, I know, but could you perhaps just add a blank line that says "IUsernameDigestHash # export in __all__ below
" and take it to zero?I really dislike hacking around pyflakes shortcomings like that. I'd rather add
__all__
support to pyflakes.
Le mieux est l'ennemi du bien.
I noticed that there wasn't a Divmod ticket for this, so I filed one. Be my guest and fix pyflakes, but in the meanwhile it's a lot easier to read pyflakes' output if I don't have to manually verify the accidental exports.
Actually, I just realized that one the main reason I haven't agitated for a pyflakes fix is that I actually like the pyflakes workaround, because it's also a Python workaround. It's easy to miss the __all__
entry down below; nothing will warn you if the name stops being defined, and the API is thusly incompatibly changed. But if you delete the import and you're not using pyflakes, the name reference below will blow up in your tests, right next to a comment that tells you the name is being explicitly exported.
Of course you know how I actually wish this worked.
Automation removed owner |
---|
@exarkun set status to closed |
---|
(In [27413]) Merge expressive-http-client-886-4
Author: exarkun, itamar Reviewer: glyph Fixes: #10610
Add a new HTTP 1.1 protocol implementation.
@exarkun commented |
---|
A lot of stuff has happened in the branch. There are still a lot of minor things to take care of (mostly boring coding standard stuff) and I'd like to add some higher-level API to the branch before merging it (nothing too hard, just something that won't scare away newbies), but it'd be great to get some feedback about the low-level stuff before proceeding to anything high-level.
The intent is to implement persistent connection support in a subsequent ticket. The goal for this code is just to not make it impossible to do that while still supporting the APIs added in this branch.
James, your feedback in particular would be appreciated. :) At the very least,connection control header detection is wrong. Hints on how to make it (or anything else) right would be helpful.
@radix commented |
---|
Couple of pre-review comments:
@exarkun commented |
---|
(In [25948]) Merge identity-transfer-encoding-3605
Author: exarkun, itamar Reviewer: therve Fixes: #3605 Refs: #10610
Add an identity transfer-encoding decoder, similar to the chunked transfer-encoding
decoder, to twisted.web.http
and use it to simplify HTTPChannel
slightly by
removing the knowledge of any particular transfer-encoding from its request body
handling code.
Also expand the transfer-encoding decoder interface so that it will be useful for an HTTP client and rename the chunked decoder so that its name more closely reflects what it does.
@itamarst commented |
---|
Re why response body comes as protocol, not IEntityBodyProducer: IEntityBodyProducer is still kinda crap. Users would need to support yet another new, over-complex consumer API (see the fun code needed to deal with request producers!), which hopefully we will eventually replace anyway with better version... So why not use API that is equivalent, much simpler, and that users already know: a protocol with a transport? Bleh. And now I've used up half my daily typing allowance :)
@exarkun commented |
---|
Replying to glyph:
twisted.web._newclient.HTTPParser.statusReceived
isn't valid epytext. I believe this is because of the "\r\n"; I've run into this problem before, I'd recommend spelling it "CRLF" in docstrings.
Fixed in r26052.
- the following reference (link) names couldn't be resolved by pydoctor:
- in
_newclient
:
IProtocol
IPushProducer
(I think you meanIPushProducer
)IResponse
- There also seems to be a bug for mwh, not you: pydoctor can't find "twisted.web.iweb.UNKNOWN_LENGTH", despite the ivar declaration.
- in
iweb
:
IConsumer
Deferred
IProducer
Failure
Hum. Indeed. I am rather dissatisfied with the proper spelling for these links. I think it makes the unprocessed docstrings rather harder to read. I'd like to talk to mwh about making pydoctor able to resolve them as-is or with some approach other than the one that epydoc/epytext requires. I've not done anything about them yet.
_WrapperException
needs a docstring, whether or not it remains double-private.
Fixed in r26053
makeStateDispatcher
should really have a docstring.
Fixed in r26054
- undocumented in
test_newclient
:
AccumulatingProtocol
(isn't there one of these somewhere that you can use? let's not make #3604 harder to fix.)
Fixed in r26055
1. `SlowRequest` 1. `SimpleRequest` 1. `StringProducer`
Fixed in r26056
1. `RequestTests`
Erm? This one has a docstring (albeit a short one which reveals very little useful information...)
1. "`mumble`". (Perhaps "lol" is not a useful comment, either.) 1. Assertion methods. (It would be more consistent for these to be on a class, but I don't think that's too important.) 1. `assertResponseFailed` 1. `assertRequestGenerationFailed` 1. `assertRequestTransmissionFailed` 1. `HTTPClientParserTests._noBodyTest`
Fixed in r26057.
So this leaves only the pydoctor/epytext issues.
That's it for the real meat of the review. Now for the optional stuff and further rumination on loosely-related ideas:
Replying to glyph:
OK. Awesome. Really glad to see this is finally in review! I agree with you in general. Some functionality needs to be public. I mentioned the specific motivation behind what is happening in this branch in the comment where I requested a review, though. Can you argue against that plan in particular?
The abovementioned comment on IRC was enough to convince me, at least, that the best course of action at this point is to complete the review cycle as quickly as possible, in whatever form.
I still stand by my earlier suggestion, in point 1.2 above, that some elements of this should be public — but this is not out of any procedural concern. I just have a fairly high level of confidence in the API as we've discussed it, and the implementation seems high-quality. The sketch adequately demonstrates to me that this module, or at least the parts that I propose become public, are suitable for a real substrate.
So feel free to make them public, or not.
Okay. I think actually I'm going to try the stacked branch thing you mentioned in a previous comment. After the higher-level stuff is done, then some of the lower-level stuff can probably become public. And then the two branches can be merged at almost the same time.
[snip - there should be a good way to define state machines]
Yep, I agree.
[snip - UNKNOWN_LENGTH.nonzero should raise or something]
Actually, even if the response body for an UNKNOWN_LENGTH response ends up being 0 bytes, you still have to treat the response object as you would for a known non-zero length response. In either case, you have to ask the response to produce the body to a protocol. So, while I wouldn't write it myself, if response.length:´ would let you detect the case where you need to call
deliverBody.
if response.length is UNKNOWN_LENGTH or response.length:is a nicer way to write it, but either works. So I ''could'' make
UNKNOWN_LENGTH.nonzero` raise and force people to write the nicer version, but should I actually?
[snip - replace protocol usage with another IEntityBodyProducer]
I hope itamar's replied clarified this situation. Really, we should just add a non-shitty IProducer
/IConsumer
replacement. :/ I think I'm going to leave this alone for now, though.
[snip - add a good const/enum/whatever thing to Twisted]
Eh yea okay, I guess we'll have to.
[snip - pyflakes workaround]
Okay, added in r26058.
ivank commented |
---|
Some servers like http://news.ycombinator.com/ use only \n to delimit status lines and headers, instead of \r\n. All not-ridiculously-obscure web browsers can parse this. _newclient.py revision 26111 is unable to parse any header from this page (why is raising AttributeError instead of BadHeaders? I would imagine that no headers are 'bad headers')
Failure: twisted.web._newclient.ResponseFailed: [<twisted.python.failure.Failure <type 'exceptions.AttributeError'>>]
My not-thoroughly-tested workaround is to have delimiter = '\n' in HTTPParser, and this in lineReceived:
if line and len(line) > 0 and line[-1] == '\r': line = line[:-1]
ivank commented |
---|
I don't know. It could take a while to find one in the wild.
Also, I can't find a browser that doesn't allow the broken behavior, even though it violates the RFC: "This flexibility regarding line breaks applies only to text media in the entity-body; a bare CR or LF MUST NOT be substituted for CRLF within any of the HTTP control structures (such as header fields and multipart boundaries)."
@exarkun commented |
---|
Any HTTP server that emits LF newlines is broken and written by someone who is either incompetent or malicious. Such people should be punished.
However, if someone (ie, ivank) wants to do the work to support this case, then that's fine with me. I do propose separating that out into a separate unit of work, though. ivank, want to file a ticket for implementing that once this code all makes it to trunk (which I am going to renew my efforts to achieve now)?
ivank commented |
---|
Okay, #3833 is for LF newline support.
truekonrads commented |
---|
I think that a high level API should be part of this ticket. I think it is best to have a stateful client (one that tracks cookies), like HTTPClientFactory tries to do.
I typically will use such object as a page scraper or to interact with Web APIs. Therefore, the following would be useful:
@exarkun commented |
---|
The high-level API is not going to happen as part of this ticket. It will have documentation, and it will be such that the first three features you described will not have to be implemented in Twisted or by copying a bunch of code from Twisted and editing it. There probably will be proxy and cookie support in Twisted, though.
If you'd like to contribute other requirements or suggestions to the high-level API, please edit TwistedWebClient.
Twisted Web only includes an HTTP 1.0 client protocol implementation. HTTP 1.1 includes numerous useful features which clients benefit from.
As a first step towards good HTTP 1.1 client support, add a low-level API for issuing HTTP 1.1 requests and parsing HTTP 1.1 responses (this excludes things like timeouts, following redirects, etc). Once this is done, higher-level APIs and more HTTP features can be added based on it.
Attachments:
Searchable metadata
``` trac-id__886 886 type__enhancement enhancement reporter__itamarst itamarst priority__high high milestone__ branch__branches_expressive_http_client_886_4 branches/expressive-http-client-886-4 branch_author__itamar__exarkun itamar, exarkun status__closed closed resolution__fixed fixed component__web web keywords__httpclient httpclient time__1108183305000000 1108183305000000 changetime__1304564598000000 1304564598000000 version__ owner__ cc__exarkun cc__jknight cc__ivank cc__truekonrads ```