jacob1044 / pubsubhubbub

Automatically exported from code.google.com/p/pubsubhubbub

HTTP response codes during verification #45

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
SUMMARY:

The spec's recommendations for Hub and Subscriber's use of HTTP response codes
conflict directly with the HTTP spec's definitions of those codes.

RELEVANT SECTION:  

"If the subscriber does not agree with the action, the subscriber MUST respond 
with a 404 "Not Found" 
response. The hub MUST consider other client and server response codes (3xx, 
4xx, and 5xx) to mean that 
the subscription is not verified, meaning the hub SHOULD retry verification 
until a definite 
acknowledgement (positive or negative) is received."

COMMENT/REQUEST:

Some HTTP response codes (e.g. 400 and 410) require that the client (i.e. the
Hub) does not retry the request, yet the spec here explicitly requires it to. It
also assigns a special meaning to the 404 response code that doesn't exist in
the HTTP spec. Since the entire point of this exercise is that the Hub may find
itself verifying a cooperative HTTP host that is NOT in any way a PubSubHubbub
subscriber, the PSH spec needs to stick very closely to the HTTP spec in this
area.

And on a related note, the spec also fails to define how the Hub should behave
should it receive a 2xx code with an incorrect body.

I'd suggest the correct starting point for this part of the protocol is that
anything other than a 200 response that includes the correct challenge code be
considered a dead failure and that automatic retries should not be encouraged.
Any suggestions beyond that need to be thought through very carefully.

Original issue reported on code.google.com by graham.p...@utsire.com on 9 Aug 2009 at 2:41

GoogleCodeExporter commented 9 years ago
"2xx code with an incorrect body" is a failure to verify. The only successful
response is 200 with the challenge. The only failed response is 404. All other
responses cause retries.
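
As a minimal sketch of the rule described above (not the hub's actual code), the decision could look like the following. The names `Outcome` and `classify_verification` are hypothetical, the exact string comparison of the body against the challenge is an assumption, and treating a 2xx with the wrong body as a retry is one reading of "all other responses cause retries".

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for what the hub does next."""
    VERIFIED = "verified"   # subscriber agreed
    REJECTED = "rejected"   # subscriber explicitly declined
    RETRY = "retry"         # no definite answer yet


def classify_verification(status: int, body: str, challenge: str) -> Outcome:
    """Map a subscriber's HTTP response to a hub action.

    Assumption: the body must equal the challenge exactly.
    """
    if status == 200 and body == challenge:
        return Outcome.VERIFIED
    if status == 404:
        return Outcome.REJECTED
    # Everything else (3xx, other 4xx, 5xx, or a 2xx with the wrong body)
    # is treated as "not verified" and retried later.
    return Outcome.RETRY
```

Under this reading, `classify_verification(200, "wrong", "abc")` and `classify_verification(500, "", "abc")` both lead to a retry, while only a 404 ends the attempt negatively.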

I don't like the idea of "anything other than a 200 response that includes the
correct challenge code be considered a dead failure" because it means the hub
will continue to retry against a dead URL that was never live. A 404 causing
failure means you can put a static webserver there taking HTTP requests and
cause all subscriptions to terminate. This is a good thing for DoS protections
in my mind.

Arguably, 410 is the "correct" response code here. But the HTTP spec says:

"If the server does not know, or has no facility to determine, whether or not 
the
condition is permanent, the status code 404 (Not Found) SHOULD be used instead. 
This
response is cacheable unless indicated otherwise."

My feeling here is that the subscriber URL may be reused, thus the GONE-ness of
the subscriber is temporary, and thus 404 is correct.

I guess 404 is to 410 as 302 is to 301?

Original comment by bslatkin on 20 Aug 2009 at 7:07

GoogleCodeExporter commented 9 years ago
Your reply above seems to assume that every HTTP server on the internet is a
PubSubHubbub publisher. This is very clearly not true. And that's why you don't
get to assign special meanings to HTTP response codes.

One of the major problems here is "All other responses cause retries." You're
requiring (theoretically at least) that every HTTP server on the internet be
reprogrammed to recognise PubSubHubbub requests and reply with a 404 to make the
retries stop.

The fundamental thing here is that HTTP itself already defines a multitude of
errors, and in many cases these have associated retry behaviour already defined.
Your protocol needs to exist on top of this existing functionality, not redefine
it.

If PubSubHubbub is to have HTTP as its transport, then you need to accept that
you may get any HTTP response under the sun and any implementation MUST
interpret it as per the HTTP spec before it does anything else.

(The protocol defined in the current spec, where 200 is "Yes", 404 is "No", and
all other responses result in automatic retries, is NOT HTTP - it's an invented
cargo cult protocol.)

Original comment by graham.p...@utsire.com on 20 Aug 2009 at 12:02

GoogleCodeExporter commented 9 years ago
I understand what you've written here, but I don't understand what your concrete
proposal is. I believe if I add something like, "Hubs should treat 400, 404, and
410 response codes as confirmation that the subscription should be cancelled",
that you'll be happy.

The reason we want a single HTTP code to indicate subscription
negative-acknowledgement is primarily to reduce pain for subscriber
implementors. For example, what happens if your webserver goes away for a day?
Should all of your subscriptions expire? What if your webserver serves 500s for
an extended period? Or invalid 403s? Practically speaking, we should do the most
permissive thing here that results in the intended action of the subscriber,
which is to keep their subscriptions active.

Otherwise, I think your concerns may be alleviated by Issue 24, which will fix
the retry behavior to be up to a reasonable amount of time before eventually
giving up (thus ending any long-lived retry cycles).
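
To make "retry up to a reasonable amount of time, then give up" concrete, here is a rough sketch. Everything in it is an assumption for illustration: the function name `verify_with_deadline`, the one-hour limit, and the doubling delay are not taken from Issue 24 or from the spec.

```python
import time
from typing import Callable, Optional


def verify_with_deadline(
    attempt: Callable[[], Optional[bool]],
    max_seconds: float = 3600.0,   # assumed limit, not from the spec
    first_delay: float = 1.0,      # assumed initial wait between tries
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.monotonic,
) -> bool:
    """Retry verification until a definite answer or the deadline.

    `attempt` returns True (verified), False (rejected), or None
    (no definite answer, e.g. a 5xx or a timeout).
    Returns True only if the subscription was verified.
    """
    deadline = now() + max_seconds
    delay = first_delay
    while True:
        result = attempt()
        if result is not None:
            return result
        if now() + delay > deadline:
            return False  # give up instead of retrying forever
        sleep(delay)
        delay *= 2  # back off so a dead URL is not hammered
```

The `sleep` and `now` parameters exist only so the loop can be exercised with a fake clock; a real hub would more likely schedule retries through its task queue than block in a loop.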

Original comment by bslatkin on 27 Aug 2009 at 3:12

GoogleCodeExporter commented 9 years ago
Firstly, I don't understand your comment - the part of the spec I'm complaining
about (6.2) is verification, which happens only once immediately after the
subscriber sends their subscription request (doesn't it?). Your comment seems to
be referring to behaviour much later in the process.

I think that if the hub encounters any kind of error (HTTP, TCP, whatever - you
don't need to define what) when it sends the verification request, it should
give up. If the returned challenge code doesn't match, it should give up.
Retrying should be up to the subscriber.
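
A minimal sketch of that alternative, for comparison: the hub makes one attempt and accepts only a 200 that echoes the challenge. The name `hub_accepts` is hypothetical, and using `None` to stand for a transport-level failure (TCP error, timeout) is an assumption of this sketch.

```python
from typing import Optional


def hub_accepts(status: Optional[int], body: Optional[str], challenge: str) -> bool:
    """Single-attempt rule: only 200 plus the exact challenge verifies.

    status/body are None when no HTTP response was received at all.
    Any False result means the hub gives up; retrying is left to the
    subscriber, who can send a new subscription request.
    """
    return status == 200 and body == challenge
```

With this rule there is no retry state in the hub at all: `hub_accepts(404, "", "abc")`, `hub_accepts(500, "", "abc")`, and `hub_accepts(None, None, "abc")` are all treated the same way.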

You might want to provide some guidance on what a PubSubHubbub subscriber should
do to explicitly reject a verification request - a 200 code and a particular
body or a blank body would be the preferred option. Since in this instance the
HTTP transaction has worked fine, it probably should not be an HTTP error code.

Original comment by graham.p...@utsire.com on 29 Aug 2009 at 11:21

GoogleCodeExporter commented 9 years ago
It's in the spec: 404 rejects a subscription request. Retrying verification
requests in the hub is important for asynchronous subscription, and even more
important with automatic subscription reverification (coming in version 0.2 of
the spec).

I understand that a 200 could indicate the HTTP transaction has completed, but
that's not practically useful. Reporting an error code (404) works better for
web servers in practice, where an error-rate of some kind would raise attention
in monitoring to subscriptions that are being rejected. More importantly, 404s
are the default response code you'll get if you send a random request at a
random URI. This prevents verification spam from DoSing servers through
indirection attacks on the Hub.

Otherwise, note the last sentence of this part of the HTTP spec:

""
10.4.5 404 Not Found

The server has not found anything matching the Request-URI. No indication is
given of whether the condition is temporary or permanent. The 410 (Gone) status
code SHOULD be used if the server knows, through some internally configurable
mechanism, that an old resource is permanently unavailable and has no forwarding
address. This status code is commonly used when the server does not wish to
reveal exactly why the request has been refused, or when no other response is
applicable.
""

Original comment by bslatkin on 30 Aug 2009 at 7:45