webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives
https://pypi.python.org/pypi/pywb
GNU General Public License v3.0

Use 302 instead of 307 in TimeGate #545

Status: Open. ibnesayeed opened this issue 4 years ago

ibnesayeed commented 4 years ago

A TimeGate in redirect mode MUST use 302-style content negotiation, not 307: the 307 style is not part of the Memento RFC (RFC 7089). If 307-style negotiation is to become mandatory, the matter must be discussed with the community and resolved collaboratively and transparently.

See: https://ws-dl.blogspot.com/2020/03/2020-03-26-memento-compliance-audit-of.html#3-4-timegate
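For reference, here is a minimal sketch of a 302-style TimeGate response. It is not pywb's implementation; it assumes a hypothetical in-memory memento list with made-up `archive.example` URI-Ms. It picks the memento closest to the requested `Accept-Datetime` and answers with `302`, a `Location` pointing at that memento, `Vary: accept-datetime`, and a `Link` to the original resource. A real TimeGate would typically also advertise a TimeMap.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical memento list for one original resource (URI-R):
# (capture datetime, URI-M) pairs. A real archive would query its index.
MEMENTOS = [
    (datetime(2019, 1, 5, 12, 0, tzinfo=timezone.utc),
     "https://archive.example/web/20190105120000/https://example.com/"),
    (datetime(2020, 3, 1, 8, 30, tzinfo=timezone.utc),
     "https://archive.example/web/20200301083000/https://example.com/"),
]


def timegate_302(uri_r, accept_datetime=None):
    """Return (status, headers) for a 302-style TimeGate response."""
    if accept_datetime:
        # Accept-Datetime uses the HTTP date format, e.g. "Sat, 29 Feb 2020 00:00:00 GMT"
        target = parsedate_to_datetime(accept_datetime)
        if target.tzinfo is None:
            target = target.replace(tzinfo=timezone.utc)
    else:
        # No Accept-Datetime: treat the request as "now", i.e. prefer the latest memento
        target = datetime.now(timezone.utc)

    # Select the memento whose capture time is closest to the requested datetime
    _, uri_m = min(MEMENTOS, key=lambda m: abs(m[0] - target))

    headers = {
        "Location": uri_m,            # redirect to the selected memento
        "Vary": "accept-datetime",    # the response depends on Accept-Datetime
        "Link": f'<{uri_r}>; rel="original"',
    }
    return 302, headers
```

For example, `timegate_302("https://example.com/", "Sat, 29 Feb 2020 00:00:00 GMT")` redirects to the 2020-03-01 capture.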

ato commented 4 years ago

Since browsers change POST to GET when following a 302 redirect, some sort of workaround is needed. The obvious options for implementing POST replay seem to be:

A compromise might be to use 307 in response to POST requests and 302 in response to other methods to make Memento happy. The Memento RFC doesn't seem to have anything to say about non-GET/HEAD requests anyway, and where there is a conflict I think it's likely the majority of pywb users would prefer replay correctness over strict Memento compliance. :-)
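The compromise could look roughly like the sketch below. It assumes a plain WSGI app and a hypothetical `resolve_memento` callback that maps an `Accept-Datetime` value to a URI-M; neither is pywb's actual API. POST gets `307` so the client re-sends the same method and body, and every other method gets the Memento-style `302`.

```python
def timegate_redirect_status(method):
    """307 for POST (client must re-send the same method and body),
    302 for every other method (302-style negotiation per the Memento RFC)."""
    return 307 if method.upper() == "POST" else 302


_REASONS = {302: "302 Found", 307: "307 Temporary Redirect"}


def make_timegate_app(resolve_memento):
    """Build a minimal WSGI TimeGate.

    resolve_memento is a hypothetical callback: Accept-Datetime value (or None) -> URI-M.
    """
    def app(environ, start_response):
        status = timegate_redirect_status(environ["REQUEST_METHOD"])
        uri_m = resolve_memento(environ.get("HTTP_ACCEPT_DATETIME"))
        start_response(_REASONS[status], [
            ("Location", uri_m),          # where the client is redirected
            ("Vary", "accept-datetime"),  # response depends on Accept-Datetime
            ("Content-Length", "0"),      # empty redirect body
        ])
        return [b""]
    return app
```

Locally, such an app could be served with `wsgiref.simple_server.make_server("", 8080, app)`. The only method-dependent piece is `timegate_redirect_status`, so GET/HEAD traffic stays strictly 302-style.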

ibnesayeed commented 4 years ago

A compromise might be to use 307 in response to POST requests and 302 in response to other methods to make Memento happy.

This sounds like a reasonable approach to me.

I think it's likely the majority of pywb users would prefer replay correctness over strict Memento compliance.

They say, "If you want to go fast, go alone; if you want to go far, go together."

Ad hoc, application-specific solutions make users of those specific applications happy and make the app developer's life easier in the short run, but they may cause a mess in an ecosystem where interoperability with other tools and services is important.

phonedude commented 4 years ago

Do browsers actually change POST to GET? I know RFC 7231 (née 2616) "allows" it, but they don't have to.

https://tools.ietf.org/html/rfc7231#section-6.4.3

  Note: For historical reasons, a user agent MAY change the request
  method from POST to GET for the subsequent request.  If this
  behavior is undesired, the 307 (Temporary Redirect) status code
  can be used instead.
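The client-side rules in question can be summarised as a small function. This sketch is modelled on the RFC 7231 notes and on the WHATWG Fetch standard's redirect handling, which browsers implement and which does rewrite POST to GET on 301/302. The `rewrite_post` flag is hypothetical: it stands for the "MAY" in the note above, and a non-browser Memento client could simply turn it off.

```python
def method_after_redirect(status, method, rewrite_post=True):
    """Return the method a client uses for the request that follows a redirect.

    rewrite_post models the historical behaviour RFC 7231 permits ("MAY")
    for 301/302; browsers following the Fetch standard act as if it is True.
    """
    method = method.upper()
    if status == 303 and method not in ("GET", "HEAD"):
        # 303 See Other: the follow-up request is a retrieval (GET)
        return "GET"
    if status in (301, 302) and method == "POST" and rewrite_post:
        # Historical POST -> GET rewrite permitted by RFC 7231 sections 6.4.2/6.4.3
        return "GET"
    # 307 (and 308, RFC 7538) must preserve the method; other cases keep it too
    return method
```

With `rewrite_post=False`, a 302 from a TimeGate keeps the POST, which is effectively the "don't swap GET and POST" behaviour for community-written clients.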

So the question becomes: is this a behavior we're actually witnessing? Is it happening in the context of oldweb.today with the old browsers? (Although it seems unlikely they're doing meaningful replay of archived POST requests.)

Otherwise, I expect the clients that interact with TimeGates to be only clients that we (the web archiving community) write, and we can just say "don't swap GET and POST".