psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0
52.24k stars 9.34k forks source link

requests.get() with both a query string in URL and params defined does not adhere to HTML5 specifications for form interaction #5444

Closed snorpdorp closed 4 years ago

snorpdorp commented 4 years ago

As per HTML5 specifications section "4.10.22.3 Form submission algorithm" , when a form with an action which contains a query string, and has a method of GET is executed, the query string in the action should be completely ignored and overwritten by the query string generated by the form.

However, in requests, when a request is formed with an URL including a query string, as well as a params parameter, the 2 query strings are merged, with both being used.

(I have no idea why any website would ever produce such a form, but this is explicitly mentioned in the specifications.)

There may be some debate over whether this is to be intended behavior or not: there is no strict implication that a GET with param data is meant specifically to emulate behavior of an HTML form, although that would seem to be the most common case to me. Some HTTP APIs may have some endpoint which includes a query string and also request additional data to be sent as part of the query string. However, OpenAPI does not mention anything about what to do if an endpoint itself contains a query string, only stating to "Query parameters are the most common type of parameters. They appear at the end of the request URL after a question mark (?), with different name=value pairs separated by ampersands (&). Query parameters can be required and optional."

This also makes it sound like explicitly mentioned "params" parameter should refer to everything after the question-mark delimiter, but I do not think they meant to be so specific as to talk about this strange edge-case.

So it seems that regardless of RESTful APIs, or sending GET forms, the parameters sent to the server should be the ones explicitly mentioned in params, and the parameters given to requests in the url should be ignored.

import requests
request = requests.Request(method="GET", url="https://somewebsite.com?pointless+key=pointless+value", params={"actual key": "actual value"})
session = requests.Session()
prepared_request = session.prepare_request(request)
print(request.url)

Expected Result

https://somewebsite.com/?actual+key=actual+value

Actual Result

https://somewebsite.com/?pointless+key=pointless+value&actual+key=actual+value

Reproduction Steps

import requests
request = requests.Request(method="GET", url="https://somewebsite.com?pointless+key=pointless+value", params={"actual key": "actual value"})
session = requests.Session()
prepared_request = session.prepare_request(request)
print(request.url)

System Information

{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.8"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.7.3"
  },
  "platform": {
    "release": "19.4.0",
    "system": "Darwin"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.21.0"
  },
  "system_ssl": {
    "version": "20000000"
  },
  "urllib3": {
    "version": "1.24.1"
  },
  "using_pyopenssl": false
}
sigmavirus24 commented 4 years ago

Requests' behaviour is defined by RFC 7230, 7231, 7232, 7233, 7234, and 7235 none of which indicate your expectations as required behaviour. This behaviour has also been present for nearly a decade. Finally, Requests is in a feature freeze and the maintainers are (more-or-less) only fixing security issues. You can easily handle this yourself.

VeNoMouS commented 4 years ago

@sigmavirus24 sorry to hijack the thread, but why is Requests in a feature freeze? is this because 3.x is coming?

nateprewitt commented 4 years ago

@VeNoMouS Requests has been in a feature freeze for the majority of its existence. The goal of the project's scope was summed up pretty well here in 2013. I think our definition of "freeze" has changed a handful of times since then but the guiding idea is still there.

3.x is an elephant in the room. We have a branch with proposed changes and a lot of hard work from former maintainers and many other contributors. A significant amount of scope creep was tacked onto this and prematurely promised as a "Requests 3.0" which we weren't in a place to deliver on. I've given a lot of thought to just releasing what's done but we don't have enough support to triage latent bugs or inevitable backlash of what the next major version "doesn't do".

Seth and the urllib3 team are still making valuable improvements that Requests benefits from and we may eventually get to a point of delivering some incarnation of a new major version. For the time being, we're running on Sigmavirus' generous gifts of time and an issue management bot.

VeNoMouS commented 4 years ago

@nateprewitt hey thanks for the quick and well put response ❤️