psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0

Double-digit link-local IPv6 zone id raises ValueError #6808

Open twslankard opened 1 month ago

twslankard commented 1 month ago

When making a request to a link-local IPv6 address, it becomes necessary to specify the "zone id" aka "scope id". RFC 6874 specifies zone ids as follows:

According to IPv6 Scoped Address syntax [RFC4007], a zone identifier is attached to the textual representation of an IPv6 address by concatenating "%" followed by <zone_id>, where <zone_id> is a string identifying the zone of the address. However, the IPv6 Scoped Address Architecture specification gives no precise definition of the character set allowed in <zone_id>. There are no rules or de facto standards for this. For example, the first Ethernet interface in a host might be called %0, %1, %en1, %eth0, or whatever the implementer happened to choose.

In a URI, a literal IPv6 address is always embedded between "[" and "]". This document specifies how a <zone_id> can be appended to the address. According to URI syntax [RFC3986], "%" is always treated as an escape character in a URI, so, according to the established URI syntax [RFC3986] any occurrences of literal "%" symbols in a URI MUST be percent-encoded and represented in the form "%25". Thus, the scoped address fe80::a%en1 would appear in a URI as http://[fe80::a%25en1].

I understand the above to mean that a zone id can be more than one character long, even when it is an integer.

When using requests.get to request information from IoT devices using link-local networking, I occasionally see double-digit integer zone ids. An example of such a scoped address is fe80::be0f:a7ff:fe00:2929%53. As such, I'm escaping the % as %25 as indicated in the RFC. (In this case, this has no effect on the results.)
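The encoding rule quoted from RFC 6874 can be sketched in a few lines. This is only an illustration: the helper name scoped_ipv6_url and its parameters are hypothetical and are not part of requests or urllib3. It uses the standard library's urllib.parse.quote to produce the "%25" delimiter.

```python
from urllib.parse import quote


def scoped_ipv6_url(address: str, zone_id: str, scheme: str = "http") -> str:
    """Build a URL for a scoped IPv6 literal (hypothetical helper).

    The "%" that separates the address from the zone id is percent-encoded
    as "%25", as RFC 6874 requires inside a URI.
    """
    delimiter = quote("%", safe="")      # "%25"
    zone = quote(zone_id, safe="")       # escapes any reserved characters in the zone id
    return f"{scheme}://[{address}{delimiter}{zone}]"


rfc_example = scoped_ipv6_url("fe80::a", "en1")
issue_example = scoped_ipv6_url("fe80::be0f:a7ff:fe00:2929", "53")
print(rfc_example)    # http://[fe80::a%25en1]
print(issue_example)  # http://[fe80::be0f:a7ff:fe00:2929%2553]
```

With a double-digit zone id such as 53, the encoded form ends in "%2553", which is the URL used in the reproduction.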

Expected Result

The following request does not raise an exception.

import requests

# Link-local address with zone id 53; the "%" delimiter is percent-encoded as "%25".
host = 'fe80::be0f:a7ff:fe00:2929%2553'
url = f'http://[{host}]'
requests.get(url)

Actual Result

It throws:

ValueError: 'fe80::be0f:a7ff:fe00:2929S' does not appear to be an IPv4 or IPv6 address

Some debugging reveals that urllib3.util.parse_url replaces %25 with % in its result.
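The trailing "S" in the error message is consistent with the host being percent-decoded twice: "%2553" becomes "%53", and "%53" then becomes "S" (0x53 is ASCII "S"). The sketch below reproduces that effect using only the standard library. It is an assumption about the mechanism, not a trace of the actual requests or urllib3 code path, and is_ip_literal is a hypothetical helper.

```python
import ipaddress
from urllib.parse import unquote


def is_ip_literal(host: str) -> bool:
    """Return True if the standard library accepts host as an IP address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


encoded_host = "fe80::be0f:a7ff:fe00:2929%2553"

# First decoding pass: "%25" -> "%", leaving the scoped address intact.
once = unquote(encoded_host)
# Second decoding pass: "%53" is itself a valid escape and decodes to "S".
twice = unquote(once)

print(once)                  # fe80::be0f:a7ff:fe00:2929%53
print(twice)                 # fe80::be0f:a7ff:fe00:2929S
print(is_ip_literal(twice))  # False
```

A single-digit or non-hex zone id such as "%3" or "%en1" would pass through a second decoding pass unchanged, which may explain why only zone ids that happen to look like two hex digits trigger the error.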

Reproduction Steps

  1. Run the code in the expected result section.

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "5.2.0"
  },
  "charset_normalizer": {
    "version": "3.3.2"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.7"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.12.3"
  },
  "platform": {
    "release": "6.8.0-40-generic",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.32.3"
  },
  "system_ssl": {
    "version": "30000020"
  },
  "urllib3": {
    "version": "2.2.2"
  },
  "using_charset_normalizer": false,
  "using_pyopenssl": false
}