python-hyper / hyper

HTTP/2 for Python.
http://hyper.rtfd.org/en/latest/
MIT License
1.05k stars 191 forks source link

h2.exceptions.ProtocolError: Received duplicate pseudo-header field :path #346

Closed xianglisegue closed 7 years ago

xianglisegue commented 7 years ago

The following test code:

from hyper import HTTP20Connection
conn = HTTP20Connection('google.com:80')
conn.request('GET', '/restconf/data/ietf-yang-library:modules-state/module=ietf-netconf-acm,2012-02-22')
resp = conn.get_response()
print(resp.read())

generates

Traceback (most recent call last):
  File "ht.py", line 4, in <module>
    conn.request('GET', '/restconf/data/ietf-yang-library:modules-state/module=ietf-netconf-acm,2012-02-22')
  File "C:\Python27_segue\lib\site-packages\hyper\http20\connection.py", line 271, in request
    self.endheaders(message_body=body, final=True, stream_id=stream_id)
  File "C:\Python27_segue\lib\site-packages\hyper\http20\connection.py", line 555, in endheaders
    stream.send_headers(headers_only)
  File "C:\Python27_segue\lib\site-packages\hyper\http20\stream.py", line 98, in send_headers
    conn.send_headers(self.stream_id, headers, end_stream)
  File "C:\Python27_segue\lib\site-packages\h2\connection.py", line 844, in send_headers
    headers, self.encoder, end_stream
  File "C:\Python27_segue\lib\site-packages\h2\stream.py", line 913, in send_headers
    headers, encoder, hf, hdr_validation_flags
  File "C:\Python27_segue\lib\site-packages\h2\stream.py", line 1307, in _build_headers_frames
    encoded_headers = encoder.encode(headers)
  File "C:\Python27_segue\lib\site-packages\hpack\hpack.py", line 249, in encode
    for header in headers:
  File "C:\Python27_segue\lib\site-packages\h2\utilities.py", line 473, in inner
    for header in headers:
  File "C:\Python27_segue\lib\site-packages\h2\utilities.py", line 418, in _validate_host_authority_header
    for header in headers:
  File "C:\Python27_segue\lib\site-packages\h2\utilities.py", line 330, in _reject_pseudo_header_fields
    "Received duplicate pseudo-header field %s" % header[0]
h2.exceptions.ProtocolError: Received duplicate pseudo-header field :path

I think the URL '/restconf/data/ietf-yang-library:modules-state/module=ietf-netconf-acm,2012-02-22' should be valid. It appears that he comma between "ietf-netconf-acm,2012-02-22" is causing problems.

Any ideas how do I overcome this? Thanks! -Xiang

Lukasa commented 7 years ago

Just to confirm, did you install hyper from the master branch or from pip? If you used pip, can you try the master branch?

Lukasa commented 7 years ago

Generally speaking a comma is not allowed unencoded inside the path portion of a URL. More generally you need to be urlencoding your paths so that they can be safely transmitted. I don't think this is an issue. :smile:

xianglisegue commented 7 years ago

A comma in a URL's 'path' is valid and should not be required to be percent encoded (%2c). I think hyper interpreted this wrong and there is a bug in its path validation code. A comma may need to be percent encoded in other parts of a URI but not in the path.

https://tools.ietf.org/html/rfc3986

3.3. Path ....

Aside from dot-segments in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment to delimit scheme-specific or dereference-handler-specific subcomponents. For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same. Parameter types may be defined by scheme-specific semantics, but in most cases the syntax of a parameter is specific to the implementation of the URI's dereferencing algorithm.

RESTCONF RFC8040 uses comma in a URI's path extensively. For example: https://tools.ietf.org/html/rfc8040 3.5.3. Encoding Data Resource Identifiers in the Request URI

Lastly, Curl does handle comma in a URI's path correctly. For example:

$ curl -vvv -k --cert-type PEM --cert ./segue1_curl.pem -X GET https://segue.crabdance.com:8443/restconf/data/ietf-yang-library:modules-state/module=ietf-netconf-acm,2012-02-22 ...

GET /restconf/data/ietf-yang-library:modules-state/module=ietf-netconf-acm,2012-02-22 HTTP/2 Host: segue.crabdance.com:8443 User-Agent: curl/7.54.0 Accept: /

  • Connection state changed (MAX_CONCURRENT_STREAMS updated)! < HTTP/2 200 < content-type: application/yang.api+json < content-length: 380 < server: jetconf-h2 < cache-control: No-Cache < access-control-allow-origin: * < access-control-allow-methods: POST, GET, OPTIONS, PUT, DELETE < access-control-allow-headers: Content-Type < etag: 7633766340891256821 < last-modified: Mon, 28 Aug 2017 12:37:01 GMT < { "ietf-yang-library:module": [ { "name": "ietf-netconf-acm", "revision": "2012-02-22", "namespace": "urn:ietf:params:xml:ns:yang:ietf-netconf-acm", "conformance-type": "implement", "schema": "https://raw.githubusercontent.com/YangModels/yang/master/standard/ietf/RFC/ietf-netconf-acm.yang" } ]
  • Connection #0 to host segue.crabdance.com left intact }ubuntu@ip-172-26-8-91:~/jetconf$
xianglisegue commented 7 years ago

Yes I am using the latest code from master branch using the command I was told in an earlier issue.

pip install -U git+https://github.com/Lukasa/hyper.git

xianglisegue commented 7 years ago

Hi, Any comments? I am not sure if you can see my last comments on the "closed" ticket or I need to open a new issue.

-Xiang

Lukasa commented 7 years ago

No, I can see them fine, I'm just busy. :smile:

Yeah, I can believe we're being too aggressive about commas in paths. Are you interested in providing a patch for this?

xianglisegue commented 7 years ago

Thanks for the update. No rush, just wanted to make sure you got my message:). -Xiang

xianglisegue commented 7 years ago

I found a quick fix by adding ':path' as a special header that should not be split by 'comma'. hyper/common/headers.py

def canonical_form(k, v): """ Returns an iterable of key-value-pairs corresponding to the header in canonical form. This means that the header is split on commas unless for any reason it's a super-special snowflake (I'm looking at you Set-Cookie). """

#SPECIAL_SNOWFLAKES = set([b'set-cookie', b'set-cookie2'])

SPECIAL_SNOWFLAKES = set([b'set-cookie', b'set-cookie2', b':path'])

k = k.lower()

if k in SPECIAL_SNOWFLAKES:
    yield k, v
else:
    for sub_val in v.split(b','):
        yield k, sub_val.strip()