python-hyper / h2

Pure-Python HTTP/2 protocol implementation
https://python-hyper.org/
MIT License
969 stars 157 forks source link

"IndexError: index out of range" in _reject_surrounding_whitespace #1257

Closed ad-m closed 3 years ago

ad-m commented 3 years ago

Hello,

I am working on integrating fuzzer to project h2. My goal is to include it in the oss-fuzz project. I believe that this can be a valuable contribution to the library, the importance of which is becoming more and more higher, among others due to the use by httpx - the library which popularity also growing.

I identified the first problems and I would like to know if such problems are interesting for your project and your are interested to handle them.

import h2.connection
import h2.config
from h2.exceptions import ProtocolError

payload = b'\x00\x00\x08\x01\x05\xef\x00\x00@\x00\x00\x05\x00\x00\x00\x00\x04\x8c\x00'
config = h2.config.H2Configuration()
conn = h2.connection.H2Connection(config=config)
try:
    conn.receive_data(payload)
    conn.data_to_send()
except ProtocolError:
    pass
Traceback (most recent call last):
  File "/tmp/h2-fuzz/test2.py", line 9, in <module>
    conn.receive_data(payload)
  File "/tmp/tmp.mokMU0wrkK/lib/python3.9/site-packages/h2/connection.py", line 1463, in receive_data
    events.extend(self._receive_frame(frame))
  File "/tmp/tmp.mokMU0wrkK/lib/python3.9/site-packages/h2/connection.py", line 1486, in _receive_frame
    frames, events = self._frame_dispatch_table[frame.__class__](frame)
  File "/tmp/tmp.mokMU0wrkK/lib/python3.9/site-packages/h2/connection.py", line 1560, in _receive_headers_frame
    frames, stream_events = stream.receive_headers(
  File "/tmp/tmp.mokMU0wrkK/lib/python3.9/site-packages/h2/stream.py", line 1057, in receive_headers
    events[0].headers = self._process_received_headers(
  File "/tmp/tmp.mokMU0wrkK/lib/python3.9/site-packages/h2/stream.py", line 1300, in _process_received_headers
    return list(headers)
  File "/tmp/tmp.mokMU0wrkK/lib/python3.9/site-packages/h2/utilities.py", line 492, in inner
    for header in headers:
  File "/tmp/tmp.mokMU0wrkK/lib/python3.9/site-packages/h2/utilities.py", line 437, in _validate_host_authority_header
    for header in headers:
  File "/tmp/tmp.mokMU0wrkK/lib/python3.9/site-packages/h2/utilities.py", line 331, in _reject_pseudo_header_fields
    for header in headers:
  File "/tmp/tmp.mokMU0wrkK/lib/python3.9/site-packages/h2/utilities.py", line 287, in _reject_connection_header
    for header in headers:
  File "/tmp/tmp.mokMU0wrkK/lib/python3.9/site-packages/h2/utilities.py", line 271, in _reject_te
    for header in headers:
  File "/tmp/tmp.mokMU0wrkK/lib/python3.9/site-packages/h2/utilities.py", line 255, in _reject_surrounding_whitespace
    if header[0][0] in _WHITESPACE or header[0][-1] in _WHITESPACE:
IndexError: index out of range
Kriechi commented 3 years ago

Yes - this looks like a valid bug!

How long does your fuzzing currently run? If you find multiple such errors I would prefer grouping everything into a single GitHub issue, but if you have a fuzzing strategy based on different frame types or other structured input data, I'd also be happy to see new issues per group et al.

ad-m commented 3 years ago

How long does your fuzzing currently run?

A few seconds.

If you find multiple such errors I would prefer grouping everything into a single GitHub issue, but if you have a fuzzing strategy based on different frame types or other structured input data, I'd also be happy to see new issues per group et al.

I don't have multiple fuzzing strategy yet and does not use complex structures as input data.

This is a completely new world for me. h2 is the first project to integrate. I also have no deep knowledge of HTTP / 2. However, I want to take the time to get to know the topic and support this library. I would like setup eg. CIFuzz to verify regression ( https://google.github.io/oss-fuzz/getting-started/continuous-integration/ ) and os-fuzz for continous integration. os-fuzz use seperate issue tracker (see https://google.github.io/oss-fuzz/architecture/ ).

Kriechi commented 3 years ago

Quick analysis of the payload you found:

The last two bytes don't matter for this IndexError. It correctly reads a HEADER frame: HeadersFrame(stream_id=1862271040, flags=['END_HEADERS', 'END_STREAM']): exclusive=False, depends_on=0, stream_weight=0, data=<hex:0000050000000004>

The hpack blob is 0000050000000004 in hex.

I'm not sure if we should consider this a bug in h2 (obviously out-of-bounds reading of an empty headers name), or a decoding error in hpack.

@sethmlarson are empty header names even valid in hpack?

sethmlarson commented 3 years ago

@Kriechi So I couldn't follow a trail from RFC 7540 to RFC 7231 to RFC 7230 but in spirit we know that "HTTP semantics" (RFC 7231) is a high-level capture of the semantics defined in "HTTP/1.1" (RFC 7230) so... yeah, this is the best I've got:

From RFC 7230 Section 3.2:

header-field = field-name ":" OWS field-value OWS field-name = token

and in the collected ABNF appendix:

token = 1*tchar

so as expected the header name/field should be 1 or more characters so I wouldn't expect an empty name to be valid for HPACK.

Kriechi commented 3 years ago

@sethmlarson I agree on the semantic reasoning. Curiously though, our hpack implementation AND the golang implementation AND a popular ruby gem all happily decode this payload into a header with empty name.

So right now, I'm leaning towards catching this in h2 by simply filtering out these invalid headers without throwing an error. It seems to be valid hpack. Valid hpack payload means it is a valid HEADERS frame. Valid HEADERS frame means it should be a valid request/response.

Any better ideas?