indygreg / python-zstandard

Python bindings to the Zstandard (zstd) compression library
BSD 3-Clause "New" or "Revised" License

stream_reader() hangs (keeps spinning on CPU) when empty buffer is supplied #194

Open piotrdomagalski opened 1 year ago

piotrdomagalski commented 1 year ago

Hi there!

We've run into an issue where bad input data caused the library to keep spinning in decompression_reader.c:370, as seen in the screenshot of a py-spy stack trace dump.

[Screenshot 2023-04-27 at 09:37:32: py-spy stack trace dump]

Here's the code to reproduce this situation:

import zstandard
import io

decompressor = zstandard.ZstdDecompressor()
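bad_input = b''  # a zero-length buffer is enough to trigger the hang

with decompressor.stream_reader(bad_input) as decompressing_reader:
    with io.TextIOWrapper(decompressing_reader, encoding='utf-8', newline='\n') as reader:
        # With the C backend, this loop never finishes and keeps a CPU core busy
        for line in reader:
            print(line)
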
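As a stopgap until a fixed release, a caller could avoid handing zero-length input to the decompressor at all. The sketch below is hypothetical: the helper name `iter_decompressed_lines` and its `open_reader` parameter are illustrative, and it assumes `data` supports the buffer protocol. Treating empty input as "no lines" is a design choice; whether it should instead be an error is a separate question.

```python
import io


def iter_decompressed_lines(data, open_reader, encoding="utf-8"):
    """Yield text lines from a decompressing reader, skipping empty input.

    Hypothetical helper: `open_reader` is any callable that takes the
    compressed buffer and returns a binary, context-managed reader.
    """
    # Assumes `data` supports the buffer protocol. A zero-length buffer is
    # treated as "no lines" and never reaches the decompressor.
    if memoryview(data).nbytes == 0:
        return
    with open_reader(data) as raw:
        with io.TextIOWrapper(raw, encoding=encoding, newline="\n") as text:
            yield from text
```

With python-zstandard, `open_reader` could be `zstandard.ZstdDecompressor().stream_reader`; passing `io.BytesIO` instead exercises the same control flow without the library. This only sidesteps the zero-length case reported here and is not a fix for the underlying loop.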
indygreg commented 1 year ago

Confirmed. This only reproduces with the C backend; the Rust and CFFI backends are not affected.

Thanks for the report.

indygreg commented 1 year ago

The bug here is that the C backend's implementation of ZstdDecompressionReader.read1() enters an infinite loop when given a zero-length input that conforms to the buffer protocol. This evaded test coverage (including fuzzing) because we never fed an empty input to the fuzz tests. I'll change that as part of the fix.
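To illustrate the failure mode, here is a toy Python reader whose read1() loops until it has output to return. It is not the library's C implementation; the class name and its chunked identity "decoder" are hypothetical. What matters is the explicit input-exhausted check, which a zero-length buffer hits on the very first iteration, so read1() reports EOF instead of retrying forever.

```python
class ToyDecompressionReader:
    """Hypothetical stand-in for a decompression reader.

    Its "decoder" simply copies input bytes in fixed-size chunks; only the
    loop structure of read1() matters here, not real zstd decoding.
    """

    def __init__(self, source, chunk_size=4):
        self._src = bytes(source)  # accepts any buffer-protocol object
        self._pos = 0
        self._chunk = chunk_size

    def _decode_step(self, max_out):
        # Consume up to chunk_size input bytes and "decode" them (identity).
        n = min(self._chunk, max_out, len(self._src) - self._pos)
        out = self._src[self._pos:self._pos + n]
        self._pos += n
        return out

    def read1(self, size=-1):
        if size == 0:
            return b""
        if size < 0:
            size = len(self._src) - self._pos
        while True:
            # Termination guard: once the input is exhausted (immediately the
            # case for a zero-length buffer), report EOF instead of retrying.
            if self._pos >= len(self._src):
                return b""
            out = self._decode_step(size)
            if out:
                return out
            # A real decoder may consume input without emitting output (for
            # example while parsing a frame header), so the loop continues;
            # the exhaustion check above is what guarantees it terminates.
```

For example, `ToyDecompressionReader(b"").read1(10)` returns `b""` right away rather than spinning.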

indygreg commented 1 year ago

And adding empty inputs to the fuzz tests reveals that other methods also fail on them; for example, readinto1() is buggy too.
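A regression check along these lines could call every read method on a reader built over `b""` and require EOF (`b""` or `0`) within a time limit. This is a hypothetical stdlib-only sketch, not the project's test suite; `completes_within` and `check_empty_input` are illustrative names, and `io.BytesIO` stands in for the reader under test.

```python
import io
import threading


def completes_within(fn, timeout):
    """Run fn() in a daemon thread; return (finished, result)."""
    box = {}

    def target():
        box["value"] = fn()  # if fn raises, "value" stays unset (None below)

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    return (not worker.is_alive(), box.get("value"))


def check_empty_input(make_reader, timeout=1.0):
    """Hypothetical check: each read method on a reader built over b""
    should report EOF (b"" or 0) promptly instead of hanging."""
    calls = {
        "read()": lambda r: r.read(),
        "read(16)": lambda r: r.read(16),
        "read1(16)": lambda r: r.read1(16),
        "readinto": lambda r: r.readinto(bytearray(16)),
        "readinto1": lambda r: r.readinto1(bytearray(16)),
    }
    failures = []
    for name, call in calls.items():
        reader = make_reader(b"")  # fresh reader for each method
        finished, value = completes_within(lambda c=call, r=reader: c(r), timeout)
        if not finished:
            failures.append(f"{name}: no result after {timeout}s")
        elif value not in (b"", 0):
            failures.append(f"{name}: expected EOF, got {value!r}")
    return failures
```

Against python-zstandard, `make_reader` could be `zstandard.ZstdDecompressor().stream_reader`. A C-level loop that never releases the GIL may stall the main thread as well, so running each check in a subprocess would be a more robust way to detect hangs.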