ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
32.2k stars 2.35k forks source link

`std.compress.xz` does not implement `read` correctly (it treats end-of-stream as an error) #20408

Open squeek502 opened 2 weeks ago

squeek502 commented 2 weeks ago

Zig Version

0.14.0-dev.117+5f5895626

Steps to Reproduce and Observed Behavior

This test added to lib/std/compress/xz/test.zig:

const std = @import("std");
const testing = std.testing;
const xz = std.compress.xz;

test "read after all data has been read" {
    const data = @embedFile("testdata/good-1-check-none.xz");
    var in_stream = std.io.fixedBufferStream(data);

    var xz_stream = try xz.decompress(testing.allocator, in_stream.reader());
    defer xz_stream.deinit();
    const reader = xz_stream.reader();

    var buf: [4096]u8 = undefined;

    // Read all the decompressed data once
    _ = try reader.readAll(&buf);

    // Now attempt another read call
    try testing.expectEqual(0, try reader.read(&buf));
}

will error with

56/60 compress.xz.test.test.read after all data has been read...FAIL (EndOfStream)
C:\Users\Ryan\Programming\Zig\zig\lib\std\leb128.zig:16:22: 0xe4d3b6 in readULEB128__anon_22005 (test.exe.obj)
        const byte = try reader.readByte();
                     ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\compress\xz.zig:91:38: 0xe3cb6c in read (test.exe.obj)
                const record_count = try std.leb.readULEB128(u64, counting_reader);
                                     ^
C:\Users\Ryan\Programming\Zig\zig\lib\std\compress\xz\test.zig:34:32: 0xe3c60d in test.read after all data has been read (test.exe.obj)
    try testing.expectEqual(0, try reader.read(&buf));
                               ^

This is because the read implementation can return EndOfStream from many different places within its read implementation, even though it likely should be returning 0 to signal there's nothing left to read; see the documentation of readFn:

https://github.com/ziglang/zig/blob/ab4c461b76ff7b1d10e6d2010370ea0984f97efe/lib/std/io.zig#L80-L83

This also means that wrapping the return of xz.decompress in a bufferedReader will almost always cause reading to fail, since bufferedReader relies on the end-of-stream-is-not-an-error behavior. Modifying this test case:

https://github.com/ziglang/zig/blob/ab4c461b76ff7b1d10e6d2010370ea0984f97efe/lib/std/compress/xz/test.zig#L5-L12

like so:

-    return xz_stream.reader().readAllAlloc(testing.allocator, std.math.maxInt(usize));
+    var buffered_xz_stream = std.io.bufferedReader(xz_stream.reader());
+
+    return buffered_xz_stream.reader().readAllAlloc(testing.allocator, std.math.maxInt(usize));

will cause the "compressed data" test block to start failing.

Expected Behavior

The tests above to pass


Some more context in this #zig-help Discord thread

cc @FnControlOption @ianic since they've worked on std.compress.xz

squeek502 commented 2 weeks ago

Addendum: this is seemingly not an isolated incident. Adding this:

    comptime {
        if (ReadError || error{EndOfStream} == ReadError) {
            @compileError("ReadError cannot contain EndOfStream");
        }
    }

to GenericReader, it gets triggered from:

lib\std\compress\flate\inflate.zig:344:41: note: called from here
        pub const Reader = std.io.Reader(*Self, Error, read);
                           ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
lib\std\compress\flate\inflate.zig:344:41: note: called from here
        pub const Reader = std.io.Reader(*Self, Error, read);
                           ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
lib\std\http\Client.zig:940:41: note: called from here
    const TransferReader = std.io.Reader(*Request, TransferReadError, transferRead);
                           ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lib\std\compress\lzma.zig:33:41: note: called from here
        pub const Reader = std.io.Reader(*Self, Error, read);
                           ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
lib\std\compress\xz.zig:37:41: note: called from here
        pub const Reader = std.io.Reader(*Self, Error, read);
                           ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~

so presumably this same type of bug could exist with all of those GenericReader implementations as well. However, it may not necessarily be a bug in every case (it only leads to problems if it is accidentally being used to signal "no more data to read"); I was just curious how many other GenericReader errors contained EndOfStream.

squeek502 commented 2 weeks ago

A simple but potentially naive fix could look like this (relevant code):

-                const record_count = try std.leb.readULEB128(u64, counting_reader);
+                const record_count = std.leb.readULEB128(u64, counting_reader) catch |err| switch (err) {
+                    error.EndOfStream => return 0,
+                    else => |e| return e,
+                };

This fixes the tests in the OP, but there might be a better way to determine when this EndOfStream should be treated as unrecoverable or as "no more data to read."