rincebrain commented 2 years ago

System information

Type	Version/Name
Distribution Name	N/A
Distribution Version	N/A
Kernel Version	N/A
Architecture	N/A
OpenZFS Version	N/A

Describe the problem you're observing

Ever since I learned that the gzip compression option uses the OS's zlib implementation, I'd been slightly worried that different OSes' zlibs might not produce the same output.

I finally got around to testing it, and wouldn't you know it...

(Same file, all on the same single "disk" pool, in separate datasets)

Linux (OpenZFS 2.0.6, Debian 11):

               0   L0 DVA[0]=<0:20017e00:3200> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/3200P birth=14L/14P fill=1 cksum=64b6b1b5298:280c5c78e4d23c:a72beaa3fa491cbe:941acf2f5b3b8045
           20000   L0 DVA[0]=<0:20014e00:3000> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/3000P birth=14L/14P fill=1 cksum=6c1af7c9d52:27ed3b6fd4ec36:9e0634c03bc9f890:6cba859f7479c51f
           40000   L0 DVA[0]=<0:20007200:2e00> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/2e00P birth=14L/14P fill=1 cksum=6706b123fb0:2513277ffb4f64:8c1d5045e4f16723:7f1f6e0af60d63e4
           60000   L0 DVA[0]=<0:20025600:2a00> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/2a00P birth=14L/14P fill=1 cksum=56dd8e549cf:1ca460e5465173:63779d307d19d87d:2aba9bbc379b2d75
           80000   L0 DVA[0]=<0:2001b000:1a00> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/1a00P birth=14L/14P fill=1 cksum=35ef02b5218:bc71a6c26de0d:195b50d2dd3efc57:bad685ed3610e490
           a0000   L0 DVA[0]=<0:20039600:2400> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/2400P birth=14L/14P fill=1 cksum=4b3c280f369:15b4674e338ec9:40f2462250b10d19:c482b12d89308776

FreeBSD (13-RELEASE):

               0   L0 DVA[0]=<0:32000f400:3200> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/3200P birth=58L/58P fill=1 cksum=62e14cdbb66:27bb275ead473e:a789a1ab7e88adc9:c0490e8df8558a15
           20000   L0 DVA[0]=<0:320003200:3000> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/3000P birth=58L/58P fill=1 cksum=6c6ba41d0fb:281ebd384bf900:9f2b4d71cc556d68:c2fb3611ebc52ac
           40000   L0 DVA[0]=<0:320012600:2e00> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/2e00P birth=58L/58P fill=1 cksum=673f2c71993:2550120b71d0e4:8ce234824e5b4c9e:5f273d1c36a5ddc3
           60000   L0 DVA[0]=<0:320016c00:2a00> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/2a00P birth=58L/58P fill=1 cksum=58ed0939ecc:1d0ecf09a23aa8:654f52b89123c00b:2ce0f14e36884983
           80000   L0 DVA[0]=<0:320015400:1800> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/1800P birth=58L/58P fill=1 cksum=33c959ce924:9a8f32ee7f3ee:133103670a3682c5:b25f14f4adfbf874
           a0000   L0 DVA[0]=<0:320019600:2400> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/2400P birth=58L/58P fill=1 cksum=4b02d996194:15718c7c45c6dd:3fe0ec38b9d454e8:f79b323b94bbba38

illumos (git from 20211221):

               0   L0 DVA[0]=<0:42000fc00:3200> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/3200P birth=6964L/6964P fill=1 cksum=62e14cdbb66:27bb275ead473e:a789a1ab7e88adc9:c0490e8df8558a15
           20000   L0 DVA[0]=<0:420009400:3000> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/3000P birth=6964L/6964P fill=1 cksum=6c6ba41d0fb:281ebd384bf900:9f2b4d71cc556d68:c2fb3611ebc52ac
           40000   L0 DVA[0]=<0:420017400:2e00> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/2e00P birth=6964L/6964P fill=1 cksum=673f2c71993:2550120b71d0e4:8ce234824e5b4c9e:5f273d1c36a5ddc3
           60000   L0 DVA[0]=<0:42000d200:2a00> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/2a00P birth=6964L/6964P fill=1 cksum=58ed0939ecc:1d0ecf09a23aa8:654f52b89123c00b:2ce0f14e36884983
           80000   L0 DVA[0]=<0:420002c00:1800> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/1800P birth=6964L/6964P fill=1 cksum=33c959ce924:9a8f32ee7f3ee:133103670a3682c5:b25f14f4adfbf874
           a0000   L0 DVA[0]=<0:420012e00:2400> [L0 ZFS plain file] fletcher4 gzip-6 unencrypted LE contiguous unique single size=20000L/2400P birth=6964L/6964P fill=1 cksum=4b02d996194:15718c7c45c6dd:3fe0ec38b9d454e8:f79b323b94bbba38

Notably:

Linux produced a slightly different physical size for the block at offset 80000 (0x1a00P versus 0x1800P)
FreeBSD and illumos agree on all the sizes and checksums, while Linux almost never agrees with them (later in the test file I found a few rare blocks where they did agree)

I believe this is likely because Linux, back around 2006, did something like what I'm proposing in #12805: it updated the kernel's zlib decompressor to 1.2.3 but left the compressor at 1.1.3 (because the newer one sometimes performed marginally worse), and it hasn't refreshed it from upstream since, while FreeBSD and illumos both ship 1.2.x. (I haven't yet tried pointing OpenZFS on Linux's gzip calls at a 1.2.x implementation, but that's what I'd like to try next.) (The alternative explanation is that they made some subtly inconsistent changes at the time or since, but I find that less probable.)
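Python's standard `zlib` module can't switch between zlib 1.1.3 and 1.2.x at runtime, so the sketch below uses a stand-in, under that stated assumption: it compresses the same 128 KiB record (the `20000L` logical size in the zdb output) at level 6 with two different `memLevel` settings. It does not reproduce the Linux vs. FreeBSD difference itself; it only illustrates the underlying point that two valid DEFLATE encoders can emit different bytes for identical input while both decompress to the same data. `make_record` and `deflate` are hypothetical helpers.

```python
import zlib

RECORD_SIZE = 0x20000  # 128 KiB: the logical size ("20000L") of each block above

def make_record() -> bytes:
    # Hypothetical, deterministic, compressible test data; any such input works.
    text = b"".join(b"line %d: %d\n" % (i, (i * i) % 97) for i in range(20000))
    return text[:RECORD_SIZE]

def deflate(data: bytes, mem_level: int) -> bytes:
    # Same format (zlib/DEFLATE) and same level (6, as in gzip-6). Only memLevel
    # differs; it is an assumed stand-in for "a different zlib implementation".
    c = zlib.compressobj(level=6, memLevel=mem_level)
    return c.compress(data) + c.flush()

record = make_record()
a = deflate(record, 8)  # zlib's default memLevel
b = deflate(record, 1)  # smallest buffers: more, smaller DEFLATE blocks

print(f"memLevel=8: {len(a)} bytes, memLevel=1: {len(b)} bytes, identical: {a == b}")

# Both streams are valid and decompress to the same 128 KiB record.
assert zlib.decompress(a) == record
assert zlib.decompress(b) == record
```

The exact sizes printed depend on the zlib build your Python is linked against, which is rather the point of this issue.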

Describe how to reproduce the problem

Write some compressible data to a gzip-compressed dataset on both Linux and FreeBSD
Compare the results with zdb
Tada
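To make the zdb comparison step less manual, here is a rough sketch that parses L0 block-pointer lines in the format quoted above and reports the offsets where the physical size or checksum differ. It assumes you've already captured the per-object block listing from zdb on each platform as text; the regex targets only the fields visible in this output (offset, `size=…L/…P`, `cksum=…`), and `parse_blocks`/`diff_blocks` are hypothetical helpers, not part of any ZFS tooling.

```python
import re

# Offset, logical/physical size, and checksum from an L0 block-pointer line.
BP_RE = re.compile(
    r"^\s*(?P<offset>[0-9a-f]+)\s+L0\s.*?"
    r"size=(?P<lsize>[0-9a-f]+)L/(?P<psize>[0-9a-f]+)P.*?"
    r"cksum=(?P<cksum>[0-9a-f:]+)"
)

def parse_blocks(dump: str) -> dict:
    """Map file offset -> (logical size, physical size, checksum), sizes in bytes."""
    blocks = {}
    for line in dump.splitlines():
        m = BP_RE.match(line)
        if m:
            blocks[int(m["offset"], 16)] = (
                int(m["lsize"], 16),
                int(m["psize"], 16),
                m["cksum"],
            )
    return blocks

def diff_blocks(a: dict, b: dict) -> list:
    """Return (offset, psize_a, psize_b, checksums_match) for differing blocks."""
    out = []
    for off in sorted(a.keys() & b.keys()):
        _, pa, ca = a[off]
        _, pb, cb = b[off]
        if pa != pb or ca != cb:
            out.append((off, pa, pb, ca == cb))
    return out

# Two blocks from each platform, copied from the output above.
LINUX = "\n".join([
    "               0   L0 DVA[0]=<0:20017e00:3200> [L0 ZFS plain file] fletcher4 gzip-6 "
    "unencrypted LE contiguous unique single size=20000L/3200P birth=14L/14P fill=1 "
    "cksum=64b6b1b5298:280c5c78e4d23c:a72beaa3fa491cbe:941acf2f5b3b8045",
    "           80000   L0 DVA[0]=<0:2001b000:1a00> [L0 ZFS plain file] fletcher4 gzip-6 "
    "unencrypted LE contiguous unique single size=20000L/1a00P birth=14L/14P fill=1 "
    "cksum=35ef02b5218:bc71a6c26de0d:195b50d2dd3efc57:bad685ed3610e490",
])
FREEBSD = "\n".join([
    "               0   L0 DVA[0]=<0:32000f400:3200> [L0 ZFS plain file] fletcher4 gzip-6 "
    "unencrypted LE contiguous unique single size=20000L/3200P birth=58L/58P fill=1 "
    "cksum=62e14cdbb66:27bb275ead473e:a789a1ab7e88adc9:c0490e8df8558a15",
    "           80000   L0 DVA[0]=<0:320015400:1800> [L0 ZFS plain file] fletcher4 gzip-6 "
    "unencrypted LE contiguous unique single size=20000L/1800P birth=58L/58P fill=1 "
    "cksum=33c959ce924:9a8f32ee7f3ee:133103670a3682c5:b25f14f4adfbf874",
])

for off, pa, pb, same in diff_blocks(parse_blocks(LINUX), parse_blocks(FREEBSD)):
    print(f"offset {off:#x}: psize {pa} vs {pb} bytes, checksums match: {same}")
# offset 0x0: psize 12800 vs 12800 bytes, checksums match: False
# offset 0x80000: psize 6656 vs 6144 bytes, checksums match: False
```

The first block shows the more common case in this test: identical physical size, different bytes. The second is the 0x1a00 vs 0x1800 block, a 512-byte difference.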

Include any warning/errors/backtraces from the system logs

N/A

sempervictus commented 2 years ago

#12840 should probably mention this.

I'm starting to think we need all of that to be consistent, modular, and internal to ZFS on every OS, along with crypto and any other pieces that affect what is written (as opposed to the I/O pipelines that govern how it's written).

sempervictus commented 2 years ago

@rincebrain: would you be able to confirm that the same block of 0xdeadbeef repeating or whatnot appears as non-duplicate in the DDTs when written from BSD and then Linux (since their sums shouldn't match)?

rincebrain commented 2 years ago

I could, but why check? If the checksums don't match, it's not going to go into the DDT as a duplicate. (If it does, a lot more things are quite broken...)
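The reasoning here can be made concrete. ZFS checksums the block as written, i.e. after compression, so two encoders that emit different compressed bytes for the same file contents produce different checksums, and a checksum-keyed dedup table sees two distinct blocks. The sketch below is a toy under stated assumptions: `fletcher4` follows the algorithm behind the `fletcher4` checksums above (32-bit little-endian words summed into four wrapping 64-bit accumulators); varying `memLevel` stands in for two zlib versions; the 512-byte zero padding approximates the physical sizes shown; and `toy_ddt` is a plain dict, not the real DDT, which uses the dataset's dedup checksum (e.g. sha256).

```python
import struct
import zlib

MASK64 = (1 << 64) - 1

def fletcher4(buf: bytes) -> tuple:
    """Fletcher-4 as ZFS computes it: four wrapping 64-bit sums over 32-bit words."""
    if len(buf) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    a = b = c = d = 0
    for (w,) in struct.iter_unpack("<I", buf):  # little-endian words ("LE" in zdb)
        a = (a + w) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d

def deflate(data: bytes, mem_level: int) -> bytes:
    # Assumption: varying memLevel stands in for two different zlib versions.
    c = zlib.compressobj(level=6, memLevel=mem_level)
    return c.compress(data) + c.flush()

def to_physical(compressed: bytes) -> bytes:
    # Assumption: zero-pad to a 512-byte multiple, like the psizes above (1a00P etc.).
    return compressed + b"\0" * (-len(compressed) % 512)

logical = b"\xde\xad\xbe\xef" * (0x20000 // 4)  # 128 KiB of repeating 0xdeadbeef
enc_a = to_physical(deflate(logical, 8))  # "platform A"
enc_b = to_physical(deflate(logical, 1))  # "platform B"

# Toy dedup table keyed by the checksum of the *physical* (compressed) block.
toy_ddt = {}
for writer, phys in (("platform A", enc_a), ("platform B", enc_b), ("platform A again", enc_a)):
    toy_ddt.setdefault(fletcher4(phys), []).append(writer)

for (a, b, c, d), writers in toy_ddt.items():
    print(f"cksum={a:x}:{b:x}:{c:x}:{d:x} -> {writers}")
```

Identical logical data ends up as two table entries: writes from the same encoder dedup against each other, but writes from the other encoder never match them.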

sempervictus commented 2 years ago

Agreed. The reason I'm asking is to have a record of commercially marketed features being broken, so that the powers that be turn their focus to the compression and encryption architecture (rather than fixing just this instance of a bigger problem).

AttilaFueloep commented 2 years ago

@sempervictus I've seen you mention encryption in this context a couple of times. If I understand the problem correctly, encryption isn't affected by it: for a given key, IV, and plaintext, any implementation of AES will produce exactly the same ciphertext, and the same holds for the MACs produced by the modes (GCM or CCM).

rincebrain commented 2 years ago

> @sempervictus I've seen you mentioning encryption in this context a couple of times. If I understand the problem correctly encryption isn't affected by this problem. For a given plaintext any implementation of AES will produce exactly the same ciphertext and same holds for the MACs produced by the modes (GCM or CCM).

Encryption doesn't produce inconsistent results on disk across platforms, no, it just sometimes panics your system or mangles your encryption key on recv, if you're unlucky. NBD.

AttilaFueloep commented 2 years ago

@rincebrain First reaction :-), but really :-( :-(. Yeah, there's still a bit of work left regarding encryption. I've never seen panics, but I could reproduce a couple of key corruption issues with send/recv. I have a few comments and partial ideas on how to fix it, but TBD (sorry for the pun). Well, this is getting off topic...

rincebrain commented 2 years ago

I've got a nice patch that just adds a usleep in dbuf_read and then it's really easy to reproduce the panic...

...fixing it is less obvious to me, unfortunately.

AttilaFueloep commented 2 years ago

Interesting, could you post the patch? Just curious, not deep in that code either.

sempervictus commented 2 years ago

@AttilaFueloep: Concur. I mention crypto only because it is another low-level DMU operation that needs cross-platform guards/checks to ensure safety.

rincebrain commented 2 years ago

> Interesting, could you post the patch? Just curious, not deep in that code either.

https://github.com/rincebrain/zfs/tree/randdelay is the tree I was using. I found that zfs_arc_wild_delay=20 or 50 or so on most modern x86 would let you reproduce the NULL dereference by running zfs_receive_raw in a loop a few dozen times, compared to ~never on a lot of modern hardware without it. My thoughts on the problem it illustrates are written down here; I found trying to just mutex the problem away fruitless, because it appears dbuf_dest gets called on the dbuf anyway, at which point everything is extremely NULL. I'm still adding hooks to find out who's responsible for that, since I'd expect all those callers to check refcounts first...

(It's also not guaranteed to always be the same panic, since the problem is "in the middle of a bunch of code using a number of buffers, something sets them all to NULL")
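The failure mode described here (a reader partway through using a buffer while something else tears it down) can be modelled in a few lines. This is a toy in Python, not ZFS code: `ToyDbuf`, `toy_destroy`, and the hold counting are hypothetical, and a `threading.Event` replaces the injected delay so the interleaving is deterministic rather than timing-dependent. It shows why a teardown path that skips the hold/refcount check produces the "something sets them all to NULL" symptom, and what the check rincebrain expects callers to perform would prevent. It is not a proposed fix.

```python
import threading

class ToyDbuf:
    """Hypothetical stand-in for a dbuf: a data buffer plus a hold count."""
    def __init__(self):
        self.data = bytearray(b"payload")
        self.holds = 0
        self.lock = threading.Lock()

def toy_destroy(db, check_holds):
    """Tear the buffer down; optionally refuse while someone still holds it."""
    with db.lock:
        if check_holds and db.holds > 0:
            return False
        db.data = None  # in C, every later access through this is a NULL dereference
        return True

def reader(db, held, resume, seen):
    with db.lock:
        db.holds += 1
    held.set()
    resume.wait()  # plays the role of the injected delay in dbuf_read
    seen.append(db.data)  # None here means the buffer vanished mid-use
    with db.lock:
        db.holds -= 1

def run(check_holds):
    db = ToyDbuf()
    held, resume, seen = threading.Event(), threading.Event(), []
    t = threading.Thread(target=reader, args=(db, held, resume, seen))
    t.start()
    held.wait()  # the reader now holds the buffer and is "sleeping"
    destroyed = toy_destroy(db, check_holds)
    resume.set()
    t.join()
    return destroyed, seen[0]

print(run(check_holds=False))  # (True, None)
print(run(check_holds=True))   # (False, bytearray(b'payload'))
```

The injected delay doesn't create the bug; it just holds the window open long enough that the teardown reliably lands inside it, which is why the real race shows up after a few dozen loops instead of ~never.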

AttilaFueloep commented 2 years ago

@sempervictus All right, I see.

@rincebrain Thanks for the explanation and pointers. I'll have a look when time permits.

adamdmoss commented 2 years ago

This issue is a really interesting find, and I certainly enjoy any evidence that non-deterministic compression is not a disaster in the wild.

Since the Linux kernel also has lz4 and zstd (and probably gzip) implementations of its own, I wonder if we should consider deferring to those on the Linux side too. I think the downsides would be annoying: we'd have to internalize our own implementation anyway in case the kernel is missing it, so it doesn't particularly reduce the maintenance burden, it just multiplies the testing area... (which makes me wonder what the rationale for doing this on FreeBSD is; is that written down somewhere?)

There's just one vague upside to deferring to the in-kernel implementations: The in-kernel compressors can use SSE2/AVX (zstd has some SSE2 optimizations) with little overhead (the overhead of getting SSE2 robust for ZFS' internalized implementations may utterly negate any benefits). My gut feeling is it's not worth it.

gmelikov commented 2 years ago

Oh my, I had somehow thought that ZoL used its own gzip implementation, but it really doesn't.

Also, it's an interesting question whether QAT gzip acceleration will produce the same results.

Link to my thoughts on the topic: https://github.com/openzfs/zfs/issues/12840#issuecomment-991867595

rincebrain commented 2 years ago

If memory serves from when I went digging before opening this, there was a time when ZFS carried its own implementation, but that got ejected a long time ago. (I would not be surprised if it yielded a third set of results.)

IvanVolosyuk commented 2 years ago

> Besides - it's an interesting question if QAT gzip acceleration will produce same results.

Looks like QAT will produce different results, see #7896

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

openzfs / zfs

Linux and FreeBSD's gzip compression results disagree #12919
