guzba / zippy

Pure Nim implementation of deflate, zlib, gzip and zip.
MIT License
246 stars 29 forks source link

-d:lto + non orc/arc mm + some machines fail uncompressing #45

Closed guzba closed 2 years ago

guzba commented 2 years ago

https://github.com/zedeus/nitter/commit/d407051b66c3a7d0ed6a9aed16ad1a844af44398

ghost commented 2 years ago

Hi, we've discussed this issue in Nim Telegram chat and it really seems like zippy is doing undefined behaviour by disregarding strict aliasing/type punning rules in C. See e.g. https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule

You can reproduce that zedeus' issue on latest GCC (version 12) with LTO for example (I can easily repro it with GCC + LTO on Arch in WSL). It does work with Clang, but that doesn't mean it's correct.

The proper fix would be to use Nim's copyMem (since it's essentially the same as memcpy) for all the read/write functions, with this change everything works fine:

proc read16*(src: ptr UncheckedArray[uint8], ip: int): uint16 {.inline.} =
  copyMem(result.addr, src[ip].unsafeAddr, 2)

proc read32*(src: ptr UncheckedArray[uint8], ip: int): uint32 {.inline.} =
  copyMem(result.addr, src[ip].unsafeAddr, 4)

proc read64*(src: ptr UncheckedArray[uint8], ip: int): uint64 {.inline.} =
  copyMem(result.addr, src[ip].unsafeAddr, 8)

proc write64*(dst: ptr UncheckedArray[uint8], op: int, v: uint64) {.inline.} =
  copyMem(dst[op].unsafeAddr, v.addr, 8)

proc copy64*(dst, src: ptr UncheckedArray[uint8], op, ip: int) {.inline.} =
  write64(dst, op, read64(src, ip))

proc read16*(s: string, pos: int): uint16 {.inline.} =
  copyMem(result.addr, s[pos].unsafeAddr, 2)

proc read32*(s: string, pos: int): uint32 {.inline.} =
  copyMem(result.addr, s[pos].unsafeAddr, 4)

The reason as to why I used copyMem is because memcpy (which copyMem is a Nim equivalent of) is the universal allowed way to do what we want here in copy/write functions without any UB in both C/C++ no matter what standard version. Also, I'm not sure how this will impact performance, but I think that it's really important to avoid any possible hard-to-debug UB behaviour, and especially considering that GCC's LTO (which can speed up code a lot) breaks currently, so all the speedup available with LTO will be missed.

ghost commented 2 years ago

And yeah, you can just use copyMem instead :)

ghost commented 2 years ago

FYI you can replicate the exact same issue by trying to unpack tests/data/tarballs/Nim-1.6.2.tar.gz from the zippy test cases :)

guzba commented 2 years ago

I need to be able to reproduce this issue before I do anything. I currently cannot reproduce this locally or with GitHub actions. It sounds like you are able to reproduce the issue with Arch in WSL? I'll give that a try.

ghost commented 2 years ago

@guzba you can try it on a native Arch installation, with Arch in WSL, or maybe on some other distro that has GCC 12.

ghost commented 2 years ago

The GitHub Actions you're using (Ubuntu 20.04) has GCC 9.4 and 10.3, not 12.0 - https://github.com/actions/virtual-environments/blob/ubuntu20/20220529.1/images/linux/Ubuntu2004-Readme.md

guzba commented 2 years ago

I'm using ubuntu-latest. If "latest" doesn't have this, is GCC 12.0 like beta or new or something? Oh it literally just came out.

ghost commented 2 years ago

@guzba "ubuntu-latest" in GitHub Actions is actually Ubuntu 20.04 - https://github.com/guzba/zippy/runs/6784731132?check_suite_focus=true#step:1:4 . I'm not sure if Ubuntu 22.04 has GCC 12 or not, but yes, GCC 12 is new but it's already released. I haven't tested, it might be the case that GCC 11 also has this bug.

guzba commented 2 years ago

It may, I will need to check. I haven't tested newer GCC though so from my perspective it is unsupported until I do.

I tried ArchWSL and first run after install it failed with some glibc error so I'm not going that route. I'm just going to set up an actual VM for testing.

guzba commented 2 years ago

Hm, turns out I can't use choosenim on Ubuntu 22.04 where I have GCC 11: https://github.com/dom96/choosenim/issues/297 Fortunately building Nim 1.6.6 from source worked without any trouble. Nimble doesn't work though could not import: SSL_get_peer_certificate. I'll confirm this a second time and then open an issue over there.

I am able to reproduce this LTO issue with GCC 11 on Ubuntu 22.04. Easy reproducibility of the issue is huge so this is great.

guzba commented 2 years ago

I have also confirmed that using copyMem fixes the issue now and am happy to merge it in now that I can confirm it. If GCC needs this to be happy, so be it.

guzba commented 2 years ago

Ok I've confirmed this on my test VM along with all the usual tests and GitHub Actions. I've tagged a release including this fix https://github.com/guzba/zippy/releases/tag/0.9.11

Thanks for the fix Yardanico. I was vaguely aware of the strict aliasing rules and I'm somewhat disturbed I need to pull out the sledge hammer that is copyMem to just read a few bytes but if GCC needs that to not make some optimizations, it's a small price to pay.

ghost commented 2 years ago

@guzba nice, thanks for the fix! I wonder if you can re-run the benchmarks - I'm curious if the performance of copyMem is worse or not.

guzba commented 2 years ago

I ran benchmarks after this change yesterday on both my PC (where I get the readme.md numbers from) and also on my Mac. Both showed no change in performance so the compiler is definitely turning these small, fixed size copyMems into something faster but also that works with LTO. This having no performance cost made it an easy fix to incorporate.

I don't have a physical machine for Linux right now so I don't have any benchmarks for GCC >= 11 with and without LTO, which I am curious about.