mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0
5.04k stars 990 forks source link

Defork croaring #3596

Closed trevyn closed 3 years ago

trevyn commented 3 years ago

Replace mimblewimble/croaring-rs / croaring-mw fork with upstream while ensuring no new AVX/AVX2 instructions are added in release builds.

Bumps the effective version of croaring-rs from 0.4.4 to 0.4.6. (details)

Unblocks:

Motivation: We want to update croaring-rs to unblock the above issues, and at the same time, the upstream merge of https://github.com/saulius/croaring-rs/pull/62 makes our fork unnecessary. See comments in #3545 for comments on why we originally forked.

Verification

cargo test --all passes locally.

Verification of no new AVX/AVX2 instructions, using Intel XED from http://www.intel.com/software/xed:

git clone https://github.com/intelxed/xed.git
cd xed
./mfile.py examples install
cp kits/xed-install-base-2021-03-17-mac-x86-64/bin/xed /usr/local/bin/xed
$ cargo clean && cargo build --release
$ /usr/local/bin/xed -i target/release/grin > native-disasm
$ grep -c SSE2 native-disasm
22479
$ grep -c AVX native-disasm
4380
$ grep -c AVX2 native-disasm
1192

$ cargo clean && ROARING_ARCH=nehalem cargo build --release
$ /usr/local/bin/xed -i target/release/grin > nehalem-disasm
$ grep -c SSE2 nehalem-disasm 
22794
$ grep -c AVX nehalem-disasm
2505
$ grep -c AVX2 nehalem-disasm
570

$ cargo clean && ROARING_ARCH=x86-64-v2 cargo build --release
$ /usr/local/bin/xed -i target/release/grin > x86-64-v2-disasm
$ grep -c SSE2 x86-64-v2-disasm 
23235
$ grep -c AVX x86-64-v2-disasm
2505
$ grep -c AVX2 x86-64-v2-disasm
570

$ wget https://github.com/mimblewimble/grin/releases/download/v5.0.1/grin-v5.0.1-macos.tar.gz 
$ tar xzf grin-v5.0.1-macos.tar.gz
$ /usr/local/bin/xed -i grin/grin > release-disasm
$ grep -c SSE2 release-disasm
22908
$ grep -c AVX release-disasm 
2505
$ grep -c AVX2 release-disasm
570

I'm not sure where these other AVX instructions are coming from if the existing release is tested to function as we expect. It's possible they're behind runtime checks.

trevyn commented 3 years ago

Hmm, Windows can't find libclang now:

thread 'main' panicked at 'Unable to find libclang: "thelibclangshared library at C:\\msys64\\mingw64\\bin\\libclang.dll could not be opened: LoadLibraryExW failed"', C:\Users\VssAdministrator\.cargo\registry\src\github.com-1ecc6299db9ec823\bindgen-0.56.0\src/lib.rs:1922:31

I noticed clang-sys was updated in the Cargo.lock:

  name = "clang-sys"
- version = "0.28.1"
+ version = "1.1.1"

I'll look into this later.

trevyn commented 3 years ago
env:
  LIBCLANG_PATH: ${{ runner.temp }}/llvm-${{ matrix.clang[0] }}/lib
  LLVM_CONFIG_PATH: ${{ runner.temp }}/llvm-${{ matrix.clang[0] }}/bin/llvm-config

Thanks! I got GHA to work: https://github.com/trevyn/grin/runs/2138862178

- run: choco install -y llvm
- run: refreshenv && cargo test --all
  env:
    LIBCLANG_PATH: C:\Program Files\LLVM\lib
    LLVM_CONFIG_PATH: C:\Program Files\LLVM\bin\llvm-config

I'll push a commit in a moment and see if it works on Azure too.

And a PR against upstream, they disabled their Windows CI build instead😂

phyro commented 3 years ago

is croaring used for the bitmap mmr? what's the likelihood of this causing a possible consensus change?

trevyn commented 3 years ago

is croaring used for the bitmap mmr? what's the likelihood of this causing a possible consensus change?

My newbie analysis (please educate if any of this is off-base for the question):

Assuming we're not newly triggering any LLVM/clang codegen bugs, the only functional effect of this change should be to bump croaring-rs from the fork point at 0.4.4 (https://github.com/mimblewimble/croaring-rs/commit/9977f80cef9d9b52c0f5de30a24c0c366bf8561c) to 0.4.6 (https://github.com/saulius/croaring-rs/commit/4fc658248331cb0a8ee3f785f45cf39c45753c69) (diff 0.4.4...0.4.6).

These changes in croaring-rs look innocuous to me, and also bump the underlying lemire/CRoaringUnityBuild submodule from 0.2.65 to 0.2.66 (diff). Those changes appear to be a simple tweak to memory management plus this change to roaring.hh that I do not yet feel qualified to validate.

If there is any concern that consensus change could be caused by bumping croaring-rs or its dependency RoaringBitmap/CRoaring, we should definitely have a system in place to ensure that this bumping doesn't happen accidentally, or otherwise mitigate the concern. If this is desired, I can put some thought into how to implement that.

If anyone can think of any tests that would help confirm the validity of the change, I'd also be happy to take a whack at trying to write them.

trevyn commented 3 years ago

I found this in the clang docs: https://clang.llvm.org/docs/UsersManual.html#x86

Several micro-architecture levels as specified by the x86-64 psABI are defined. They are cumulative in the sense that features from previous levels are implicitly included in later levels.

  • -march=x86-64: CMOV, CMPXCHG8B, FPU, FXSR, MMX, FXSR, SCE, SSE, SSE2
  • -march=x86-64-v2: (close to Nehalem) CMPXCHG16B, LAHF-SAHF, POPCNT, SSE3, SSE4.1, SSE4.2, SSSE3
  • -march=x86-64-v3: (close to Haswell) AVX, AVX2, BMI1, BMI2, F16C, FMA, LZCNT, MOVBE, XSAVE
  • -march=x86-64-v4: AVX512F, AVX512BW, AVX512CD, AVX512DQ, AVX512VL

More background, apparently this is a new thing: https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level/

So ROARING_ARCH=x86-64-v2 may be cleaner and closer to what we're looking for. I'll verlfy in a bit.

phyro commented 3 years ago

oh haha, I was just trying to make people confirm it is fine from the consensus point of view - I think we should get into practice of questioning such things even for updates like this one. I'm honestly not qualified enough to answer the question which is why I asked and I'm sorry if this made you dive too deep. It was no my intention, but nevertheless, thanks for doing it!

phyro commented 3 years ago

@quentinlesceller would be great if you could make a pass when you find the time so we get this in :+1: I'd also suggest we consider archiving the croaring fork in the mimblewimble organization after we confirm everything works well in the release containing these changes

antiochp commented 3 years ago

Testing this locally. Changes look reasonable to me.

antiochp commented 3 years ago

is croaring used for the bitmap mmr? what's the likelihood of this causing a possible consensus change?

We use it internally for storage and but we don't use it as part of the hashing implementation. Hashing a bitmap chunk happens over the internal bytes directly (of which a croaring bitmap is simply an alternative representation).

This is how we "write" the bytes as part of the hashing impl -

https://github.com/mimblewimble/grin/blob/302c8ec92869c8280c069a7be3ec61e9fc013037/chain/src/txhashset/bitmap_accumulator.rs#L234-L238

antiochp commented 3 years ago

We are seeing some cases of "Illegal Instruction" for 5.1.0 and this is a potential culprit?

antiochp commented 3 years ago

Is it possible the scripts are not correctly setting env vars?

https://docs.microsoft.com/en-us/azure/devops/pipelines/process/variables?view=azure-devops&tabs=yaml%2Cbatch#access-variables-through-the-environment

It looks like the recommended way is to do something along the lines of -

variables:
    ROARING_ARCH: x86-64-v2 
steps:
  - script: |
      cargo clean
      cargo build --release

Rather than what we are doing within the script itself -

steps:
  - script: |
      cargo clean
      ROARING_ARCH=x86-64-v2
      cargo build --release

There is also the option to use bash instead of script in the pipeline config which I suspect may make things more robust (or at least better understood) -

https://docs.microsoft.com/en-us/azure/devops/pipelines/scripts/cross-platform-scripting?view=azure-devops&tabs=yaml#consider-bash-or-pwsh

I'm not actually sure what the implementation is for a simple script in the pipeline.