google / ml-compiler-opt

Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM.

Small Demo #303

Open reedkotler opened 1 year ago

reedkotler commented 1 year ago

What would be really helpful is a small test case that can train in 30 minutes on a modest machine. It does not have to produce a useful model but is something that one can see end to end without days of training. I'm willing to help make the test case if I can get some help.

mtrofin commented 1 year ago

Yup, and we could even set it up in the CI as a nightly. Do you have a project in mind to play "corpus"? If not, llvm itself could be it (if it wouldn't make things more confusing by playing 2 roles).

reedkotler commented 1 year ago

I have an unusual situation where I have full time for the next six weeks to do this.

I can make this demo if you can help walk me through what I need to do.

What I can see is that we just need to compile a bunch of code with some special options and then give the bitcode to your algorithms.

It should not require compiling all of Chrome to at least demonstrate that.

Except for this last glitch I am having with Eigen, I am able to build both compiler versions and use your tflite script.

So I am close to being able to do things with your prebuilt models, and then I just need to make a simple example so the rest works.

I think getting tangled up in Fuchsia and Chrome is not necessarily helpful if you can help walk me through what I need to do.



mtrofin commented 1 year ago

Right, so we could use llvm itself as the corpus donor. Let's stick to inlining for size, because regalloc would need profiles (and I think for end-to-end demo-ing, inlining for size is a fine example). These are the steps:
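(The step-by-step list itself is not captured in this thread. For illustration, a minimal sketch of the corpus-extraction side, loosely following the inlining-for-size demo; the paths, the -fembed-bitcode flag, and the extract_ir.py options shown are assumptions and may differ in your checkout:)

    # Build LLVM itself with embedded bitcode plus a compilation database,
    # so the IR and compile commands can be pulled back out of the objects.
    cmake -GNinja -DCMAKE_BUILD_TYPE=MinSizeRel -DLLVM_ENABLE_PROJECTS=clang \
      -DCMAKE_EXPORT_COMPILE_COMMANDS=On \
      -DCMAKE_C_FLAGS="-fembed-bitcode=all" \
      -DCMAKE_CXX_FLAGS="-fembed-bitcode=all" \
      $LLVM_SRCDIR/llvm
    ninja
    # Extract the training corpus from the resulting objects.
    python3 compiler_opt/tools/extract_ir.py \
      --cmd_filter="^-O2|-Os|-Oz$" \
      --input=$PWD/compile_commands.json \
      --input_type=json \
      --llvm_objcopy_path=$LLVM_INSTALLDIR/bin/llvm-objcopy \
      --output_dir=$CORPUS

From there, training would proceed as in the demo: derive the warmstart model, then run train_locally.py against $CORPUS.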

reedkotler commented 1 year ago

Thanks. I'll try these steps out.



pshung commented 1 year ago

What training performance could I expect if only using LLVM as the training corpus?

mtrofin commented 1 year ago

What do you mean by "training performance": time it takes to train a model? Or model effectiveness (i.e. how much that model can shrink binaries)? Either way, I think @reedkotler did this recently (llvm as corpus), perhaps he can comment on both.

Fuchsia's case, discussed in the demo, used to take about half a day to train, but when we doubled the feature count, the training time doubled too. IIRC they get ~3% shrinkage in their overall image.

pshung commented 12 months ago

Thanks for your instructions about using LLVM as a training corpus. I was able to run the inlining training. However, the LLVM corpus includes only about 2080 modules, so I wonder how the size reduction and generalization ability compare against the performance figures mentioned in the paper (which claims 28000 IR modules are needed to reach that performance).

mtrofin commented 12 months ago

In general, we observed that training on more modules from more diverse projects helps a model generalize better, but just as with manual heuristics, it's hard to tell what to expect for a specific case without trying it out.

ioana-ghiban-arm commented 1 month ago

Hello!

What would be the steps for 'Deploying and using the new policy' when using LLVM itself as the corpus donor? I have (hopefully) trained the optimized model from the warmstart model, but in $OUTPUT_DIR I only see a policy dir, no saved_policy; I'm not sure whether that could be why I can't build the release. I configured it this way:

    cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_INLINER_MODEL_PATH=$OUTPUT_DIR/policy -DTENSORFLOW_AOT_PATH=${TENSORFLOW_AOT_PATH} $LLVM_SRCDIR/llvm

It seems there is no model in policy. Another deviation from the demo instructions in my setup is that I hardcoded $TENSORFLOW_AOT_PATH to be able to generate the Ninja build files.

Any suggestions would be much appreciated. Thanks.

mtrofin commented 1 month ago

[...] but in $OUTPUT_DIR I only see policy dir, no saved_policy [...]

Let's focus on this first. How long did train_locally.py take? (It should be a good number of hours.) If no log is visible, can you add --alsologtostderr and see what it dumps? It should report that it compiled some non-zero number of modules at each step.

(My suspicion is that there may be an issue that makes each compile step fail => no actual training => no model)
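(For reference, a hedged sketch of what that run might look like, modeled on the inlining demo invocation; the gin file and bindings are assumptions that may need adjusting to your setup, and --alsologtostderr simply mirrors the log to the terminal:)

    python3 compiler_opt/rl/train_locally.py \
      --root_dir=$OUTPUT_DIR \
      --data_path=$CORPUS \
      --gin_bindings=clang_path="'$LLVM_INSTALLDIR/bin/clang'" \
      --gin_bindings=llvm_size_path="'$LLVM_INSTALLDIR/bin/llvm-size'" \
      --gin_files=compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin \
      --alsologtostderr
    # Watch the log for how many modules were successfully compiled at each step;
    # if that number is zero, no training data is being produced and no model is saved.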

ioana-ghiban-arm commented 2 weeks ago

Indeed, train_locally.py didn't run for long enough. The time it usually takes to create saved_policy seems to be around 10 hrs. However, I see the same performance (no binary size change) with this final saved_policy as I did with an intermediate one from some folder in policy. As mentioned here, in the case of using clang for both training and corpus collection, I've only noticed a difference in the size of the llvm-config binary, which is marginally larger when created by a clang built with -DLLVM_INLINER_MODEL_PATH=/path/to/model/saved_policy. Do you have any suggestions for tweaking my setup? I could share the steps I've taken, but that might get too verbose for an issue comment. I've basically filled in the steps I was missing from the directions above with what seemed most sensible from either the inlining or regalloc demos.

mtrofin commented 2 weeks ago

OK, that's weird. Let's first make sure the use side of things - i.e. how the model is ingested and used - is set up right. Then we can look at the training side.

I tried the published size model; here are my exact steps:

For brevity, I used my paths - I have a git repo for llvm under /work/llvm-project and I have a python 3.10 env set up under /work/python3.10, so those paths need replacing. The toolchain used to bootstrap clang shouldn't matter.

  1. Build the compiler we'll then use to build other binaries
cd /tmp
wget https://github.com/google/ml-compiler-opt/releases/download/inlining-Oz-v1.1/inlining-Oz-99f0063-v1.1.tar.gz
tar xvfz inlining-Oz-99f0063-v1.1.tar.gz
ls /tmp/model
cd /work/llvm-project
git checkout main && git pull && git checkout 665457815f11118f7e755a471f33606c8562a4be
mkdir build && cd build
cmake -GNinja -DCMAKE_BUILD_TYPE=Release ../llvm  -DLLVM_ENABLE_PROJECTS=clang  -DTENSORFLOW_AOT_PATH=/work/python3.10/lib/python3.10/site-packages/tensorflow -DLLVM_INLINER_MODEL_PATH=/tmp/model
ninja clang
  2. Build the baseline
    cd ../ && mkdir build-base && cd build-base
    cmake -GNinja -DCMAKE_BUILD_TYPE=MinSizeRel ../llvm  -DLLVM_ENABLE_PROJECTS=clang  -DTENSORFLOW_AOT_PATH=/work/python3.10/lib/python3.10/site-packages/tensorflow -DLLVM_INLINER_MODEL_PATH=/tmp/model -DCMAKE_C_COMPILER=/work/llvm-project/build/bin/clang -DCMAKE_CXX_COMPILER=/work/llvm-project/build/bin/clang++ -DCMAKE_EXPORT_COMPILE_COMMANDS=On

Check compile_commands.json to make sure it's using the previously built clang, then ninja clang (well, just ninja I guess - but I only built clang for validation).

  3. Build the experiment
    cd ../ && mkdir build-exp && cd build-exp
    cmake -GNinja -DCMAKE_BUILD_TYPE=MinSizeRel ../llvm  -DLLVM_ENABLE_PROJECTS=clang  -DTENSORFLOW_AOT_PATH=/work/python3.10/lib/python3.10/site-packages/tensorflow -DLLVM_INLINER_MODEL_PATH=/tmp/model -DCMAKE_C_COMPILER=/work/llvm-project/build/bin/clang -DCMAKE_CXX_COMPILER=/work/llvm-project/build/bin/clang++ -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DCMAKE_C_FLAGS="-mllvm -enable-ml-inliner=release" -DCMAKE_CXX_FLAGS="-mllvm -enable-ml-inliner=release"

I'd check compile_commands.json again to make sure it's using the previously built clang and that the flags are right (i.e. we compile with -mllvm -enable-ml-inliner=release). Then ninja clang again.
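(A hypothetical spot-check, assuming CMAKE_EXPORT_COMPILE_COMMANDS=On as above; every entry should reference the bootstrap clang and carry the ML-inliner flag:)

    grep -c "/work/llvm-project/build/bin/clang" build-exp/compile_commands.json
    grep -c "enable-ml-inliner=release" build-exp/compile_commands.json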

$ ls -l build-base/bin/clang-20
-rwxr-x--- 1 mtrofin primarygroup 179609320 Oct  9 07:57 build-base/bin/clang-20

$ ls -l build-exp/bin/clang-20
-rwxr-x--- 1 mtrofin primarygroup 158697880 Oct  9 08:05 build-exp/bin/clang-20


so that's about 12% size savings.

A possible gotcha: are you passing -mllvm -enable-ml-inliner=release? Are you building -Os or -Oz? (The latter is a weaker problem; the former is critical.)
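(And a similarly hypothetical check of the optimization level that actually ended up in the compilation database:)

    # Counts of -Oz vs -Os occurrences; note that MinSizeRel defaults to -Os.
    grep -o -- "-Oz" build-exp/compile_commands.json | wc -l
    grep -o -- "-Os" build-exp/compile_commands.json | wc -l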