reedkotler opened this issue 1 year ago
Yup, and we could even set it up in the CI as a nightly. Do you have a project in mind to play "corpus" - if not, llvm itself could be it (if it wouldn't make things more confusing by playing 2 roles)
I have an unusual situation where I have full time for the next six weeks to do this.
I can make this demo if you can help walk me through what I need to do.
From what I can see, we just need to compile a bunch of code with some special options and then give the bitcode (or whatever) to your algorithms.
It should not require compiling all of Chrome to at least demonstrate that.
Except for this last glitch I am having with Eigen, I am able to build both compiler versions and use your tflite script.
So I am close to being able to do things with your prebuilt models, and then I just need to make a simple example to show the rest works.
I think getting tangled up in Fuchsia and Chrome is not necessarily helpful if you can walk me through what I need to do.
Right, so we could use LLVM itself as the corpus donor. Sticking to inlining for size - because regalloc would need profiles (and I think for end-to-end demoing, inlining for size is a fine example). These are the steps:

1. Have an LLVM repo, e.g. under `/work/llvm-project`, and `cd /work/llvm-project`.
2. `mkdir tflite-build && cd tflite-build`. The cmake invocation will additionally need `-DLLVM_ENABLE_PROJECTS=clang`. The goal of this step is to build the clang we'll use for training, but we'll also use this clang for corpus collection: `ninja clang llvm-objcopy` (we need objcopy to extract the corpus).
3. `cd .. && mkdir corpus-build && cd corpus-build`, then:
   `cmake -GNinja -DCMAKE_BUILD_TYPE=MinSizeRel -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_CXX_COMPILER=/work/llvm-project/build/bin/clang++ -DCMAKE_C_COMPILER=/work/llvm-project/build/bin/clang -DCMAKE_CXX_FLAGS="-Xclang=-fembed-bitcode=all" -DCMAKE_C_FLAGS="-Xclang=-fembed-bitcode=all" ../llvm`
   We don't bother trying to build for size; the goal is just to get a corpus. Note this generates a `compile_commands.json` in that build dir.
4. `ninja opt llc` - this is so we have some objects built.
5. `cd /work/ml-compiler-opt`, then:
   `PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/tools/extract_ir.py --input /work/llvm-project/corpus-build/compile_commands.json --input_type json --llvm_objcopy_path /work/llvm-project/build/bin/llvm-objcopy --output_dir /tmp/corpus`
6. The corpus is now in `/tmp/corpus`. After that, the training steps are the same as in the demos - i.e. collect a default trace, all that.

Thanks. I'll try these steps out.
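For reference, the corpus-collection steps above can be collected into one rough script. The paths are the thread's examples and will need adjusting; the tflite-specific cmake configuration is elided (see the demo) and only the flags mentioned in the thread appear here.

```shell
#!/bin/bash
# Sketch of the corpus-collection steps above; paths are the thread's
# examples and will need adjusting for your machine.

# 1. Build the clang used for training and for corpus collection
#    (llvm-objcopy is needed to extract the embedded bitcode).
cd /work/llvm-project
mkdir tflite-build && cd tflite-build
# ...configure with tflite support as in the demo, additionally passing:
#    -DLLVM_ENABLE_PROJECTS=clang
ninja clang llvm-objcopy

# 2. Build LLVM again with bitcode embedded, to serve as the corpus.
cd .. && mkdir corpus-build && cd corpus-build
cmake -GNinja -DCMAKE_BUILD_TYPE=MinSizeRel \
  -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
  -DCMAKE_CXX_COMPILER=/work/llvm-project/build/bin/clang++ \
  -DCMAKE_C_COMPILER=/work/llvm-project/build/bin/clang \
  -DCMAKE_CXX_FLAGS="-Xclang=-fembed-bitcode=all" \
  -DCMAKE_C_FLAGS="-Xclang=-fembed-bitcode=all" \
  ../llvm
ninja opt llc   # just so some objects exist

# 3. Extract the embedded bitcode into a corpus under /tmp/corpus.
cd /work/ml-compiler-opt
PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/tools/extract_ir.py \
  --input /work/llvm-project/corpus-build/compile_commands.json \
  --input_type json \
  --llvm_objcopy_path /work/llvm-project/build/bin/llvm-objcopy \
  --output_dir /tmp/corpus
```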
What training performance could I expect if only using LLVM as the training corpus?
What do you mean by "training performance": time it takes to train a model? Or model effectiveness (i.e. how much that model can shrink binaries)? Either way, I think @reedkotler did this recently (llvm as corpus), perhaps he can comment on both.
Fuchsia's case discussed in the demo used to be half a day, but when we doubled the feature count, so did the training time. IIRC they get ~3% shrinkage in their overall image.
Thanks for your instruction about using LLVM as a training corpus. I was able to run the inlining training. However, the LLVM corpus includes only about 2080 modules, so I wonder whether the size reduction and the generalization ability can match the performance figure mentioned in the paper (it claims about 28000 IR modules were used to reach that performance).
In general, we observed that having more modules from more diverse projects, during training, would help a model generalize better, but just like with manual heuristics, without trying it out, it's hard to tell what to expect for a specific case.
Hello!
What would be the steps for 'Deploying and using the new policy' when using LLVM itself as the corpus donor? I have hopefully trained the optimized model from the warmstart model, but in $OUTPUT_DIR I only see a `policy` dir, no `saved_policy`; I'm not sure if that can be the reason I can't build the release. I configured it this way:
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_INLINER_MODEL_PATH=$OUTPUT_DIR/policy -DTENSORFLOW_AOT_PATH=${TENSORFLOW_AOT_PATH} $LLVM_SRCDIR/llvm
It seems there is no model in `policy`.
Another deviation from the demo instructions in my setup is that I hardcoded $TENSORFLOW_AOT_PATH to be able to generate the ninja.
Any suggestions would be much appreciated. Thanks.
[...] but in $OUTPUT_DIR I only see a `policy` dir, no `saved_policy` [...]
Let's focus on this first. How long did `train_locally.py` take? (It should be a good number of hours.) If no log is visible, can you add `--alsologtostderr` and see what it dumps - it should report that it compiled some non-zero number of modules at each step.
(My suspicion is that there may be an issue that makes each compile step fail => no actual training => no model.)
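For reference, a hypothetical invocation with logging enabled, modeled on the repo's inlining demo. Flag names other than `--alsologtostderr` follow the demo; `$OUTPUT_DIR` and `$CORPUS` are your own paths, and any required `--gin_bindings` (e.g. the clang path) are omitted here.

```shell
# Modeled on the inlining demo; adjust paths and add the gin bindings
# your setup needs (clang_path, llvm_size_path, ...).
cd /work/ml-compiler-opt
PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/rl/train_locally.py \
  --root_dir=$OUTPUT_DIR \
  --data_path=$CORPUS \
  --gin_files=compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin \
  --alsologtostderr
```

With `--alsologtostderr`, each training step should log a non-zero count of successfully compiled modules; all-zero counts point at a broken compile step rather than a training problem.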
Indeed, `train_locally.py` didn't run for long enough. The time it usually takes to create `saved_policy` seems to be around 10 hrs. However, I see the same performance (no binary size change) with this final `saved_policy` as I did with an intermediary one from some folder in `policy`. As mentioned here, in the case of using clang for both training and corpus collection, I've only noticed a difference in the size of the `llvm-config` binary, marginally larger when created by clang built with `-DLLVM_INLINER_MODEL_PATH=/path/to/model/saved_policy`.
Do you have any suggestions for tweaking my setup? I could share the steps I've taken but that might get too verbose for an issue comment. I've basically filled in the steps I was missing from the directions above with what seemed most sensible in either the inlining or regalloc demos.
OK, that's weird. Let's first make sure the use side of things - i.e. how the model is ingested and used - is set up right. Then we can look at the training side.
I tried the published size model, here are my exact steps:
For brevity, I used my paths - I have a git repo for llvm under `/work/llvm-project` and a Python 3.10 env set up under `/work/python3.10` - so those paths need replacing. The toolchain used to bootstrap clang shouldn't matter.
cd /tmp
wget https://github.com/google/ml-compiler-opt/releases/download/inlining-Oz-v1.1/inlining-Oz-99f0063-v1.1.tar.gz
tar xvfz inlining-Oz-99f0063-v1.1.tar.gz
ls /tmp/model
cd /work/llvm-project
git checkout main && git pull && git checkout 665457815f11118f7e755a471f33606c8562a4be
mkdir build && cd build
cmake -GNinja -DCMAKE_BUILD_TYPE=Release ../llvm -DLLVM_ENABLE_PROJECTS=clang -DTENSORFLOW_AOT_PATH=/work/python3.10/lib/python3.10/site-packages/tensorflow -DLLVM_INLINER_MODEL_PATH=/tmp/model
ninja clang
cd ../ && mkdir build-base && cd build-base
cmake -GNinja -DCMAKE_BUILD_TYPE=MinSizeRel ../llvm -DLLVM_ENABLE_PROJECTS=clang -DTENSORFLOW_AOT_PATH=/work/python3.10/lib/python3.10/site-packages/tensorflow -DLLVM_INLINER_MODEL_PATH=/tmp/model -DCMAKE_C_COMPILER=/work/llvm-project/build/bin/clang -DCMAKE_CXX_COMPILER=/work/llvm-project/build/bin/clang++ -DCMAKE_EXPORT_COMPILE_COMMANDS=On
Check `compile_commands.json` to make sure it's using the previously built clang, then `ninja clang` (well, just `ninja` I guess - but I only built clang for validation).
cd ../ && mkdir build-exp && cd build-exp
cmake -GNinja -DCMAKE_BUILD_TYPE=MinSizeRel ../llvm -DLLVM_ENABLE_PROJECTS=clang -DTENSORFLOW_AOT_PATH=/work/python3.10/lib/python3.10/site-packages/tensorflow -DLLVM_INLINER_MODEL_PATH=/tmp/model -DCMAKE_C_COMPILER=/work/llvm-project/build/bin/clang -DCMAKE_CXX_COMPILER=/work/llvm-project/build/bin/clang++ -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DCMAKE_C_FLAGS="-mllvm -enable-ml-inliner=release" -DCMAKE_CXX_FLAGS="-mllvm -enable-ml-inliner=release"
I'd check `compile_commands.json` again to make sure it's using the previously built clang and that the flags are right (i.e. we compile with `-mllvm -enable-ml-inliner=release`). Then `ninja clang` again.
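Those two checks can be scripted. Below is a hypothetical helper (not from the thread) that greps a build directory's `compile_commands.json` for a given string - the bootstrap clang's path, or the ML-inliner flag:

```shell
# Hypothetical helper: check that a build dir's compile_commands.json
# contains a given string (a compiler path, a flag, ...).
check_build() {
  local ccjson="$1/compile_commands.json"
  if grep -q -e "$2" "$ccjson"; then
    echo "OK: $2"
  else
    echo "MISSING: $2"
    return 1
  fi
}

# Example usage against the experiment build dir used above:
#   check_build /work/llvm-project/build-exp '/work/llvm-project/build/bin/clang'
#   check_build /work/llvm-project/build-exp 'enable-ml-inliner=release'
```

(`grep -e` is used so patterns that start with a dash aren't taken for options.)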
# mtrofin@mtrofin.c.googlers.com in /work/llvm-project on git:main o [8:05:40]
$ ls -l build-base/bin/clang-20
-rwxr-x--- 1 mtrofin primarygroup 179609320 Oct 9 07:57 build-base/bin/clang-20
$ ls -l build-exp/bin/clang-20
-rwxr-x--- 1 mtrofin primarygroup 158697880 Oct 9 08:05 build-exp/bin/clang-20
so that's about 12% size savings.
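For reference, the arithmetic on the two sizes above (179609320 bytes baseline vs. 158697880 bytes with the ML inliner):

```shell
# Sizes taken from the ls output above.
base=179609320   # build-base/bin/clang-20
exp=158697880    # build-exp/bin/clang-20
saving=$(awk -v b="$base" -v e="$exp" 'BEGIN { printf "%.1f", (b - e) * 100 / b }')
echo "clang shrank by ${saving}%"   # → clang shrank by 11.6%
```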
A possible gotcha: are you passing `-mllvm -enable-ml-inliner=release`? Are you building with -Os or -Oz? (The latter matters less; the former is critical.)
What would be really helpful is a small test case that can train in 30 minutes on a modest machine. It does not have to produce a useful model, but it should be something one can see end to end without days of training. I'm willing to help make the test case if I can get some help.