Open xuc-X opened 2 years ago
We refactored the `--num_modules` flag to be set in the gin config for each specific problem, since it's a pretty critical value for reproducibility in the regalloc case. It looks like I forgot to update the documentation in that regard. You can just omit the flag. I'd recommend not setting the `--num_workers` flag unless you have a compelling reason to do so; it sets a completely different parameter than what `--num_modules` used to modify.

Regarding the specific error you're seeing, it looks like the script isn't able to pick up the BC model. Did you perform the behavioral cloning step? And if so, what files are present in the directory referenced by the gin binding that sets that variable?
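Back on the `--num_modules` point: if you do want to override the per-problem default rather than just omit the flag, the value now lives in the gin config (it shows up later in this thread as `train_eval.num_modules` in the dump from `ppo_nn_agent.gin`), so a minimal sketch of an override via a gin binding would be:

```sh
# Sketch: same invocation as elsewhere in this thread, with num_modules
# overridden via gin instead of the removed --num_modules flag (an integer
# binding doesn't need the extra quoting that the string path bindings use).
PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/rl/train_locally.py \
  --root_dir=$OUTPUT_DIR \
  --data_path=$CORPUS \
  --gin_bindings=clang_path="'$LLVM_INSTALLDIR/bin/clang'" \
  --gin_bindings=llvm_size_path="'$LLVM_INSTALLDIR/bin/llvm-size'" \
  --gin_files=compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin \
  --gin_bindings=train_eval.num_modules=100
```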
Is the BC model the LLVM bytecode model? I ran `train_bc.py` successfully; the problem is with `train_locally.py`. The log says the model loads successfully. This is the full log:
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
W0925 04:03:23.532205 140184883676992 ppo_agent.py:342] Only tf.keras.optimizers.Optimiers are well supported, got a non-TF2 optimizer: <tensorflow.python.training.adam.AdamOptimizer object at 0x7f7e9dd49460>
I0925 04:03:24.762801 140184883676992 common.py:1009] No checkpoint available at /code/model
I0925 04:03:26.191171 140184883676992 train_locally.py:101] Loading module specs from corpus at /code/corpus.
I0925 04:03:30.300293 140184883676992 train_locally.py:107] Done loading module specs from corpus.
I0925 04:03:30.300908 140184883676992 train_locally.py:133] Loaded Reward Stat Map from disk, containing 0 modules
I0925 04:03:30.514247 140184883676992 train_locally.py:152] Last iteration took: 0.004603
W0925 04:03:32.547599 140184883676992 save.py:271] Found untraced functions such as ActorDistributionNetwork_layer_call_fn, ActorDistributionNetwork_layer_call_and_return_conditional_losses, ConstantValueNetwork_layer_call_fn, ConstantValueNetwork_layer_call_and_return_conditional_losses, EncodingNetwork_layer_call_fn while saving (showing 5 of 92). These functions will not be directly callable after loading.
/root/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py:521: UserWarning: Encoding a StructuredValue with type tfp.distributions.Deterministic_ACTTypeSpec; loading this StructuredValue will require that this type be imported and registered.
warnings.warn("Encoding a StructuredValue with type %s; loading this "
INFO:tensorflow:Assets written to: /code/model/policy/0/saved_policy/assets
I0925 04:03:33.073540 140184883676992 builder_impl.py:779] Assets written to: /code/model/policy/0/saved_policy/assets
2022-09-25 04:03:34.994831: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-09-25 04:03:34.994904: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-09-25 04:03:34.995828: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /code/model/policy/0/saved_policy
2022-09-25 04:03:35.000722: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-09-25 04:03:35.000781: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /code/model/policy/0/saved_policy
2022-09-25 04:03:35.017182: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:365] MLIR V1 optimization pass is not enabled
2022-09-25 04:03:35.023192: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-09-25 04:03:35.092413: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_policy
2022-09-25 04:03:35.147566: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 151744 microseconds.
2022-09-25 04:03:35.242257: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY
to enable.
2022-09-25 04:03:35.444218: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
W0925 04:03:37.566624 140184883676992 save.py:271] Found untraced functions such as ActorDistributionNetwork_layer_call_fn, ActorDistributionNetwork_layer_call_and_return_conditional_losses, ConstantValueNetwork_layer_call_fn, ConstantValueNetwork_layer_call_and_return_conditional_losses, EncodingNetwork_layer_call_fn while saving (showing 5 of 92). These functions will not be directly callable after loading.
/root/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py:521: UserWarning: Encoding a StructuredValue with type tfp.distributions.Categorical_ACTTypeSpec; loading this StructuredValue will require that this type be imported and registered.
warnings.warn("Encoding a StructuredValue with type %s; loading this "
INFO:tensorflow:Assets written to: /code/model/policy/0/saved_collect_policy/assets
I0925 04:03:38.054838 140184883676992 builder_impl.py:779] Assets written to: /code/model/policy/0/saved_collect_policy/assets
2022-09-25 04:03:40.066622: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-09-25 04:03:40.066686: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-09-25 04:03:40.066882: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /code/model/policy/0/saved_collect_policy
2022-09-25 04:03:40.071930: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-09-25 04:03:40.071989: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /code/model/policy/0/saved_collect_policy
2022-09-25 04:03:40.093924: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-09-25 04:03:40.173268: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_collect_policy
2022-09-25 04:03:40.228462: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 161578 microseconds.
2022-09-25 04:03:40.557391: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
I0925 04:03:40.805665 140184883676992 local_data_collector.py:78] Waiting for pending work from last iteration took 0.000003
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpgjfr19pm/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpw8v7dxdu/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmp_qg7173y/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmps93tsj2r/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpndhn0nu2/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpne14xdzf/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmp5doda41v/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
3 errors generated.
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpupt7jlc5/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpl5mm4t7i/policyCould not find TFOutput named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
So I tried printing the command line in `inline_runner.py` with the following code:
try:
  command_line = []
  if self._launcher_path:
    command_line.append(self._launcher_path)
  command_line.extend([self._clang_path] + list(module_spec.exec_cmd) + [
      '-mllvm', '-enable-ml-inliner=development', '-mllvm',
      '-training-log=' + log_path, '-o', output_native_path
  ])
  if tf_policy_path:
    command_line.extend(
        ['-mllvm', '-ml-inliner-model-under-training=' + tf_policy_path])
  print("command_line1\n", command_line)  # added debug print
  compilation_runner.start_cancellable_process(command_line,
                                               self._compilation_timeout,
                                               self._cancellation_manager)
  command_line = [self._llvm_size_path, output_native_path]
  print("command_line2\n", command_line)  # added debug print
I ran the printed command, which looks like this:
_'/code/llvm-install/bin/clang' '-cc1' '-triple' 'x86_64-unknown-fuchsia' '-emit-obj' '-massembler-fatal-warnings' '--mrelax-relocations' '-disable-free' '-clear-ast-before-backend' '-disable-llvm-verifier' '-discard-value-names' '-main-file-name' 'block-device-manager.cc' '-mrelocation-model' 'pic' '-pic-level' '2' '-pic-is-pie' '-mframe-pointer=all' '-ffp-contract=off' '-fno-rounding-math' '-mconstructor-aliases' '-funwind-tables=2' '-target-cpu' 'x86-64-v2' '-mllvm' '-x86-branches-within-32B-boundaries' '-tune-cpu' 'generic' '-mllvm' '-treat-scalable-fixed-error-as-warning' '-debug-info-kind=constructor' '-dwarf-version=5' '-debugger-tuning=gdb' '-mllvm' '-crash-diagnostics-dir=clang-crashreports' '-ffunction-sections' '-fdata-sections' '-fcoverage-compilation-dir=.' '-resource-dir' '../../../llvm-install/lib/clang/15.0.1' '-dependency-file' 'obj/src/storage/fshost/block-watcher.block-device-manager.cc.o.d' '-MT' 'obj/src/storage/fshost/block-watcher.block-device-manager.cc.o' '-sys-header-deps' '-D' '_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS' '-D' '_LIBCPP_REMOVE_TRANSITIVE_INCLUDES' '-D' '_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS=1' '-D' 'ZX_ASSERT_LEVEL=2' '-D' '_ALL_SOURCE' '-D' 'FIDL_TRACE_LEVEL=0' '-I' '../..' '-I' 'gen' '-I' 'obj' '-I' '../../sdk' '-I' 'gen/sdk' '-I' 'fidling/gen/sdk/fidl/fuchsia.inspect/fuchsia.inspect/hlcpp' '-I' '../../sdk/lib/fidl_base/include' '-I' 'gen/include' '-I' '../../src/zircon/lib/zircon/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.mem/fuchsia.mem/hlcpp' '-I' '../../sdk/lib/fit/include' '-I' '../../sdk/lib/stdcompat/include' '-I' '../../sdk/lib/fit-promise/include' '-I' '../../sdk/lib/fidl/include' '-I' '../../zircon/system/ulib/zx/include' '-I' '../../zircon/system/ulib/async/include' '-I' '../../zircon/system/ulib/async-default/include' '-I' '../../zircon/system/ulib/inspect/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.io/fuchsia.io/hlcpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.unknown/fuchsia.unknown/hlcpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.sys/fuchsia.sys/hlcpp' '-I' '../../sdk/lib/fdio/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.boot/fuchsia.boot/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.io/fuchsia.io/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.unknown/fuchsia.unknown/cpp' '-I' '../../sdk/lib/fidl/cpp/wire/include' '-I' '../../zircon/system/ulib/zxc/include' '-I' '../../zircon/system/ulib/sync/include' '-I' '../../zircon/system/ulib/fbl/include' '-I' '../../zircon/system/ulib/fzl/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.volume/fuchsia.hardware.block.volume/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block/fuchsia.hardware.block/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.io/fuchsia.io/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.unknown/fuchsia.unknown/c' '-I' 'fidling/gen/zircon/vdso/zx/zx/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.storage.metrics/fuchsia.storage.metrics/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.partition/fuchsia.hardware.block.partition/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.device/fuchsia.device/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.volume/fuchsia.hardware.block.volume/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block/fuchsia.hardware.block/cpp' '-I' '../../src/lib/fidl/cpp/include' '-I' 'x64-shared/gen/sdk' '-I' 'fidling/gen/sdk/fidl/fuchsia.storage.metrics/fuchsia.storage.metrics/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.partition/fuchsia.hardware.block.partition/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.device/fuchsia.device/cpp' '-I' 
'fidling/gen/src/storage/fidl/fuchsia.fs.startup/fuchsia.fs.startup/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.fs/fuchsia.fs/cpp' '-I' '../../zircon/system/ulib/fidl-async/include' '-I' '../../zircon/system/ulib/trace/include' '-I' '../../zircon/system/ulib/trace-engine/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.feedback/fuchsia.feedback/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.math/fuchsia.math/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.mem/fuchsia.mem/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.fshost/fuchsia.fshost/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.process.lifecycle/fuchsia.process.lifecycle/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.ldsvc/fuchsia.ldsvc/cpp' '-I' 'fidling/gen/src/storage/fxfs/fuchsia.fxfs/cpp' '-I' '../../zircon/system/ulib/async-loop/include' '-I' '../../zircon/system/ulib/fdio-caller/include' '-I' '../../zircon/system/ulib/service/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.fs/fuchsia.fs/hlcpp' '-I' '../../zircon/system/public' '-I' '../../zircon/system/ulib/storage/buffer/include' '-I' '../../zircon/system/ulib/storage/operation/include' '-I' '../../src/lib/storage/block_client/cpp/include' '-I' '../../zircon/system/ulib/range/include' '-I' '../../zircon/system/ulib/storage-metrics/include' '-I' '../../src/storage/lib/disk_inspector/include' '-I' '../../src/storage/lib/watchdog/include' '-I' '../../zircon/system/ulib/syslog/include' '-I' '../../zircon/system/ulib/bitmap/include' '-I' '../../zircon/system/ulib/id_allocator/include' '-I' '../../zircon/third_party/ulib/safemath/include' '-I' 'fidling/gen/src/storage/blobfs/fuchsia.blobfs.internal/hlcpp' '-I' 'fidling/gen/src/storage/blobfs/fuchsia.blobfs.internal/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.blobfs/fuchsia.blobfs/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.device.manager/fuchsia.device.manager/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.driver.framework/fuchsia.driver.framework/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.component/fuchsia.component/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.component.decl/fuchsia.component.decl/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.data/fuchsia.data/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.url/fuchsia.url/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.process/fuchsia.process/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.component.runner/fuchsia.component.runner/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.diagnostics.types/fuchsia.diagnostics.types/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.driver.host/fuchsia.driver.host/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.power.statecontrol/fuchsia.hardware.power.statecontrol/cpp' '-I' 'fidling/gen/src/sys/pkg/fidl/fuchsia.update.verify/fuchsia.update.verify/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.encrypted/fuchsia.hardware.block.encrypted/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.verified/fuchsia.hardware.block.verified/cpp' '-I' '../../src/lib/storage/ramdevice_client/cpp/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.nand/fuchsia.hardware.nand/c' '-I' '../../src/storage/gpt/include' '-I' '../../zircon/system/ulib/zircon-internal/include' '-I' '../../zircon/system/ulib/explicit-memory/include' '-D' 'FIDL_ALLOW_DEPRECATED_C_BINDINGS' '-D' 'FIDL_ALLOW_DEPRECATED_C_BINDINGS' '-isysroot' 'gen/zircon/public/sysroot/cpp' '-internal-isystem' '../../../llvm-install/bin/../include/x86_64-unknown-fuchsia/c++/v1' '-internal-isystem' '../../../llvm-install/bin/../include/c++/v1' '-internal-isystem' '../../../llvm-install/lib/clang/15.0.1/include' '-internal-externc-isystem' 'gen/zircon/public/sysroot/cpp/include' '-Os' 
'-ffuchsia-api-level=4294967295' '-std=c++17' '-fdeprecated-macro' '-fdebug-compilation-dir=.' '-ferror-limit' '19' '-fvisibility' 'hidden' '-fvisibility-inlines-hidden' '-fsanitize=safe-stack' '-stack-protector' '2' '-ftrivial-auto-var-init=pattern' '-fno-rtti' '-fgnuc-version=4.2.1' '-fcolor-diagnostics' '-vectorize-loops' '-vectorize-slp' '-fembed-bitcode=all' '-debug-info-kind=constructor' '-faddrsig' '-D' '__GCC_HAVE_DWARF2_CFIASM=1' '' '-x' 'ir' '/code/corpus/obj/src/storage/fshost/block-watcher.block-device-manager.cc.o.bc' '-mllvm' '-enable-ml-inliner=development' '-mllvm' '-training-log=/tmp/tmp6dd7o0lh/log' '-o' '/tmp/test.aa'
I get the error: `fatal error: error in backend: IO failure on output stream: Bad file descriptor`
But if I delete `'-mllvm' '-enable-ml-inliner=development' '-mllvm' '-training-log=/tmp/tmp6dd7o0lh/log' '-o' '/tmp/test.aa'`, the command runs successfully.
I'm using LLVM 15; this is the commit:
commit b73d2c8c720a8c8e6e73b11be4e27afa6cb75bdf (HEAD -> release/15.x, tag: llvmorg-15.0.1, origin/release/15.x)
Author: Florian Hahn <flo@fhahn.com>
Date:   Mon Sep 19 18:14:34 2022 +0100
[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd.
Epilogue vectorization uses isScalarAfterVectorization to check if
widened versions for inductions need to be generated and bails out in
those cases.
At the moment, there are scenarios where isScalarAfterVectorization
returns true but VPWidenPointerInduction::onlyScalarsGenerated would
return false, causing widening.
This can lead to widened phis with incorrect start values being created
in the epilogue vector body.
This patch addresses the issue by storing the cost-model decision in
VPWidenPointerInductionRecipe and restoring the behavior before 151c144.
This effectively reverts 151c144, but the long-term fix is to properly
support widened inductions during epilogue vectorization
Fixes #57712.
The reason you get `Bad file descriptor` when trying to debug is that `/tmp/tmp6dd7o0lh/log` doesn't exist (more specifically, the first part of the path, i.e. `/tmp/tmp6dd7o0lh`, doesn't; it's a `tempfile`-created (from Python) directory). Try pointing `-training-log` to output somewhere else, like `/tmp/this_is_the.log`, i.e. under an existing dir.
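For instance (a sketch based on the command line you pasted; the `...` below stands for all the other flags from that same invocation):

```sh
# Re-run the same clang -cc1 command, but write the training log under a
# directory that already exists (/tmp here) instead of the Python tempdir.
/code/llvm-install/bin/clang -cc1 ... \
  -mllvm -enable-ml-inliner=development \
  -mllvm -training-log=/tmp/this_is_the.log \
  -o /tmp/test.aa
```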
Now for the first part. That seems to be about the model passed to clang during training being invalid. I'm assuming you're at or near `HEAD` of this (ml-compiler-opt) repo. Under your `$OUTPUT_DIR`, do you see a bunch of saved model directories? You should see a `policy` dir, under which you should see a bunch of numbered dirs. Pick one of the latter; under it you should see a `saved_policy` and a `saved_collect_policy`. What do you see under those?
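To make that concrete, something along these lines (a sketch using the `/code/model` root dir from your logs; `saved_model.pb` is the file clang's SavedModel loader is complaining about not finding):

```sh
# Each exported policy should be a TensorFlow SavedModel directory,
# i.e. it should contain saved_model.pb plus assets/ and variables/.
ls /code/model/policy
ls /code/model/policy/0/saved_policy
ls /code/model/policy/0/saved_collect_policy
```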
Yes, you are right. For the `Bad file descriptor` problem, I used `/tmp/test.log` and the command ran successfully. This is my `$OUTPUT_DIR`, and this is my policy dir. How can I debug the Python or C++ code to test the trained model?
What happens if you use the same command line that works, and add `-mllvm -ml-inliner-model-under-training=/code/model/policy/0/saved_collect_policy`?
It shows `Status: success`!
2022-09-26 16:50:22.840280: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /code/model/policy/0/saved_collect_policy
2022-09-26 16:50:22.847266: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2022-09-26 16:50:22.860368: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-09-26 16:50:22.870020: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3700070000 Hz
2022-09-26 16:50:22.872757: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555d0f8a0370 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-09-26 16:50:22.872793: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2022-09-26 16:50:22.906265: I tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
2022-09-26 16:50:22.957064: I tensorflow/cc/saved_model/loader.cc:151] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_collect_policy
2022-09-26 16:50:22.996176: I tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 155897 microseconds.
OK, and it compiles, I assume. Hmm. Ah, I see what's happening. You're building the compiler with the TensorFlow C API, not tflite, right? We haven't updated the documentation yet, but here's how to build with tflite:
1) make a directory somewhere, e.g. `/tmp/tflitebuild && cd /tmp/tflitebuild`
2) run `buildbot/build_tflite.sh` ... (this takes a bit; it `git clone`s a bunch of repos and builds them)
3) notice a `/tmp/tflitebuild/tflite.cmake` was created
4) for your cmake (best to wipe out the build dir and re-issue cmake): instead of passing `-DTENSORFLOW_C_LIB_PATH`, pass `-C /tmp/tflitebuild/tflite.cmake` (see the sketch after this list).

That's it!
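For step 4), a rough sketch of the reconfigure (assuming an LLVM source checkout at `/path/to/llvm-project`, which is a placeholder, plus whatever other `-D` options you already use for your build; the only change is dropping `-DTENSORFLOW_C_LIB_PATH` in favor of `-C`):

```sh
# Wipe the old build dir and reconfigure against the generated tflite.cmake.
cd /path/to/llvm-project
rm -rf build && mkdir build && cd build
cmake -G Ninja -C /tmp/tflitebuild/tflite.cmake \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS=clang \
  ../llvm
ninja clang llvm-size
```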
The demo is updated now. @boomanaiden154 had a PR open (#131) for a while and we forgot to merge it. Sorry.
Thanks, I tried it. I recompiled my LLVM project and the Fuchsia project, but the problem still happens. Is there something I can check or debug in the code? I have no idea.
Command is:
rm -rf $OUTPUT_DIR && \
  PYTHONPATH=$PYTHONPATH:. python3 \
  compiler_opt/rl/train_locally.py \
  --root_dir=$OUTPUT_DIR \
  --data_path=$CORPUS \
  --gin_bindings=clang_path="'$LLVM_INSTALLDIR/bin/clang'" \
  --gin_bindings=llvm_size_path="'$LLVM_INSTALLDIR/bin/llvm-size'" \
  --gin_files=compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin \
  --gin_bindings=train_eval.warmstart_policy_dir=\"$WARMSTART_OUTPUT_DIR/saved_policy\"
Log is:
train_eval.agent_name = %compiler_opt.rl.constant.AgentName.PPO
train_eval.batch_size = 256
train_eval.deploy_policy_name = 'saved_collect_policy'
train_eval.moving_average_decay_rate = 0.8
train_eval.num_iterations = 300
train_eval.num_modules = 100
train_eval.num_policy_iterations = 3000
train_eval.train_sequence_length = 16
train_eval.use_random_network_distillation = False
train_eval.warmstart_policy_dir = '/code/warmstart/saved_policy'
2022-09-26 17:53:44.895495: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-26 17:53:45.034631: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.034828: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.035015: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.035185: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.035344: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.035499: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.625412: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.625633: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.625834: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.626000: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.626171: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.626332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10082 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:24:00.0, compute capability: 8.6
2022-09-26 17:53:45.626555: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.626697: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 10188 MB memory: -> device: 1, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:2d:00.0, compute capability: 8.6
2022-09-26 17:53:46.251521: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:629] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
W0926 17:53:46.293635 140348014933824 ppo_agent.py:342] Only tf.keras.optimizers.Optimiers are well supported, got a non-TF2 optimizer: <tensorflow.python.training.adam.AdamOptimizer object at 0x7fa49a380970>
I0926 17:53:46.903522 140348014933824 common.py:1009] No checkpoint available at /code/model
I0926 17:53:47.646316 140348014933824 train_locally.py:101] Loading module specs from corpus at /code/corpus.
I0926 17:53:51.522883 140348014933824 train_locally.py:107] Done loading module specs from corpus.
I0926 17:53:52.110074 140348014933824 local_data_collector.py:73] prefetching took 0
I0926 17:53:52.122872 140348014933824 train_locally.py:152] Last iteration took: 0.012367
W0926 17:53:53.189572 140348014933824 save.py:271] Found untraced functions such as ActorDistributionNetwork_layer_call_fn, ActorDistributionNetwork_layer_call_and_return_conditional_losses, ConstantValueNetwork_layer_call_fn, ConstantValueNetwork_layer_call_and_return_conditional_losses, EncodingNetwork_layer_call_fn while saving (showing 5 of 92). These functions will not be directly callable after loading.
/root/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py:521: UserWarning: Encoding a StructuredValue with type tfp.distributions.Deterministic_ACTTypeSpec; loading this StructuredValue will require that this type be imported and registered.
warnings.warn("Encoding a StructuredValue with type %s; loading this "
INFO:tensorflow:Assets written to: /code/model/policy/0/saved_policy/assets
I0926 17:53:53.458599 140348014933824 builder_impl.py:779] Assets written to: /code/model/policy/0/saved_policy/assets
2022-09-26 17:53:54.306021: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-09-26 17:53:54.306056: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-09-26 17:53:54.306634: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /code/model/policy/0/saved_policy
2022-09-26 17:53:54.308837: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-09-26 17:53:54.308854: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /code/model/policy/0/saved_policy
2022-09-26 17:53:54.314542: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:365] MLIR V1 optimization pass is not enabled
2022-09-26 17:53:54.315878: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-09-26 17:53:54.345653: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_policy
2022-09-26 17:53:54.365923: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 59290 microseconds.
2022-09-26 17:53:54.404422: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY
to enable.
2022-09-26 17:53:54.513431: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
W0926 17:53:55.616633 140348014933824 save.py:271] Found untraced functions such as ActorDistributionNetwork_layer_call_fn, ActorDistributionNetwork_layer_call_and_return_conditional_losses, ConstantValueNetwork_layer_call_fn, ConstantValueNetwork_layer_call_and_return_conditional_losses, EncodingNetwork_layer_call_fn while saving (showing 5 of 92). These functions will not be directly callable after loading.
/root/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py:521: UserWarning: Encoding a StructuredValue with type tfp.distributions.Categorical_ACTTypeSpec; loading this StructuredValue will require that this type be imported and registered.
warnings.warn("Encoding a StructuredValue with type %s; loading this "
INFO:tensorflow:Assets written to: /code/model/policy/0/saved_collect_policy/assets
I0926 17:53:55.860256 140348014933824 builder_impl.py:779] Assets written to: /code/model/policy/0/saved_collect_policy/assets
2022-09-26 17:53:56.730252: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-09-26 17:53:56.730288: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-09-26 17:53:56.730407: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /code/model/policy/0/saved_collect_policy
2022-09-26 17:53:56.732630: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-09-26 17:53:56.732646: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /code/model/policy/0/saved_collect_policy
2022-09-26 17:53:56.737559: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-09-26 17:53:56.766411: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_collect_policy
2022-09-26 17:53:56.786771: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 56365 microseconds.
2022-09-26 17:53:56.948630: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
I0926 17:53:57.091879 140348014933824 local_data_collector.py:134] resolving prefetched sample took: 0 seconds
I0926 17:53:57.092738 140348014933824 local_data_collector.py:73] prefetching took 0
I0926 17:53:57.092979 140348014933824 local_data_collector.py:91] Waiting for pending work from last iteration took 0.000001
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpbxfbj34s/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpx5fxbs2c/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpm2gq9m6x/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpa_m6lua9/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpiec123gk/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpjfas9ul9/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpx83gfm17/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
3 errors generated.
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpd6fx2uw3/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpqer841ol/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmp56v46hye/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
I did not delete my llvm-project build directory; I just re-ran the cmake command to reconfigure and used ninja to build. Maybe that is the problem. I will delete it and try again.
You may need to delete the build directory, re-create it, and re-issue the correct (new) cmake command. After that, and after rebuilding clang, try out the one clang invocation we tried in isolation (the one that included the path to the training model).
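Concretely, after the clean rebuild, that isolated check might look like this (a sketch; `...` again stands for the rest of the `-cc1` command line you pasted earlier, and the policy path is the one from your `$OUTPUT_DIR`):

```sh
# Re-try the isolated compile, this time also pointing clang at the
# trained policy exported by train_locally.py.
/code/llvm-install/bin/clang -cc1 ... \
  -mllvm -enable-ml-inliner=development \
  -mllvm -training-log=/tmp/test.log \
  -mllvm -ml-inliner-model-under-training=/code/model/policy/0/saved_collect_policy \
  -o /tmp/test.aa
```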
OK, thanks!
Hello, I'm running this command:
rm -rf $OUTPUT_DIR && \
  PYTHONPATH=$PYTHONPATH:. python3 \
  compiler_opt/rl/train_locally.py \
  --root_dir=$OUTPUT_DIR \
  --data_path=$CORPUS \
  --gin_bindings=clang_path="'$LLVM_INSTALLDIR/bin/clang'" \
  --gin_bindings=llvm_size_path="'$LLVM_INSTALLDIR/bin/llvm-size'" \
  --num_modules=100 \
  --gin_files=compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin \
  --gin_bindings=train_eval.warmstart_policy_dir=\"$WARMSTART_OUTPUT_DIR/saved_policy\"
The script tells me `--num_modules` can't be used, so I changed the command to `--num_workers=100`. But I get the following errors:
2022-09-24 07:12:57.902522: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
I0924 07:12:58.107042 139987454576448 local_data_collector.py:78] Waiting for pending work from last iteration took 0.000004
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpeb9wk1gz/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpsc32ijpx/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmp0xz05pcf/policyCould not find TF_Output named: StatefulPartitionedCallerror: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Do you have any idea?