google / ml-compiler-opt

Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM.
Apache License 2.0

Running the demo's train_locally.py: Failed to create saved model evaluator #148

Open xuc-X opened 2 years ago

xuc-X commented 2 years ago

Hello, I'm running this command:

rm -rf $OUTPUT_DIR && \
  PYTHONPATH=$PYTHONPATH:. python3 \
  compiler_opt/rl/train_locally.py \
  --root_dir=$OUTPUT_DIR \
  --data_path=$CORPUS \
  --gin_bindings=clang_path="'$LLVM_INSTALLDIR/bin/clang'" \
  --gin_bindings=llvm_size_path="'$LLVM_INSTALLDIR/bin/llvm-size'" \
  --num_modules=100 \
  --gin_files=compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin \
  --gin_bindings=train_eval.warmstart_policy_dir=\"$WARMSTART_OUTPUT_DIR/savedpolicy\"

The script tells me --num_modules cannot be used, so I changed the flag to --num_workers=100. But I get the following errors:

2022-09-24 07:12:57.902522: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
I0924 07:12:58.107042 139987454576448 local_data_collector.py:78] Waiting for pending work from last iteration took 0.000004
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpeb9wk1gz/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpsc32ijpx/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmp0xz05pcf/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.

Do you have any idea?

boomanaiden154 commented 2 years ago

We refactored the --num_modules flag to be set in the gin config for each specific problem, due to it being a pretty critical value for reproducibility in the regalloc case. It looks like I forgot to update the documentation in regards to that. You can just omit the flag. I'd recommend not setting the --num_workers flag unless you have a compelling case to do so: it sets a completely different parameter than what --num_modules used to modify. In regards to the specific error that you're seeing, it seems like the script isn't able to pick up the BC model. Did you perform the behavioral cloning step? And if so, what files are present in the directory mentioned by the gin binding flag setting that variable?
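For reference, the parameter now lives in gin as train_eval.num_modules (you can see it in the parameter dump that train_locally.py prints). If you do need to change it, something like this should work - either edit the line in compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin:

train_eval.num_modules=100

or override it for a single run on the command line:

--gin_bindings=train_eval.num_modules=100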

xc303919323 commented 2 years ago

Is the BC model the LLVM bitcode model? I ran train_bc.py successfully. The problem is in train_locally.py. The command output shows the model loading successfully. This is the full log:

performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
W0925 04:03:23.532205 140184883676992 ppo_agent.py:342] Only tf.keras.optimizers.Optimiers are well supported, got a non-TF2 optimizer: <tensorflow.python.training.adam.AdamOptimizer object at 0x7f7e9dd49460>
I0925 04:03:24.762801 140184883676992 common.py:1009] No checkpoint available at /code/model
I0925 04:03:26.191171 140184883676992 train_locally.py:101] Loading module specs from corpus at /code/corpus.
I0925 04:03:30.300293 140184883676992 train_locally.py:107] Done loading module specs from corpus.
I0925 04:03:30.300908 140184883676992 train_locally.py:133] Loaded Reward Stat Map from disk, containing 0 modules
I0925 04:03:30.514247 140184883676992 train_locally.py:152] Last iteration took: 0.004603
W0925 04:03:32.547599 140184883676992 save.py:271] Found untraced functions such as ActorDistributionNetwork_layer_call_fn, ActorDistributionNetwork_layer_call_and_return_conditional_losses, ConstantValueNetwork_layer_call_fn, ConstantValueNetwork_layer_call_and_return_conditional_losses, EncodingNetwork_layer_call_fn while saving (showing 5 of 92). These functions will not be directly callable after loading.
/root/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py:521: UserWarning: Encoding a StructuredValue with type tfp.distributions.Deterministic_ACTTypeSpec; loading this StructuredValue will require that this type be imported and registered.
warnings.warn("Encoding a StructuredValue with type %s; loading this "
INFO:tensorflow:Assets written to: /code/model/policy/0/saved_policy/assets
I0925 04:03:33.073540 140184883676992 builder_impl.py:779] Assets written to: /code/model/policy/0/saved_policy/assets
2022-09-25 04:03:34.994831: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-09-25 04:03:34.994904: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-09-25 04:03:34.995828: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /code/model/policy/0/saved_policy
2022-09-25 04:03:35.000722: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-09-25 04:03:35.000781: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /code/model/policy/0/saved_policy
2022-09-25 04:03:35.017182: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:365] MLIR V1 optimization pass is not enabled
2022-09-25 04:03:35.023192: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-09-25 04:03:35.092413: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_policy
2022-09-25 04:03:35.147566: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 151744 microseconds.
2022-09-25 04:03:35.242257: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
2022-09-25 04:03:35.444218: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
W0925 04:03:37.566624 140184883676992 save.py:271] Found untraced functions such as ActorDistributionNetwork_layer_call_fn, ActorDistributionNetwork_layer_call_and_return_conditional_losses, ConstantValueNetwork_layer_call_fn, ConstantValueNetwork_layer_call_and_return_conditional_losses, EncodingNetwork_layer_call_fn while saving (showing 5 of 92). These functions will not be directly callable after loading.
/root/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py:521: UserWarning: Encoding a StructuredValue with type tfp.distributions.Categorical_ACTTypeSpec; loading this StructuredValue will require that this type be imported and registered.
warnings.warn("Encoding a StructuredValue with type %s; loading this "
INFO:tensorflow:Assets written to: /code/model/policy/0/saved_collect_policy/assets
I0925 04:03:38.054838 140184883676992 builder_impl.py:779] Assets written to: /code/model/policy/0/saved_collect_policy/assets
2022-09-25 04:03:40.066622: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-09-25 04:03:40.066686: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-09-25 04:03:40.066882: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /code/model/policy/0/saved_collect_policy
2022-09-25 04:03:40.071930: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-09-25 04:03:40.071989: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /code/model/policy/0/saved_collect_policy
2022-09-25 04:03:40.093924: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-09-25 04:03:40.173268: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_collect_policy
2022-09-25 04:03:40.228462: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 161578 microseconds.
2022-09-25 04:03:40.557391: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
I0925 04:03:40.805665 140184883676992 local_data_collector.py:78] Waiting for pending work from last iteration took 0.000003
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpgjfr19pm/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpw8v7dxdu/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmp_qg7173y/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmps93tsj2r/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpndhn0nu2/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpne14xdzf/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmp5doda41v/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
3 errors generated.
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpupt7jlc5/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpl5mm4t7i/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.

So I tried printing the command line in inline_runner.py with the following code:

try:
  command_line = []
  if self._launcher_path:
    command_line.append(self._launcher_path)
  # Build the clang invocation used for data collection.
  command_line.extend([self._clang_path] + list(module_spec.exec_cmd) + [
      '-mllvm', '-enable-ml-inliner=development', '-mllvm',
      '-training-log=' + log_path, '-o', output_native_path
  ])
  if tf_policy_path:
    command_line.extend(
        ['-mllvm', '-ml-inliner-model-under-training=' + tf_policy_path])
  print("command_line1\n", command_line)  # debug print: the clang command
  compilation_runner.start_cancellable_process(command_line,
                                               self._compilation_timeout,
                                               self._cancellation_manager)
  command_line = [self._llvm_size_path, output_native_path]
  print("command_line2\n", command_line)  # debug print: the llvm-size command

I then ran the printed command, which looks like this:

'/code/llvm-install/bin/clang' '-cc1' '-triple' 'x86_64-unknown-fuchsia' '-emit-obj' '-massembler-fatal-warnings' '--mrelax-relocations' '-disable-free' '-clear-ast-before-backend' '-disable-llvm-verifier' '-discard-value-names' '-main-file-name' 'block-device-manager.cc' '-mrelocation-model' 'pic' '-pic-level' '2' '-pic-is-pie' '-mframe-pointer=all' '-ffp-contract=off' '-fno-rounding-math' '-mconstructor-aliases' '-funwind-tables=2' '-target-cpu' 'x86-64-v2' '-mllvm' '-x86-branches-within-32B-boundaries' '-tune-cpu' 'generic' '-mllvm' '-treat-scalable-fixed-error-as-warning' '-debug-info-kind=constructor' '-dwarf-version=5' '-debugger-tuning=gdb' '-mllvm' '-crash-diagnostics-dir=clang-crashreports' '-ffunction-sections' '-fdata-sections' '-fcoverage-compilation-dir=.' '-resource-dir' '../../../llvm-install/lib/clang/15.0.1' '-dependency-file' 'obj/src/storage/fshost/block-watcher.block-device-manager.cc.o.d' '-MT' 'obj/src/storage/fshost/block-watcher.block-device-manager.cc.o' '-sys-header-deps' '-D' '_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS' '-D' '_LIBCPP_REMOVE_TRANSITIVE_INCLUDES' '-D' '_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS=1' '-D' 'ZX_ASSERT_LEVEL=2' '-D' '_ALL_SOURCE' '-D' 'FIDL_TRACE_LEVEL=0' '-I' '../..' '-I' 'gen' '-I' 'obj' '-I' '../../sdk' '-I' 'gen/sdk' '-I' 'fidling/gen/sdk/fidl/fuchsia.inspect/fuchsia.inspect/hlcpp' '-I' '../../sdk/lib/fidl_base/include' '-I' 'gen/include' '-I' '../../src/zircon/lib/zircon/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.mem/fuchsia.mem/hlcpp' '-I' '../../sdk/lib/fit/include' '-I' '../../sdk/lib/stdcompat/include' '-I' '../../sdk/lib/fit-promise/include' '-I' '../../sdk/lib/fidl/include' '-I' '../../zircon/system/ulib/zx/include' '-I' '../../zircon/system/ulib/async/include' '-I' '../../zircon/system/ulib/async-default/include' '-I' '../../zircon/system/ulib/inspect/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.io/fuchsia.io/hlcpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.unknown/fuchsia.unknown/hlcpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.sys/fuchsia.sys/hlcpp' '-I' '../../sdk/lib/fdio/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.boot/fuchsia.boot/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.io/fuchsia.io/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.unknown/fuchsia.unknown/cpp' '-I' '../../sdk/lib/fidl/cpp/wire/include' '-I' '../../zircon/system/ulib/zxc/include' '-I' '../../zircon/system/ulib/sync/include' '-I' '../../zircon/system/ulib/fbl/include' '-I' '../../zircon/system/ulib/fzl/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.volume/fuchsia.hardware.block.volume/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block/fuchsia.hardware.block/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.io/fuchsia.io/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.unknown/fuchsia.unknown/c' '-I' 'fidling/gen/zircon/vdso/zx/zx/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.storage.metrics/fuchsia.storage.metrics/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.partition/fuchsia.hardware.block.partition/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.device/fuchsia.device/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.volume/fuchsia.hardware.block.volume/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block/fuchsia.hardware.block/cpp' '-I' '../../src/lib/fidl/cpp/include' '-I' 'x64-shared/gen/sdk' '-I' 'fidling/gen/sdk/fidl/fuchsia.storage.metrics/fuchsia.storage.metrics/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.partition/fuchsia.hardware.block.partition/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.device/fuchsia.device/cpp' '-I'
'fidling/gen/src/storage/fidl/fuchsia.fs.startup/fuchsia.fs.startup/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.fs/fuchsia.fs/cpp' '-I' '../../zircon/system/ulib/fidl-async/include' '-I' '../../zircon/system/ulib/trace/include' '-I' '../../zircon/system/ulib/trace-engine/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.feedback/fuchsia.feedback/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.math/fuchsia.math/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.mem/fuchsia.mem/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.fshost/fuchsia.fshost/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.process.lifecycle/fuchsia.process.lifecycle/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.ldsvc/fuchsia.ldsvc/cpp' '-I' 'fidling/gen/src/storage/fxfs/fuchsia.fxfs/cpp' '-I' '../../zircon/system/ulib/async-loop/include' '-I' '../../zircon/system/ulib/fdio-caller/include' '-I' '../../zircon/system/ulib/service/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.fs/fuchsia.fs/hlcpp' '-I' '../../zircon/system/public' '-I' '../../zircon/system/ulib/storage/buffer/include' '-I' '../../zircon/system/ulib/storage/operation/include' '-I' '../../src/lib/storage/block_client/cpp/include' '-I' '../../zircon/system/ulib/range/include' '-I' '../../zircon/system/ulib/storage-metrics/include' '-I' '../../src/storage/lib/disk_inspector/include' '-I' '../../src/storage/lib/watchdog/include' '-I' '../../zircon/system/ulib/syslog/include' '-I' '../../zircon/system/ulib/bitmap/include' '-I' '../../zircon/system/ulib/id_allocator/include' '-I' '../../zircon/third_party/ulib/safemath/include' '-I' 'fidling/gen/src/storage/blobfs/fuchsia.blobfs.internal/hlcpp' '-I' 'fidling/gen/src/storage/blobfs/fuchsia.blobfs.internal/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.blobfs/fuchsia.blobfs/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.device.manager/fuchsia.device.manager/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.driver.framework/fuchsia.driver.framework/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.component/fuchsia.component/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.component.decl/fuchsia.component.decl/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.data/fuchsia.data/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.url/fuchsia.url/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.process/fuchsia.process/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.component.runner/fuchsia.component.runner/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.diagnostics.types/fuchsia.diagnostics.types/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.driver.host/fuchsia.driver.host/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.power.statecontrol/fuchsia.hardware.power.statecontrol/cpp' '-I' 'fidling/gen/src/sys/pkg/fidl/fuchsia.update.verify/fuchsia.update.verify/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.encrypted/fuchsia.hardware.block.encrypted/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.verified/fuchsia.hardware.block.verified/cpp' '-I' '../../src/lib/storage/ramdevice_client/cpp/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.nand/fuchsia.hardware.nand/c' '-I' '../../src/storage/gpt/include' '-I' '../../zircon/system/ulib/zircon-internal/include' '-I' '../../zircon/system/ulib/explicit-memory/include' '-D' 'FIDL_ALLOW_DEPRECATED_C_BINDINGS' '-D' 'FIDL_ALLOW_DEPRECATED_C_BINDINGS' '-isysroot' 'gen/zircon/public/sysroot/cpp' '-internal-isystem' '../../../llvm-install/bin/../include/x86_64-unknown-fuchsia/c++/v1' '-internal-isystem' '../../../llvm-install/bin/../include/c++/v1' '-internal-isystem' '../../../llvm-install/lib/clang/15.0.1/include' '-internal-externc-isystem' 'gen/zircon/public/sysroot/cpp/include' '-Os' 
'-ffuchsia-api-level=4294967295' '-std=c++17' '-fdeprecated-macro' '-fdebug-compilation-dir=.' '-ferror-limit' '19' '-fvisibility' 'hidden' '-fvisibility-inlines-hidden' '-fsanitize=safe-stack' '-stack-protector' '2' '-ftrivial-auto-var-init=pattern' '-fno-rtti' '-fgnuc-version=4.2.1' '-fcolor-diagnostics' '-vectorize-loops' '-vectorize-slp' '-fembed-bitcode=all' '-debug-info-kind=constructor' '-faddrsig' '-D' '__GCC_HAVE_DWARF2_CFIASM=1' '' '-x' 'ir' '/code/corpus/obj/src/storage/fshost/block-watcher.block-device-manager.cc.o.bc' '-mllvm' '-enable-ml-inliner=development' '-mllvm' '-training-log=/tmp/tmp6dd7o0lh/log' '-o' '/tmp/test.aa'

I get the error: fatal error: error in backend: IO failure on output stream: Bad file descriptor

But when I delete '-mllvm' '-enable-ml-inliner=development' '-mllvm' '-training-log=/tmp/tmp6dd7o0lh/log' '-o' '/tmp/test.aa', the command runs successfully.

I use LLVM 15; this is the commit:

commit b73d2c8c720a8c8e6e73b11be4e27afa6cb75bdf (HEAD -> release/15.x, tag: llvmorg-15.0.1, origin/release/15.x)
Author: Florian Hahn <flo@fhahn.com>
Date:   Mon Sep 19 18:14:34 2022 +0100

[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd.

Epilogue vectorization uses isScalarAfterVectorization to check if
widened versions for inductions need to be generated and bails out in
those cases.

At the moment, there are scenarios where isScalarAfterVectorization
returns true but VPWidenPointerInduction::onlyScalarsGenerated would
return false, causing widening.

This can lead to widened phis with incorrect start values being created
in the epilogue vector body.

This patch addresses the issue by storing the cost-model decision in
VPWidenPointerInductionRecipe and restoring the behavior before 151c144.
This effectively reverts 151c144, but the long-term fix is to properly
support widened inductions during epilogue vectorization

Fixes #57712.

mtrofin commented 2 years ago

The reason you get Bad file descriptor when trying to debug is that /tmp/tmp6dd7o0lh/log doesn't exist - more specifically, the first part of the path, i.e. /tmp/tmp6dd7o0lh, is a temporary directory created (from Python) via tempfile, so it is gone by the time you re-run the command by hand. Try pointing -training-log to output somewhere else, like /tmp/this_is_the.log, i.e. under an existing dir.
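For example (the directory name here is just an illustration):

mkdir -p /tmp/mlgo_debug
# then re-run the exact clang command you printed, replacing only the log path:
#   '-mllvm' '-training-log=/tmp/mlgo_debug/this_is_the.log'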

Now for the first part: that error suggests the model passed to clang during training is invalid. I'm assuming you're at or near HEAD of this (ml-compiler-opt) repo. Under your $OUTPUT_DIR, do you see a bunch of saved model directories? You should see a policy dir, under which you should see a bunch of numbered dirs. Pick one of the latter; under it you should see a saved_policy and a saved_collect_policy. What do you see under those?
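For reference, a healthy output tree should look roughly like this (standard TF SavedModel layout; exact contents can vary by TF version):

$OUTPUT_DIR/
  policy/
    0/                      # one numbered dir per policy iteration
      saved_policy/
        saved_model.pb      # what clang's evaluator looks for
        assets/
        variables/
      saved_collect_policy/
        saved_model.pb
        assets/
        variables/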

xuc-X commented 2 years ago

Yes, you are right. For the Bad file descriptor problem, I used /tmp/test.log and the command ran successfully.
This is my $OUTPUT_DIR: (screenshot)
This is my policy dir: (screenshot)
How do I debug the Python or C++ code to test the trained model?

mtrofin commented 2 years ago

What happens if you use the same command line that works, and add -mllvm -ml-inliner-model-under-training=/code/model/policy/0/saved_collect_policy?
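That is, roughly (reusing the paths from your earlier experiment; they are an illustration, not a requirement):

'/code/llvm-install/bin/clang' ...same args as the working invocation... \
  '-mllvm' '-enable-ml-inliner=development' \
  '-mllvm' '-training-log=/tmp/test.log' \
  '-mllvm' '-ml-inliner-model-under-training=/code/model/policy/0/saved_collect_policy' \
  '-o' '/tmp/test.aa'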

xuc-X commented 2 years ago

It shows Status: success!

2022-09-26 16:50:22.840280: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /code/model/policy/0/saved_collect_policy
2022-09-26 16:50:22.847266: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2022-09-26 16:50:22.860368: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-09-26 16:50:22.870020: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3700070000 Hz
2022-09-26 16:50:22.872757: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555d0f8a0370 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-09-26 16:50:22.872793: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2022-09-26 16:50:22.906265: I tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
2022-09-26 16:50:22.957064: I tensorflow/cc/saved_model/loader.cc:151] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_collect_policy
2022-09-26 16:50:22.996176: I tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 155897 microseconds.

mtrofin commented 2 years ago

OK, and it compiles I assume. Hmm. Ah, I see what happens. You're building the compiler with the tensorflow C APIs, not tflite, right? We haven't updated the documentation yet - but here's how to build with tflite:

1) Make a directory somewhere, e.g. mkdir /tmp/tflitebuild && cd /tmp/tflitebuild
2) Run buildbot/build_tflite.sh ... (this takes a bit - it git clones a bunch of repos and builds them)
3) Notice a /tmp/tflitebuild/tflite.cmake was created
4) For your cmake (best to wipe out the build dir and re-issue cmake): instead of passing -DTENSORFLOW_C_LIB_PATH, pass -C /tmp/tflitebuild/tflite.cmake (see the sketch below)
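Put together, the whole sequence looks roughly like this (the /path/to/... locations are placeholders for your checkouts and build dir; add whatever cmake flags you normally use, e.g. -DLLVM_ENABLE_PROJECTS=clang):

mkdir -p /tmp/tflitebuild && cd /tmp/tflitebuild
# clones and builds tflite + its deps; generates tflite.cmake in the current dir
/path/to/ml-compiler-opt/buildbot/build_tflite.sh
# start from a clean LLVM build dir so no stale cache entries survive
rm -rf /path/to/llvm-build && mkdir /path/to/llvm-build && cd /path/to/llvm-build
# pass the generated cache file instead of -DTENSORFLOW_C_LIB_PATH
cmake -G Ninja -C /tmp/tflitebuild/tflite.cmake /path/to/llvm-project/llvm
ninja clang llvm-size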

That's it!

mtrofin commented 2 years ago

Updated the demo now - @boomanaiden154 had a PR open (#131) for a while and we forgot to merge it. Sorry.

xuc-X commented 2 years ago

Thanks, I tried it: I recompiled my LLVM project and the Fuchsia project, but the problem still happens. Is there something I can check or debug in the code? I have no idea.

Command is:

rm -rf $OUTPUT_DIR && \
  PYTHONPATH=$PYTHONPATH:. python3 \
  compiler_opt/rl/train_locally.py \
  --root_dir=$OUTPUT_DIR \
  --data_path=$CORPUS \
  --gin_bindings=clang_path="'$LLVM_INSTALLDIR/bin/clang'" \
  --gin_bindings=llvm_size_path="'$LLVM_INSTALLDIR/bin/llvm-size'" \
  --gin_files=compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin \
  --gin_bindings=train_eval.warmstart_policy_dir=\"$WARMSTART_OUTPUT_DIR/saved_policy\"

Log is:

Parameters for train_eval:

==============================================================================

train_eval.agent_name = %compiler_opt.rl.constant.AgentName.PPO
train_eval.batch_size = 256
train_eval.deploy_policy_name = 'saved_collect_policy'
train_eval.moving_average_decay_rate = 0.8
train_eval.num_iterations = 300
train_eval.num_modules = 100
train_eval.num_policy_iterations = 3000
train_eval.train_sequence_length = 16
train_eval.use_random_network_distillation = False
train_eval.warmstart_policy_dir = '/code/warmstart/saved_policy'

2022-09-26 17:53:44.895495: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-26 17:53:45.034631: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.034828: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.035015: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.035185: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.035344: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.035499: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.625412: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.625633: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.625834: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.626000: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.626171: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.626332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10082 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:24:00.0, compute capability: 8.6
2022-09-26 17:53:45.626555: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.626697: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 10188 MB memory: -> device: 1, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:2d:00.0, compute capability: 8.6
2022-09-26 17:53:46.251521: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:629] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
W0926 17:53:46.293635 140348014933824 ppo_agent.py:342] Only tf.keras.optimizers.Optimiers are well supported, got a non-TF2 optimizer: <tensorflow.python.training.adam.AdamOptimizer object at 0x7fa49a380970>
I0926 17:53:46.903522 140348014933824 common.py:1009] No checkpoint available at /code/model
I0926 17:53:47.646316 140348014933824 train_locally.py:101] Loading module specs from corpus at /code/corpus.
I0926 17:53:51.522883 140348014933824 train_locally.py:107] Done loading module specs from corpus.
I0926 17:53:52.110074 140348014933824 local_data_collector.py:73] prefetching took 0
I0926 17:53:52.122872 140348014933824 train_locally.py:152] Last iteration took: 0.012367
W0926 17:53:53.189572 140348014933824 save.py:271] Found untraced functions such as ActorDistributionNetwork_layer_call_fn, ActorDistributionNetwork_layer_call_and_return_conditional_losses, ConstantValueNetwork_layer_call_fn, ConstantValueNetwork_layer_call_and_return_conditional_losses, EncodingNetwork_layer_call_fn while saving (showing 5 of 92). These functions will not be directly callable after loading.
/root/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py:521: UserWarning: Encoding a StructuredValue with type tfp.distributions.Deterministic_ACTTypeSpec; loading this StructuredValue will require that this type be imported and registered.
warnings.warn("Encoding a StructuredValue with type %s; loading this "
INFO:tensorflow:Assets written to: /code/model/policy/0/saved_policy/assets
I0926 17:53:53.458599 140348014933824 builder_impl.py:779] Assets written to: /code/model/policy/0/saved_policy/assets
2022-09-26 17:53:54.306021: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-09-26 17:53:54.306056: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-09-26 17:53:54.306634: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /code/model/policy/0/saved_policy
2022-09-26 17:53:54.308837: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-09-26 17:53:54.308854: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /code/model/policy/0/saved_policy
2022-09-26 17:53:54.314542: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:365] MLIR V1 optimization pass is not enabled
2022-09-26 17:53:54.315878: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-09-26 17:53:54.345653: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_policy
2022-09-26 17:53:54.365923: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 59290 microseconds.
2022-09-26 17:53:54.404422: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
2022-09-26 17:53:54.513431: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
W0926 17:53:55.616633 140348014933824 save.py:271] Found untraced functions such as ActorDistributionNetwork_layer_call_fn, ActorDistributionNetwork_layer_call_and_return_conditional_losses, ConstantValueNetwork_layer_call_fn, ConstantValueNetwork_layer_call_and_return_conditional_losses, EncodingNetwork_layer_call_fn while saving (showing 5 of 92). These functions will not be directly callable after loading.
/root/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py:521: UserWarning: Encoding a StructuredValue with type tfp.distributions.Categorical_ACTTypeSpec; loading this StructuredValue will require that this type be imported and registered.
warnings.warn("Encoding a StructuredValue with type %s; loading this "
INFO:tensorflow:Assets written to: /code/model/policy/0/saved_collect_policy/assets
I0926 17:53:55.860256 140348014933824 builder_impl.py:779] Assets written to: /code/model/policy/0/saved_collect_policy/assets
2022-09-26 17:53:56.730252: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-09-26 17:53:56.730288: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-09-26 17:53:56.730407: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /code/model/policy/0/saved_collect_policy
2022-09-26 17:53:56.732630: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-09-26 17:53:56.732646: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /code/model/policy/0/saved_collect_policy
2022-09-26 17:53:56.737559: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-09-26 17:53:56.766411: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_collect_policy
2022-09-26 17:53:56.786771: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 56365 microseconds.
2022-09-26 17:53:56.948630: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
I0926 17:53:57.091879 140348014933824 local_data_collector.py:134] resolving prefetched sample took: 0 seconds
I0926 17:53:57.092738 140348014933824 local_data_collector.py:73] prefetching took 0
I0926 17:53:57.092979 140348014933824 local_data_collector.py:91] Waiting for pending work from last iteration took 0.000001
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpbxfbj34s/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpx5fxbs2c/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpm2gq9m6x/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpa_m6lua9/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpiec123gk/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpjfas9ul9/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpx83gfm17/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
3 errors generated.
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpd6fx2uw3/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpqer841ol/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmp56v46hye/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.

xuc-X commented 2 years ago

I did not delete my llvm-project build directory; I just re-ran the cmake command directly and built with ninja. Maybe that is the problem. I will delete it and try again.

mtrofin commented 2 years ago

You may need to delete the build directory, re-create it, and re-issue the correct (new) cmake command. After that, and after rebuilding clang, try out the one clang invocation we tried in isolation (the one that included the path to the training model).
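As a quick sanity check after re-running cmake (a hypothetical check, not something the demo documents), the CMake cache should now reflect the tflite configuration rather than the TF C API one:

grep -iE 'tflite|tensorflow' /path/to/llvm-build/CMakeCache.txt
# expect tflite-related entries here, and no TENSORFLOW_C_LIB_PATH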

xc303919323 commented 2 years ago

OK, thanks!!