AmosLewis opened 1 month ago
@zjgarvey When running python ./run.py --mode=cl-onnx-iree -v --torchtolinalg -t mygpt4, I get the following failure in construct_inputs.log:
Failed test at stage construct_inputs with exception:
input node attention_mask has a dim param='unk__2824' not found in provided dim_param_dict: '{'batch_size': 1, 'seq_len': 128, 'encoder_sequence_length': 128, 'decoder_sequence_length': 128, 'sequence_length': 128}'
Traceback (most recent call last):
File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/./run.py", line 224, in run_tests
inputs = inst.construct_inputs()
File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/framework.py", line 78, in construct_inputs
return get_sample_inputs_for_onnx_model(self.model, self.dim_param_dict)
File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/onnx_utils.py", line 85, in get_sample_inputs_for_onnx_model
tuple([generate_input_from_node(node, dim_param_dict) for node in inputs])
File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/onnx_utils.py", line 85, in <listcomp>
tuple([generate_input_from_node(node, dim_param_dict) for node in inputs])
File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/onnx_utils.py", line 62, in generate_input_from_node
int_dims = get_node_shape_from_dim_param_dict(node, dim_param_dict)
File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/onnx_utils.py", line 36, in get_node_shape_from_dim_param_dict
raise ValueError(f"input node {node.name} has a dim param='{dim}' not found in provided dim_param_dict: '{dim_param_dict}'")
ValueError: input node attention_mask has a dim param='unk__2824' not found in provided dim_param_dict: '{'batch_size': 1, 'seq_len': 128, 'encoder_sequence_length': 128, 'decoder_sequence_length': 128, 'sequence_length': 128}'
Yeah, you need to specify the dim params. There are some examples in migraphx and nlp.
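As a concrete illustration, a test can pin those symbolic dims before input construction. This is a minimal sketch assuming the OnnxModelInfo base class and its update_dim_param_dict hook from the alt_e2eshark framework (as the framework.py/onnx_utils.py traceback above suggests); the class name and the value chosen for unk__2824 are assumptions, not the confirmed fix.

from e2e_testing.framework import OnnxModelInfo  # assumed import path

class MyGPT4Info(OnnxModelInfo):  # hypothetical test-info class
    def update_dim_param_dict(self):
        # Map every symbolic dim the graph declares, including the
        # tf2onnx-generated "unk__*" names, to a concrete size so that
        # get_sample_inputs_for_onnx_model can materialize inputs.
        self.dim_param_dict = {
            "batch_size": 1,
            "seq_len": 128,
            "unk__2824": 128,  # assumption: attention_mask sequence dim
        }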
@PhaneeshB
With the model generated by register_test(t_model_constructor(1, ""), "mygpt4"):
python -m torch_mlir.tools.import_onnx model.onnx -o model.mlir
iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb > output.mlir
iree-run-module --trace_execution=true --print_statistics=true --module=model.vmfb --function=tf2onnx --input="1x4xsi32=1" --input="1x4xsi32=1" --input="1x4xsi32=1"
EXEC @tf2onnx
[module.tf2onnx+00000000] <block>
[module.tf2onnx+00000001] %r3 = vm.const.ref.zero
[module.tf2onnx+00000004] %i0 = vm.const.i32 -1 // 0xFFFFFFFF
[module.tf2onnx+0000000B] %i1 = vm.const.i32.zero
[module.tf2onnx+0000000E] %r4 = vm.call @hal.devices.get(%i1(0))
[module.tf2onnx+0000001C] %r4 = vm.call @hal.fence.create(%r4(!hal.device/0x0x55ead365c970), %i1(0))
[module.tf2onnx+0000002C] vm.call @module.tf2onnx$async(%r0(!hal.buffer_view/0x0x55ead365d610), %r1(!hal.buffer_view/0x0x55ead365d740), %r2(!hal.buffer_view/0x0x55ead365d870), %r3(null), %r4(!hal.fence/0x0x55ead365e390))
[module.tf2onnx$async+00000000] <block>
[module.tf2onnx$async+00000001] vm.call @hal.fence.signal(%r4(!hal.fence/0x0x55ead365e390))
[module.tf2onnx$async+0000000C] vm.return
[module.tf2onnx+00000040] %i0 = vm.call.varadic @hal.fence.await(%i0(4294967295), %r4(!hal.fence/0x0x55ead365e390))
[module.tf2onnx+00000056] vm.return
[[ iree_hal_allocator_t memory statistics ]]
HOST_LOCAL: 0B peak / 0B allocated / 0B freed / 0B live
DEVICE_LOCAL: 48B peak / 48B allocated / 48B freed / 0B live
vmfb generated successfully.
iree commit:
commit 9f93073e0c5442dbb67262bd29edb37cd2c1e3b8 (HEAD -> main, upstream/main)
Author: Maksim Levental <maksim.levental@gmail.com>
Date: Tue Oct 15 12:26:14 2024 -0700
[CMake] Don't update compile definitions for imported targets for MSCV (#18766)
torch-mlir commit:
commit 45bb17ebfe5e9cdcfd2cfabf850d9dec7127c5ab (HEAD -> main, upstream/main)
Author: Justin Ngo <justin.ngo@arm.com>
Date: Tue Oct 15 08:38:02 2024 -0700
[TOSA] Add legalization for empty, scatter, slice_scatter, diag_embed (#3792)
Is this model in azure? Do you want to merge these changes, or just have the draft PR up? I'd personally like to commit the changes with the TestTensors dtype checking and opset version updating for sibling models.
No, the model is in the customer's Google Drive.
With the raw model downloaded from Google Drive, iree-compile crashes:
(e2e_venv) ➜ mygpt4 git:(mygpt) ✗ unzip model.onnx.zip
Archive: model.onnx.zip
inflating: model.onnx
(e2e_venv) ➜ mygpt4 git:(mygpt) ✗ ls
model.onnx model.onnx.zip
(e2e_venv) ➜ mygpt4 git:(mygpt) ✗ python -m torch_mlir.tools.import_onnx model.onnx -o model.mlir
(e2e_venv) ➜ mygpt4 git:(mygpt) ✗ iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb > output.mlir
iree-compile: /proj/gdba/shark/chi/src/iree/third_party/llvm-project/mlir/lib/Transforms/Utils/DialectConversion.cpp:2420: llvm::LogicalResult legalizeUnresolvedMaterialization(mlir::RewriterBase &, (anonymous namespace)::UnresolvedMaterializationRewrite *): Assertion `newMaterialization.getType() == outputType && "materialization callback produced value of incorrect type"' failed.
Please report issues to https://github.com/iree-org/iree/issues and include the crash backtrace.
Stack dump:
0. Program arguments: iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0 libIREECompiler.so 0x00007f307cfd8e5d llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 61
1 libIREECompiler.so 0x00007f307cfd934b
2 libIREECompiler.so 0x00007f307cfd7376 llvm::sys::RunSignalHandlers() + 134
3 libIREECompiler.so 0x00007f307cfd9b65
4 libc.so.6 0x00007f3070306520
5 libc.so.6 0x00007f307035a9fc pthread_kill + 300
6 libc.so.6 0x00007f3070306476 raise + 22
7 libc.so.6 0x00007f30702ec7f3 abort + 211
8 libc.so.6 0x00007f30702ec71b
9 libc.so.6 0x00007f30702fde96
10 libIREECompiler.so 0x00007f3086675b6e
11 libIREECompiler.so 0x00007f3086674a84 mlir::OperationConverter::convertOperations(llvm::ArrayRef<mlir::Operation*>) + 1188
12 libIREECompiler.so 0x00007f3086678749 mlir::applyPartialConversion(llvm::ArrayRef<mlir::Operation*>, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, mlir::ConversionConfig) + 105
13 libIREECompiler.so 0x00007f308667884d mlir::applyPartialConversion(mlir::Operation*, mlir::ConversionTarget const&, mlir::FrozenRewritePatternSet const&, mlir::ConversionConfig) + 125
14 libIREECompiler.so 0x00007f3080b3a593
15 libIREECompiler.so 0x00007f307d44791b
16 libIREECompiler.so 0x00007f307d4478b5
17 libIREECompiler.so 0x00007f307cee3459
18 libIREECompiler.so 0x00007f307d44aa9d
19 libIREECompiler.so 0x00007f307d442e33 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) + 851
20 libIREECompiler.so 0x00007f307d4433e4 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) + 388
21 libIREECompiler.so 0x00007f307d444ecc mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) + 108
22 libIREECompiler.so 0x00007f307d444def mlir::PassManager::run(mlir::Operation*) + 1151
23 libIREECompiler.so 0x00007f307ce20e9a
24 libIREECompiler.so 0x00007f307ce20773 ireeCompilerInvocationPipeline + 35
25 libIREECompiler.so 0x00007f307d3c81ae
26 libIREECompiler.so 0x00007f307d3c75de
27 libIREECompiler.so 0x00007f307ce7388b ireeCompilerRunMain + 27
28 iree-compile 0x000055a454eb77b2
29 libc.so.6 0x00007f30702edd90
30 libc.so.6 0x00007f30702ede40 __libc_start_main + 128
31 iree-compile 0x000055a454eb76c5
[1] 188732 IOT instruction (core dumped) iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb >
@PhaneeshB Got some bug fixes in. Now the model runs successfully up to the first shape op. Next step is to locate the op that fails.
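For reproducibility, the truncation can be done by extracting a prefix of the ONNX graph up to a chosen intermediate tensor. A minimal sketch using onnx.utils.extract_model follows; the tensor names input_ids, attention_mask, and shape_1_out are placeholders I'm assuming for illustration, not names confirmed from the model.

import onnx.utils

# Sketch: cut the graph at an intermediate tensor to bisect which op
# breaks iree-compile. Substitute real value names taken from
# onnx.load("model.onnx").graph for the hypothetical ones below.
onnx.utils.extract_model(
    "model.onnx",        # full model unzipped above
    "model_trunc.onnx",  # truncated model to register as a new test
    input_names=["input_ids", "attention_mask"],  # assumed graph inputs
    output_names=["shape_1_out"],                 # hypothetical cut point
)

With the truncated model registered as mygpt4_trunc_shape_1, the test passes: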
python ./run.py --mode=cl-onnx-iree -v -t mygpt4_trunc_shape_1
Stages to be run: ['setup', 'import_model', 'preprocessing', 'compilation', 'construct_inputs', 'native_inference', 'compiled_inference', 'postprocessing']
Test list: ['mygpt4_trunc_shape_1']
running test mygpt4_trunc_shape_1...
PASSED
Test Summary:
PASSES: 1
TOTAL: 1
results stored in /proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/test-run
To debug: https://github.com/iree-org/iree/issues/18767