iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/

Abort (core dumped) #18741

Open pdhirajkumarprasad opened 4 days ago

pdhirajkumarprasad commented 4 days ago

What happened?

For the attached IR, I am seeing an abort at runtime.

command:

iree-compile model.modified.mlir --iree-hal-target-backends=llvm-cpu -o compiled_model.vmfb 
iree-run-module --module='compiled_model.vmfb' --device=local-task --function='torch_jit' --input='1x3x224x224xf32=@input.0.bin' --output=@'output.0.bin' 

Attachments: input.0.txt, model.mlir.txt

Steps to reproduce your issue

Download the two attached files, rename them to 'model.modified.mlir' and 'input.0.bin', and invoke the commands mentioned above.

What component(s) does this issue relate to?

Runtime

Version information

No response

Additional context

No response

pashu123 commented 3 days ago

There are cf.assert ops in the IR, and I don't know how well they are supported by the runtime. Please add the flag --iree-opt-strip-assertions, i.e.:

iree-compile --iree-opt-strip-assertions model.mlir.txt --iree-hal-target-backends=llvm-cpu -o compiled_model.vmfb

IanWood1 commented 3 days ago

I think this might be the correct behavior (apparently cf.assert isn't required to do anything with the message), because the process is being terminated via SIGABRT. Looking at the input IR on line 251:

cf.assert %false, "mismatching contracting dimension"

So this seems like a possible lowering issue. However, I'm not sure why no message was raised.
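For reference, the failure mode should be reproducible in isolation with a minimal module whose assert condition is always false. This is a hypothetical sketch, not extracted from the attached model:

module {
  func.func @main() {
    // Constant-false condition: the assert always fires at runtime. The
    // question is whether the runtime reports the message (e.g., via a
    // vm.fail) or the process dies with SIGABRT before printing anything.
    %false = arith.constant false
    cf.assert %false, "mismatching contracting dimension"
    return
  }
}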

benvanik commented 3 days ago

If you strip assertions then you'll probably get crashes - make sure you aren't stripping them if you want the errors.

(There may also be cases where some things aren't properly guarded by the assertions, so you get death before the assertion is hit. I don't think we have bugs like that, but assertions are rarely used, so it's possible. You can use --trace_execution at runtime to see the program flow; you should see a vm.fail if the assertion is hit.)
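For example, reusing the repro invocation from above with tracing enabled (flag name taken from the comment above; exact spelling may vary by build):

iree-run-module --trace_execution --module=compiled_model.vmfb --device=local-task --function=torch_jit --input=1x3x224x224xf32=@input.0.bin --output=@output.0.bin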

zjgarvey commented 3 days ago

I think I have a resolution for many of these inference crashes at the torch level; most of them appear to be related to the shape cleanup work we've been doing.

https://github.com/llvm/torch-mlir/pull/3781 + setting up a different shape refinement pipeline on the frontend seems to be working well on the sampling of models I've been testing from https://github.com/pdhirajkumarprasad/SHARK-TestSuite/blob/feature/qa/issue/onnx-to-torch/abort-at-runtime.

I'll add some tests to the linked PR, post some changes to our pipeline in the test suite, and post a summary report of the thirty models I tried locally.

zjgarvey commented 3 days ago

This is from a sampling of distinct-sounding models from the list of runtime-crashing models, after the changes mentioned above.

The pass pipeline I used to generate linalg IR for these models:

torch-mlir-opt --convert-torch-onnx-to-torch --torch-lower-to-backend-contract --torch-scalarize-shapes --torch-shape-refinement-pipeline --torch-backend-to-linalg-on-tensors-backend-pipeline

This was run with the scalarize-shapes changes from the draft PR applied.

Passing Summary

TOTAL TESTS = 30

Stage                       | # Passing | % of Total | % of Attempted
Setup                       | 30        | 100.0%     | 100.0%
IREE Compilation            | 29        | 96.7%      | 96.7%
Gold Inference              | 29        | 96.7%      | 100.0%
IREE Inference Invocation   | 25        | 83.3%      | 86.2%
Inference Comparison (PASS) | 25        | 83.3%      | 100.0%

Fail Summary

TOTAL TESTS = 30

Stage                     | # Failed at Stage | % of Total
Setup                     | 0                 | 0.0%
IREE Compilation          | 1                 | 3.3%
Gold Inference            | 0                 | 0.0%
IREE Inference Invocation | 4                 | 13.3%
Inference Comparison      | 0                 | 0.0%

Test Run Detail

Test was run with the following arguments: Namespace(device='local-task', backend='llvm-cpu', iree_compile_args=None, mode='cl-onnx-iree', torchtolinalg=True, stages=None, skip_stages=None, benchmark=False, load_inputs=False, groups='all', test_filter=None, testsfile='inference1.txt', tolerance=None, verbose=True, rundirectory='./test-onnx', no_artifacts=False, cleanup='2', report=True, report_file='reports/inference1.md')

Test | Exit Status | Mean Benchmark Time (ms) | Notes
model--all-MiniLM-L12-v2-qa-all--LLukas22 | PASS | None |
model--bart-base-few-shot-k-1024-finetuned-squad-seed-2--anas-awadalla | compiled_inference | None |
model--bart-base-squad2--sjrhuschlee | compiled_inference | None |
model--bart-large-finetuned-squadv1--valhalla | compiled_inference | None |
model--bengali_language_NER--Suchandra | PASS | None |
model--bert-base-cased-cefr--LordCoffee | PASS | None |
model--bert-base-finetuned-nli--Jihyun22 | PASS | None |
model--bert-base-multilingual-cased-finetuned-squad--JensH | PASS | None |
model--bert-base-multilingual-uncased-finetuned-squad--Martin97Bozic | PASS | None |
model--bert-base-NER--dslim | PASS | None |
model--bert-base-qa--srcocotero | PASS | None |
model--bert-base-turkish-128k-cased-finetuned_lr-2e-05_epochs-3--husnu | PASS | None |
model--bert-base-tweetner7-2021--tner | PASS | None |
model--bert-base-uncased-few-shot-k-1024-finetuned-squad-seed-0--anas-awadalla | PASS | None |
model--Bert_Squad--johnjose223 | PASS | None |
model--BioBERT-finetuned-ner-conll2003--ViktorDo | PASS | None |
model--EstBERT128_sentiment--tartuNLP | PASS | None |
model--FinancialBERT-Sentiment-Analysis--ahmedrachid | PASS | None |
model--GPyT--Sentdex | compiled_inference | None |
model--IMDB_BERT_5E--pig4431 | PASS | None |
model--MetaQA--haritzpuerto | PASS | None |
model--MiniLM-L12-H384-uncased-squad--haritzpuerto | PASS | None |
model--MTL-bert-base-uncased-ww-squad--jgammack | PASS | None |
model--SEAD-L-6_H-384_A-12-wnli--course5i | PASS | None |
model--TinyBERT_General_4L_312D-squad--haritzpuerto | PASS | None |
model--Trial_3_Results--sunitha | PASS | None |
mvitv2_base | PASS | None |
mvitv2_large | import_model | None |
mvitv2_small | PASS | None |
mvitv2_tiny | PASS | None |