microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Training] RuntimeError: ORTModule's extensions were not detected #16066

Open gchun88 opened 1 year ago

gchun88 commented 1 year ago

Describe the issue

I got an error while running the code below:

trainer = ORTTrainer(
    model = model,
    args = args,
    train_dataset = tokenized_dataset["train"],
    eval_dataset = tokenized_dataset["validation"],
    data_collator = data_collator,
    compute_metrics = compute_metrics,
    tokenizer = tokenizer,
    feature = 'token-classification',
)
trainer.train()

ORTModuleInitException: ORTModule's extensions were not detected at '/local_disk0/.ephemeral_nfs/envs/pythonEnv.../lib/python3.9/site-packages/onnxruntime/training/ortmodule/torch_cpp_extensions' folder. Run python -m torch_ort.configure before using ORTModule frontend.

I have no idea where to start debugging this.
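A possible first debugging step is to confirm what is actually inside the folder named in the error message. The sketch below locates the installed onnxruntime package and lists any compiled libraries under training/ortmodule/torch_cpp_extensions (the sub-path is taken from the error text above). The helper names extensions_dir and compiled_extensions are hypothetical, and the file suffixes checked are an assumption about how the built extensions are named.

```python
import importlib.util
from pathlib import Path


def extensions_dir(package="onnxruntime"):
    """Return the expected torch_cpp_extensions folder, or None if the package is absent.

    The sub-path mirrors the one printed in the ORTModuleInitException message.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    root = Path(list(spec.submodule_search_locations)[0])
    return root / "training" / "ortmodule" / "torch_cpp_extensions"


def compiled_extensions(package="onnxruntime"):
    """List compiled library files found under the extensions folder.

    Assumption: built extensions show up as .so (Linux) or .pyd (Windows) files.
    """
    folder = extensions_dir(package)
    if folder is None or not folder.is_dir():
        return []
    found = list(folder.rglob("*.so")) + list(folder.rglob("*.pyd"))
    return sorted(path.name for path in found)


if __name__ == "__main__":
    print("extensions folder:", extensions_dir())
    print("compiled extensions:", compiled_extensions())
```

An empty list here would be consistent with the error: the package is installed, but the extensions have not been built in this environment.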

To reproduce

Run the code above. I am using the following libraries:

%pip install torch==2.0.1+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html
%pip install onnx ninja
%pip install onnxruntime-training==1.14.0 -f https://download.onnxruntime.ai/onnxruntime_stable_cu114.html

%pip install --upgrade protobuf==3.20.2
# %pip install optimum[onnxruntime-gpu]==1.7.0
%pip install evaluate
%pip install seqeval
%pip install datasets
%pip install torch-ort
%pip install --upgrade accelerate
%pip install optimum

Urgency

I have to finish this by the end of this week at the latest, so I need to figure it out quickly.

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.7.0

PyTorch Version

3.9

Execution Provider

CUDA

Execution Provider Library Version

cuda 11.3
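The version fields above do not obviously match the install commands (for example, the commands pin onnxruntime-training==1.14.0, and 3.9 looks like the Python version from the error path rather than a PyTorch version). A small sketch like the following could be run in the same notebook to report what is actually installed. The package names come from the pip commands above; the helper name installed_versions is hypothetical.

```python
import sys
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the %pip install lines in this issue.
    packages = ["torch", "onnxruntime-training", "torch-ort", "optimum", "onnx", "protobuf"]
    print("python:", sys.version.split()[0])
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```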

mindest commented 1 year ago

Hi @gchun88, have you tried running python -m torch_ort.configure after installing onnxruntime-training and torch-ort? If not, please try it first.

thiagocrepaldi commented 1 year ago

@gchun88, add

%python -m torch_ort.configure

after %pip install torch-ort and you should be good to go.

gchun88 commented 1 year ago

> Hi @gchun88, have you tried running python -m torch_ort.configure after installing onnxruntime-training and torch-ort? If not, please try it first.

I tried that command, but it did not work. The error message said the torch_ort library could not be found, even though I can import it in Python.

gchun88 commented 1 year ago

The environment is actually Databricks, so it is a bit tricky on my end to get Python wheels into the correct environment. I ran the command with the magic function %python -m torch_ort.configure
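One possible explanation, offered as a guess rather than a diagnosis, is that the configure command ran under a different Python interpreter than the one the notebook's %pip installs went into, which would explain why torch_ort imports in the notebook but is not found by the command. A minimal sketch that sidesteps this is to launch the module from a notebook cell with sys.executable, so it runs under the notebook's own interpreter. The helper names configure_command and run_configure are hypothetical; torch_ort.configure is the module named in the error message.

```python
import subprocess
import sys


def configure_command(module="torch_ort.configure"):
    """Build a command that runs `module` with the current interpreter."""
    return [sys.executable, "-m", module]


def run_configure(module="torch_ort.configure"):
    """Run the module and return (return code, stdout, stderr)."""
    result = subprocess.run(
        configure_command(module),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    code, out, err = run_configure()
    print("interpreter:", sys.executable)
    print("exit code:", code)
    print(out)
    print(err, file=sys.stderr)
```

If the exit code is non-zero, the captured stderr should show whether the failure is still a missing torch_ort module or a build problem in the extensions themselves.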