intel / intel-extension-for-pytorch

A Python package that extends the official PyTorch to easily obtain performance gains on Intel platforms
Apache License 2.0

Segmentation fault (core dumped) when executing model.to("xpu") #391

Closed evelinamorim closed 3 months ago

evelinamorim commented 1 year ago

Describe the bug

After following the instructions in the installation tutorial (https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html), I executed the following code (the same as on the GitHub page):

import torch
import torchvision.models as models

model = models.resnet50(pretrained=True)
model.eval()
data = torch.rand(1, 3, 224, 224)

import intel_extension_for_pytorch as ipex
model = model.to('xpu')

However, the last line produced: Segmentation fault (core dumped). The crash happened in the lazy_init of intel_extension_for_pytorch.
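As a side note, a crash like this can be turned into a graceful CPU fallback instead of a hard segfault at import time. The sketch below is a hypothetical defensive device pick, not part of the reproduction; it assumes that importing intel_extension_for_pytorch registers the torch.xpu namespace (as the IPEX XPU builds do) and falls back to "cpu" when the XPU stack is missing or unusable:

```python
def pick_device() -> str:
    """Return "xpu" when the IPEX XPU stack is importable and reports an
    available device, otherwise fall back to "cpu"."""
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401  # registers torch.xpu
        # Guard with hasattr in case the installed build lacks the xpu namespace.
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return "xpu"
    except ImportError:
        # torch or IPEX not installed in this environment
        pass
    return "cpu"

device = pick_device()
print(device)
```

Note this only guards against missing or non-functional installs; it cannot catch a segfault inside the native library itself, which is what this issue is about.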

Versions

PyTorch version: 1.13.0a0+git6c9b55e
PyTorch CXX11 ABI: Yes
IPEX version: 1.13.120+xpu
IPEX commit: c2a37012e
Build type: Release

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Clang version: N/A
IGC version: 2023.2.0 (2023.2.0.20230622)
CMake version: version 3.26.4
Libc version: glibc-2.35

Python version: 3.9.17 (main, Jun 6 2023, 20:11:21) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.19.0-46-generic-x86_64-with-glibc2.35
Is XPU available: True
DPCPP runtime version: 2023.2.0
MKL version: 2023.2.0
GPU models and configuration: [0] _DeviceProperties(name='Intel(R) UHD Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=12559MB, max_compute_units=24)
Intel OpenCL ICD version: 23.17.26241.33-647~22.04
Level Zero version: 1.3.26241.33-647~22.04

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
CPU family: 6
Model: 142
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 12
CPU max MHz: 4900,0000
CPU min MHz: 400,0000
BogoMIPS: 4599.93
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 128 KiB (4 instances)
L1i cache: 128 KiB (4 instances)
L2 cache: 1 MiB (4 instances)
L3 cache: 8 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==1.13.120+xpu
[pip3] numpy==1.25.1
[pip3] torch==1.13.0a0+git6c9b55e
[pip3] torchvision==0.14.1a0+5e8e2f1
[conda] N/A

gujinghui commented 1 year ago

@evelinamorim can you try to import intel_extension_for_pytorch immediately after import torch?

@jingxu10 pls help confirm the issue.

gekeleda commented 1 year ago

@gujinghui I have the same issue, no matter when intel_extension_for_pytorch is imported. That is, the following code also results in "Segmentation fault (core dumped)":

import torch
import intel_extension_for_pytorch as ipex

import torchvision.models as models

model = models.resnet50(pretrained=True)
model.eval()
data = torch.rand(1, 3, 224, 224)

model = model.to('xpu')

jgong5 commented 1 year ago

@gekeleda Do you have gdb installed in your environment? If so, you may try the following and report back what you see when the core dump happens: gdb --args `which python` your_script.py
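For reference, the same thing can be done non-interactively with gdb's `-batch` and `-ex` options, so the backtrace prints automatically when the script crashes (`your_script.py` stands in for whatever script reproduces the crash):

```shell
# Run the script under gdb; on SIGSEGV, print a backtrace and exit.
gdb -batch -ex run -ex bt --args "$(which python)" your_script.py
```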

evelinamorim commented 1 year ago

@gujinghui , as @gekeleda said, the same result is produced no matter the order of the imports.

@jgong5 I executed with gdb and the following output was produced.

Starting program: /home/evelinamorim/PycharmProjects/CT-Coref-pt/venv_pycharm/bin/python test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffde9ff640 (LWP 47326)]
[New Thread 0x7fffde1fe640 (LWP 47327)]
[New Thread 0x7fffdb9fd640 (LWP 47328)]
[New Thread 0x7fffd71fc640 (LWP 47329)]
[New Thread 0x7fffd69fb640 (LWP 47330)]
[New Thread 0x7fffd21fa640 (LWP 47331)]
[New Thread 0x7fffcf9f9640 (LWP 47332)]
warning: File "/opt/intel/oneapi/compiler/2023.2.0/linux/lib/libsycl.so.6.2.0-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /opt/intel/oneapi/compiler/2023.2.0/linux/lib/libsycl.so.6.2.0-gdb.py
line to your configuration file "/home/evelinamorim/.config/gdb/gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/evelinamorim/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
[Thread 0x7fffcf9f9640 (LWP 47332) exited]
[Thread 0x7fffd21fa640 (LWP 47331) exited]
[Thread 0x7fffd69fb640 (LWP 47330) exited]
[Thread 0x7fffd71fc640 (LWP 47329) exited]
[Thread 0x7fffdb9fd640 (LWP 47328) exited]
[Thread 0x7fffde1fe640 (LWP 47327) exited]
[Thread 0x7fffde9ff640 (LWP 47326) exited]
[Detaching after fork from child process 47339]
[Detaching after fork from child process 47340]
[Detaching after fork from child process 47344]
warning: Temporarily disabling breakpoints for unloaded shared library "/opt/intel/oneapi/compiler/2023.2.0/linux/lib/x64/libintelocl_emu.so"
warning: Temporarily disabling breakpoints for unloaded shared library "/opt/intel/oneapi/compiler/2023.2.0/linux/lib/x64/libintelocl.so"
[New Thread 0x7fffcf9f9640 (LWP 47345)]
warning: Temporarily disabling breakpoints for unloaded shared library "/opt/intel//oneapi/compiler/latest/linux/lib/x64/libintelocl.so"
warning: Temporarily disabling breakpoints for unloaded shared library "/opt/intel//oneapi/compiler/latest/linux/lib/x64/libintelocl_emu.so"
[New Thread 0x7fffd21fa640 (LWP 47348)]
[New Thread 0x7fffd69fb640 (LWP 47349)]
[Thread 0x7fffd69fb640 (LWP 47349) exited]
/home/evelinamorim/PycharmProjects/CT-Coref-pt/venv_pycharm/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/evelinamorim/PycharmProjects/CT-Coref-pt/venv_pycharm/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
[New Thread 0x7fffd71fc640 (LWP 47350)]
[New Thread 0x7ffeeb1de640 (LWP 47351)]
[New Thread 0x7ffeea9dd640 (LWP 47352)]

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
jgong5 commented 1 year ago

What do you see from the backtrace after typing "bt"?

evelinamorim commented 1 year ago

Thanks for the quick reply!

The backtrace is the following:

https://gist.github.com/evelinamorim/40dce656614394bad491955b7bc274a9#file-error_intel_pytorch_extension

gujinghui commented 1 year ago

@evelinamorim

From the version information you provided, am I right that you are using the oneAPI 2023.2 toolkit with the IPEX 1.13 release?

IGC version: 2023.2.0 (2023.2.0.20230622)
DPCPP runtime version: 2023.2.0
MKL version: 2023.2.0

Can you try with Intel® oneAPI Base Toolkit 2023.1, as mentioned in our release note? https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html#software-requirements
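The root cause here is that each IPEX XPU build is pinned to a specific oneAPI Base Toolkit version, and a mismatch can crash in native code rather than fail with a clear error. A small sketch of such a pre-flight check is below; the version mapping is a hypothetical example drawn from this thread (IPEX 1.13.120+xpu against oneAPI 2023.1), not an official compatibility table:

```python
# Illustrative mapping: IPEX XPU build -> required oneAPI Base Toolkit
# major.minor version. Values here come from this issue thread only.
REQUIRED_ONEAPI = {
    "1.13.120+xpu": "2023.1",
}

def oneapi_matches(ipex_version: str, oneapi_version: str) -> bool:
    """Return True when the installed oneAPI major.minor matches what
    the given IPEX build was released against (True if build unknown)."""
    required = REQUIRED_ONEAPI.get(ipex_version)
    if required is None:
        return True  # unknown IPEX build: nothing to check against
    # Compare major.minor only, e.g. "2023.2.0" -> "2023.2"
    return ".".join(oneapi_version.split(".")[:2]) == required

print(oneapi_matches("1.13.120+xpu", "2023.2.0"))  # reporter's setup -> False
print(oneapi_matches("1.13.120+xpu", "2023.1.0"))  # per release notes -> True
```

Running this against the reporter's environment (DPCPP runtime 2023.2.0) flags the mismatch that the downgrade to oneAPI 2023.1 resolved.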

evelinamorim commented 1 year ago

@gujinghui Thanks again for the quick reply! Changing the oneAPI Base Toolkit version worked. Thanks!

GunturuSandeep commented 7 months ago

@gujinghui @evelinamorim, could you please elaborate on how to solve this issue? I am running in a Docker environment (built from a Dockerfile) where everything is preinstalled, but I am facing exactly the same "Segmentation fault (core dumped)" issue with "ipex-llm:2.1.10".

pujaltes commented 4 months ago

Hey @gujinghui, I want to use IPEX 1.13, but it appears that the oneAPI Base Toolkit 2023.1 is no longer available for download. Is there something we can do if we want to use IPEX with PyTorch 1.13? Is IPEX 1.13 essentially deprecated?

gujinghui commented 4 months ago

Old versions of the oneAPI toolkit are obsolete; I don't think we keep copies of them.

IPEX 1.13 is coupled with the old oneAPI toolkit, so IPEX 1.13 is deprecated as well. Sorry about that.

May I know why you have to work with IPEX 1.13?

pujaltes commented 4 months ago

Thank you for your prompt response and confirmation that IPEX 1.13 has been deprecated. We were hoping to avoid having to upgrade some of our models to PyTorch 2.

jingxu10 commented 3 months ago

It seems no further action is needed, so I am closing this for now. Feel free to reopen if you have further requests.