tkestack / vcuda-controller

Getting "invalid device context" or "segmentation fault" errors on aarch64 machine #35

Open DennisYoung96 opened 1 year ago

DennisYoung96 commented 1 year ago

Thanks for your excellent code and open source spirit.

Environment info

- gpu-manager version: built on master
- vcuda version: built on master
- NVIDIA driver: 470.199.02, 470.42.01, or 460.106.00
- CPU: aarch64
- GPU: Tesla T4

Details

Recently I have been getting "invalid device context" or "segmentation fault" errors on my aarch64 machine.

- Whenever an app initializes, it reports 5 functions not found: [image]
- When I run the CUDA samples: [image]
- When I run a PyTorch demo: [image]
- When I switch to the 460.x driver, it reports a segmentation fault.

However, it works if I give the whole GPU to one pod (set vcore=100). Also, the same setup works fine on an x86 machine (same 470.x driver and T4 GPU).

So, are there any differences between the aarch64 driver and the x86 driver? Can anyone give advice on this? I need your help.

DennisYoung96 commented 1 year ago

@mYmNeo @hzliangbin please help

hiahia121 commented 8 months ago

I have the same problem. Can anyone help?

ranxuxin001 commented 3 months ago

I read in an article that the virtualization driver differs between x86 and aarch64. x86 uses cgroupfs.

ranxuxin001 commented 3 months ago

By the way, are you using an NVIDIA Tesla T4 with the Turing architecture?