gramineproject / examples

Sample applications configs for Gramine
BSD 3-Clause "New" or "Revised" License

ImportError: libtorch_cuda.so: failed to map segment from shared object #66

Closed Smart-Yanmeng closed 1 year ago

Smart-Yanmeng commented 1 year ago

Hi, I'm trying to run the PyTorch example and ran into some problems.

The same PyTorch example works properly under the Gramine environment. However, when I run the same code under the SGX environment, I hit the problems below.

When I run gramine-sgx ./pytorch pytorchexample, I get:

(host_framework.c:320:add_pages_to_enclave) debug: Adding pages to enclave: 0x10000-0xfe912000 [REG:RWX] (free)
Killed
(host_main.c:569:initialize_enclave) debug: Added all pages to SGX enclave
(host_framework.c:523:init_enclave) debug: Enclave initializing:
(host_framework.c:524:init_enclave) debug: enclave id: 0x00000000fffff000
(host_framework.c:525:init_enclave) debug: mr_enclave: 7aee869f5992b2dfcd482776fa454c8da41963b5c5f12a1046e24fccd2fa64e2
error: Failed to initialize child process: Broken pipe (PAL_ERROR_CONNFAILED_PIPE)

Then I set sgx.enclave_size = "2G" in the pytorch.manifest.template file and got:

Traceback (most recent call last):
  File "pytorchexample.py", line 4, in <module>
    from torchvision import models
  File "/usr/local/lib/python3.8/dist-packages/torchvision/__init__.py", line 5, in <module>
    import torch
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 229, in <module>
    from torch._C import *  # noqa: F403
ImportError: libtorch_cuda.so: failed to map segment from shared object

Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 72, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 32, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 27, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 23, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 36, in <module>
    apt_pkg.init_system()
apt_pkg.Error: E:Error reading the CPU table

Original exception was:
Traceback (most recent call last):
  File "pytorchexample.py", line 4, in <module>
    from torchvision import models
  File "/usr/local/lib/python3.8/dist-packages/torchvision/__init__.py", line 5, in <module>
    import torch
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 229, in <module>
    from torch._C import *  # noqa: F403
ImportError: libtorch_cuda.so: failed to map segment from shared object

My environment is Ubuntu 20.04 with 8 GB of RAM. Is the problem with my machine?

dimakuv commented 1 year ago

@Smart-Yanmeng I guess you already solved your problem, but for the record: yes, the problem was your 8GB of RAM. This was not sufficient for several SGX enclaves, each with a 4GB enclave size. And apparently an enclave size of 2GB is too small for your PyTorch application.
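
To make the arithmetic in the comment above concrete, here is a minimal Python sketch of the memory budget. It rests on assumptions that are not in the thread: that each enclave's full sgx.enclave_size has to be backed by host memory, that there are 2 enclaves (the thread only says "several"), and that about 1G of RAM is set aside for the host OS. The helper names parse_size and enclaves_fit are illustrative and are not part of Gramine.

```python
import re

# Binary multipliers for the size suffixes used in strings like "4G".
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size string such as "4G" or "512M" to a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG])", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def enclaves_fit(enclave_size: str, enclave_count: int, host_ram: str,
                 host_reserved: str = "1G") -> bool:
    """Rough check: do `enclave_count` enclaves fit in the RAM left for them?

    ASSUMPTION: every enclave needs its full size backed by host memory,
    and `host_reserved` (hypothetical default) is kept for the host OS.
    """
    needed = parse_size(enclave_size) * enclave_count
    available = parse_size(host_ram) - parse_size(host_reserved)
    return needed <= available


if __name__ == "__main__":
    # Hypothetical scenario: 2 enclaves on the 8G machine from the report.
    for size in ("4G", "2G"):
        verdict = "fits" if enclaves_fit(size, 2, "8G") else "does not fit"
        print(f"2 enclaves of {size} on 8G of RAM: {verdict}")
```

Under these assumptions the 4G configuration exceeds the available RAM, which is consistent with the "Killed" line in the first log, while the 2G configuration fits on the host but, as noted above, leaves too little room inside the enclave for PyTorch.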

There are several ways to fix this. The simplest one is to move to another machine that has more RAM. The other is to create a swap file or increase its size, but this comes at the cost of very slow execution.
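
Before picking one of these options, it can help to check how much RAM and swap the host actually has. The sketch below parses /proc/meminfo, the standard Linux interface that reports fields such as MemTotal and SwapTotal in kB. The function names are illustrative, and the script only reads values; it does not create or resize a swap file.

```python
from pathlib import Path

GIB = 1024**3


def parse_meminfo(text: str) -> dict:
    """Return {field: bytes} for the kB-valued lines of /proc/meminfo."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Only lines of the form "Name:  <number> kB" are kept.
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            values[name.strip()] = int(parts[0]) * 1024
    return values


def ram_and_swap_gib(text: str) -> tuple:
    """Return (total RAM, total swap) in GiB; missing fields count as 0."""
    info = parse_meminfo(text)
    return info.get("MemTotal", 0) / GIB, info.get("SwapTotal", 0) / GIB


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # Linux only
        ram, swap = ram_and_swap_gib(meminfo.read_text())
        print(f"RAM: {ram:.1f} GiB, swap: {swap:.1f} GiB")
```

If RAM plus swap is smaller than the combined size of the enclaves you expect to run, the enclaves are likely to be killed during startup as in the first log.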

Smart-Yanmeng commented 1 year ago

Thanks for your reply!

(Sent by email in reply to Dmitrii Kuvaiskii's comment of Thursday, May 4, 2023, 17:17, which appears in full above.)