gramineproject / gramine

A library OS for Linux multi-process applications, with Intel SGX support
GNU Lesser General Public License v3.0

Cannot link library #1932

Closed thempp66 closed 2 months ago

thempp66 commented 3 months ago

Description of the problem

It seems that there are some errors when linking to the library.

Steps to reproduce

1. Download and use the Gramine v1.7 Docker image.
2. Install pip.
3. Install the library: pip install -U pip wheel setuptools && pip install concrete-ml
4. Make and run the demo:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression

# Let's create a synthetic data-set
x, y = make_classification(n_samples=100, class_sep=2, n_features=30, random_state=42)

# Split the data-set into a train and test set
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Now we train in the clear and quantize the weights
model = LogisticRegression(n_bits=8)
model.fit(X_train, y_train)

# We can simulate the predictions in the clear
y_pred_clear = model.predict(X_test)

# We then compile on a representative set
model.compile(X_train)

# Finally we run the inference on encrypted inputs !
y_pred_fhe = model.predict(X_test, fhe="execute")

print("In clear  :", y_pred_clear)
print("In FHE    :", y_pred_fhe)
print(f"Similarity: {int((y_pred_fhe == y_pred_clear).mean()*100)}%")

# Output:
    # In clear  : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
    # In FHE    : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
    # Similarity: 100%

Expected results

    In clear  : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
    In FHE    : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
    Similarity: 100%

Actual results

full log attachment: log.txt

Below is the main error of the log:

Traceback (most recent call last):
  File "scripts/test-ml.py", line 21, in <module>
    model.compile(X_train)
  File "/usr/local/lib/python3.8/dist-packages/concrete/ml/sklearn/base.py", line 575, in compile
    self.fhe_circuit_ = module_to_compile.compile(
  File "/usr/local/lib/python3.8/dist-packages/concrete/fhe/compilation/compiler.py", line 606, in compile
    circuit = Circuit(
  File "/usr/local/lib/python3.8/dist-packages/concrete/fhe/compilation/circuit.py", line 67, in __init__
    self.enable_fhe_execution()
  File "/usr/local/lib/python3.8/dist-packages/concrete/fhe/compilation/circuit.py", line 134, in enable_fhe_execution
    self.server = Server.create(
  File "/usr/local/lib/python3.8/dist-packages/concrete/fhe/compilation/server.py", line 220, in create
    compilation_result = support.compile(mlir, options, compilation_context)
  File "/usr/local/lib/python3.8/dist-packages/concrete/compiler/library_support.py", line 172, in compile
    self.cpp().compile(
RuntimeError: Can't emit artifacts: Command failed:ld --shared -o /tmp/tmp6b4tt02t/sharedlib.so /tmp/tmp6b4tt02t/program.module-0.mlir.o /usr/local/lib/python3.8/dist-packages/concrete_python.libs/libConcretelangRuntime-32c53a6a.so -rpath=/usr/local/lib/python3.8/dist-packages/concrete_python.libs --disable-new-dtags 2>&1
Code:32512

Gramine commit hash

10e93534169802be16fc9e2b3e9ac70d08efcb41

mkow commented 3 months ago

There are many errors like that in your log:

(libos_parser.c:1658:buf_write_all) [P7:T178:python3.8] trace: ---- execve("pip", [pip,--disable-pip-version-check,list,], [LD_LIBRARY_PATH=/lib:/lib:/lib/x86_64-linux-gnu:/usr//lib/x86_64-linux-gnu,OMP_NUM_THREADS=4,]) ...
(libos_parser.c:1658:buf_write_all) [P7:T178:python3.8] trace: ---- return from execve(...) = -2

(here it's a bit weird that execve is called with a relative path, is this a bug in concrete-ml?)

(libos_parser.c:1658:buf_write_all) [P6:T177:python3.8] trace: ---- execve("/bin/sh", [sh,-c,--,ld --shared -o /tmp/tmp6b4tt02t/sharedlib.so /tmp/tmp6b4tt02t/program.module-0.mlir.o /usr/local/lib/python3.8/dist-packages/concrete_python.libs/libConcretelangRuntime-32c53a6a.so -rpath=/usr/local/lib/python3.8/dist-packa
(libos_parser.c:1658:buf_write_all) [P6:T177:python3.8] trace: ges/concrete_python.libs --disable-new-dtags 2>&1,], [LD_LIBRARY_PATH=/lib:/lib:/lib/x86_64-linux-gnu:/usr//lib/x86_64-linux-gnu,OMP_NUM_THREADS=4,]) ...
(libos_parser.c:1658:buf_write_all) [P6:T177:python3.8] trace: ---- return from execve(...) = -2

-2 is ENOENT. These binaries are missing inside the Gramine namespace (the in-enclave virtual filesystem); you're probably missing some mounts in your manifest.
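
As a quick illustration of how to read those return values: the trace prints negated errno codes, and Python's standard errno module can translate them on any Linux host. This is only a sketch for decoding the log, not part of Gramine; the helper name describe_syscall_result is made up for this example.

```python
import errno
import os


def describe_syscall_result(ret: int) -> str:
    """Translate a syscall return value from a trace into a readable errno name."""
    if ret >= 0:
        return "success"
    code = -ret  # traces show errors as negated errno values
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# The execve() calls in the log returned -2
print(describe_syscall_result(-2))
```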

thempp66 commented 3 months ago

Thanks @mkow for your reply! I'm not sure whether it is a bug in concrete-ml, but the same code does work on the host without Gramine. As you said, if there is something wrong with the mounts, what can I do to fix these errors? For example, I should mount to some binaries like /tmp/tmp6b4tt02t/sharedlib.so in my manifest. Is that right?

dimakuv commented 3 months ago

For example, I should mount to some binaries like /tmp/tmp6b4tt02t/sharedlib.so in my manifest. Is that right?

No need to specify separate files, just specify whole directories. So in this particular case, a mount like this is enough:

fs.mounts = [
  { type = "tmpfs", path = "/tmp" },
]

Similarly, to enable the files under e.g. /bin/ directory, do this:

fs.mounts = [
  { path = "/bin", uri = "file:/bin" },
]

Read more info here: https://gramine.readthedocs.io/en/stable/manifest-syntax.html#fs-mount-points

thempp66 commented 2 months ago

I have tried to mount the files and paths it needs, but the same error remains. I am not sure whether I set the config incorrectly or whether these settings are unrelated to the error. Here is my manifest:

# Copyright (C) 2023 Gramine contributors
# SPDX-License-Identifier: BSD-3-Clause

# Python3 manifest example

loader.entrypoint = "file:{{ gramine.libos }}"
libos.entrypoint = "{{ entrypoint }}"

#loader.log_level = "{{ log_level }}"
loader.log_level = "all"

loader.env.LD_LIBRARY_PATH = "/lib:/lib:{{ arch_libdir }}:/usr/{{ arch_libdir }}"

# Python's NumPy spawns as many threads as there are CPU cores, and each thread
# consumes a chunk of memory, so on large machines 1G enclave size may be not enough.
# We limit the number of spawned threads via OMP_NUM_THREADS env variable.
loader.env.OMP_NUM_THREADS = "4"

loader.insecure__use_cmdline_argv = true

sys.enable_sigterm_injection = true

fs.mounts = [
  { path = "/lib", uri = "file:{{ gramine.runtimedir() }}" },
  { path = "{{ arch_libdir }}", uri = "file:{{ arch_libdir }}" },
  { path = "/usr/{{ arch_libdir }}", uri = "file:/usr/{{ arch_libdir }}" },
{% for path in python.get_sys_path(entrypoint) %}
  { path = "{{ path }}", uri = "file:{{ path }}" },
{% endfor %}
  { path = "{{ entrypoint }}", uri = "file:{{ entrypoint }}" },
  { path = "/etc/hosts", uri = "file:helper-files/hosts" },

  { type = "tmpfs", path = "/tmp" },
  { path = "/usr/local/lib/python3.8/dist-packages/concrete_python.libs" , uri = "file:/usr/local/lib/python3.8/dist-packages/concrete_python.libs" },
  { path = "/bin" , uri = "file:/bin" }
]

sys.stack.size = "2M"
sys.enable_extra_runtime_domain_names_conf = true

sgx.debug = true
sgx.edmm_enable = {{ 'true' if env.get('EDMM', '0') == '1' else 'false' }}
sgx.enclave_size = "4G"
#sgx.max_threads = {{ '1' if env.get('EDMM', '0') == '1' else '32' }}
sgx.max_threads = 128

sgx.remote_attestation = "{{ ra_type }}"
sgx.ra_client_spid = "{{ ra_client_spid }}"
sgx.ra_client_linkable = {{ 'true' if ra_client_linkable == '1' else 'false' }}

sgx.trusted_files = [
  "file:{{ gramine.libos }}",
  "file:{{ entrypoint }}",
  "file:{{ gramine.runtimedir() }}/",
  "file:{{ arch_libdir }}/",
  "file:/usr/{{ arch_libdir }}/",
{% for path in python.get_sys_path(entrypoint) %}
  "file:{{ path }}{{ '/' if path.is_dir() else '' }}",
{% endfor %}
  "file:scripts/",
  "file:helper-files/",
]

sgx.allowed_files = [
  "file:test.onnx",
  "file:.artifacts/",
  "file:/usr/local/lib/python3.8/dist-packages/concrete_python.libs"
]

And here is the log.txt. I have no idea what to do next. Could you please suggest other possible ways to solve the error? I would appreciate it a lot!

dimakuv commented 2 months ago

From the log:

(libos_parser.c:1658:buf_write_all) [P2:T5:python3.8] trace: ---- execve("/bin/uname", [uname,-p,], [LD_LIBRARY_PATH=/lib:/lib:/lib/x86_64-linux-gnu:/usr//lib/x86_64-linux-gnu,OMP_NUM_THREADS=4,]) ...
(libos_parser.c:1658:buf_write_all) [P1:T1:python3.8] trace: ---- close(7) = 0x0
(pal_files.c:108:file_open) warning: Disallowing access to file '/bin/uname'; file is not trusted or allowed.

You only added the /bin/ directory into fs.mounts, but didn't add it into sgx.trusted_files. So please add:

sgx.trusted_files = [
  ...
  "file:/bin/",
]
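
A mount in fs.mounts and an entry in sgx.trusted_files (or sgx.allowed_files) are both needed for a host directory, so it can help to cross-check the two lists. The sketch below is a hypothetical lint helper, not a Gramine tool: it works on plain Python lists and uses a simplified path-prefix comparison that is only an assumption about what counts as "covered", not Gramine's actual matching logic.

```python
def uncovered_mounts(mounts, trusted_files, allowed_files):
    """Return host URIs of file-backed mounts that no trusted/allowed entry touches.

    Simplified assumption: a mount counts as covered if some entry equals its
    URI, lies above it, or lies beneath it in the path hierarchy.
    """
    entries = [u.rstrip("/") for u in list(trusted_files) + list(allowed_files)]
    missing = []
    for mount in mounts:
        uri = mount.get("uri")
        if uri is None:  # e.g. tmpfs mounts have no host URI
            continue
        uri = uri.rstrip("/")
        covered = any(
            uri == entry
            or uri.startswith(entry + "/")
            or entry.startswith(uri + "/")
            for entry in entries
        )
        if not covered:
            missing.append(uri)
    return missing


# Example values modeled on the manifest in this thread
mounts = [
    {"path": "/lib", "uri": "file:/usr/lib"},
    {"type": "tmpfs", "path": "/tmp"},
    {"path": "/bin", "uri": "file:/bin"},
]
trusted = ["file:/usr/lib/"]
print(uncovered_mounts(mounts, trusted, []))
print(uncovered_mounts(mounts, trusted + ["file:/bin/"], []))
```

With the example values, the first call reports the /bin mount and the second call reports nothing once "file:/bin/" is listed.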

thempp66 commented 2 months ago

Thanks @dimakuv for your help! After adding /bin to fs.mounts and sgx.trusted_files, the warning "file is not trusted or allowed." is gone. But the error about the ld command still exists. manifest: python.manifest.template.txt

log: log.txt

dimakuv commented 2 months ago

This line seems to be problematic:

(libos_parser.c:1658:buf_write_all) [P9:T241:sh] trace: ---- stat("ld", 0x2f885840) = -2

So the application wants to find the ld binary, but it can't (-ENOENT = -2).

This seems to be because ld is located at /usr/bin/ld, but there is no PATH environment variable inside the Gramine environment (you didn't specify it in the manifest file).

So please add something like this to your manifest and try again:

# the subset of paths is taken from default Ubuntu, contains our desired /usr/bin/ld
loader.env.PATH = "/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
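
To see why a bare command name depends on PATH, here is a small host-side sketch using Python's shutil.which. It is only an analogy: the shell inside Gramine does its own lookup, and the file name fake-ld is a hypothetical stand-in for /usr/bin/ld.

```python
import os
import shutil
import stat
import tempfile


def find_on_path(command: str, path_value: str):
    """Resolve a bare command name against an explicit PATH string."""
    return shutil.which(command, path=path_value)


with tempfile.TemporaryDirectory() as bindir:
    # Create an executable stand-in for the linker (hypothetical name)
    tool = os.path.join(bindir, "fake-ld")
    with open(tool, "w") as f:
        f.write("#!/bin/sh\nexit 0\n")
    os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)

    # An empty PATH gives the lookup nowhere to search
    found_without_path = find_on_path("fake-ld", "")
    # With the directory on PATH, the same bare name resolves
    found_with_path = find_on_path("fake-ld", bindir)

print("empty PATH:", found_without_path)
print("PATH set:  ", found_with_path)
```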

thempp66 commented 2 months ago

Thanks a lot, @dimakuv. It seems that ld can be found now that loader.env.PATH is added. However, at the same line of code, ld now cannot open the file /tmp/tmpqqe0_3_s/sharedlib.so. I do have { type = "tmpfs", path = "/tmp" } in the manifest. Is there any other config that I should add to access files under /tmp?

Traceback (most recent call last):
  File "scripts/test-ml.py", line 21, in <module>
    model.compile(X_train)
  File "/usr/local/lib/python3.8/dist-packages/concrete/ml/sklearn/base.py", line 575, in compile
    self.fhe_circuit_ = module_to_compile.compile(
  File "/usr/local/lib/python3.8/dist-packages/concrete/fhe/compilation/compiler.py", line 606, in compile
    circuit = Circuit(
  File "/usr/local/lib/python3.8/dist-packages/concrete/fhe/compilation/circuit.py", line 67, in __init__
    self.enable_fhe_execution()
  File "/usr/local/lib/python3.8/dist-packages/concrete/fhe/compilation/circuit.py", line 134, in enable_fhe_execution
    self.server = Server.create(
  File "/usr/local/lib/python3.8/dist-packages/concrete/fhe/compilation/server.py", line 220, in create
    compilation_result = support.compile(mlir, options, compilation_context)
  File "/usr/local/lib/python3.8/dist-packages/concrete/compiler/library_support.py", line 172, in compile
    self.cpp().compile(
RuntimeError: Can't emit artifacts: Command failed:ld --shared -o /tmp/tmpqqe0_3_s/sharedlib.so /tmp/tmpqqe0_3_s/program.module-0.mlir.o /usr/local/lib/python3.8/dist-packages/concrete_python.libs/libConcretelangRuntime-32c53a6a.so -rpath=/usr/local/lib/python3.8/dist-packages/concrete_python.libs --disable-new-dtags 2>&1
Code:256
ld: cannot open output file /tmp/tmpqqe0_3_s/sharedlib.so: No such file or directory

manifest: python.manifest.template.txt log: log.txt

dimakuv commented 2 months ago

Yes, it looks like your application creates a bunch of files under /tmp/ that are shared among several processes (of the same application). This sharing is not supported by tmpfs in Gramine.

So you can instead use a classic (chroot) FS mount. Something like this (replace that tmpfs mount entry with this new entry):

fs.mounts = [
  ...
  { path = "/tmp", uri = "file:/tmp" },
]

sgx.allowed_files = [
  ...
  "file:/tmp",
]

This is absolutely insecure (as all files are simply visible to the host), but it should allow you to check the functionality of your application. Until the next problem in your Gramine experiments :)
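
A small smoke test can mimic the pattern described above: one process creates a temporary directory and a child process writes an output file into it. This is a minimal sketch, assuming the failure comes from exactly this parent/child sharing; the function name is made up, and the file name is only borrowed from the traceback. On a normal host it should succeed; whether it reproduces the problem under a tmpfs mount in Gramine is an assumption.

```python
import os
import subprocess
import sys
import tempfile


def child_can_write_to_parent_tmpdir() -> bool:
    """Create a temp dir in this process, then have a child process write a file into it."""
    with tempfile.TemporaryDirectory() as workdir:
        # File name borrowed from the traceback; the content is a dummy payload
        target = os.path.join(workdir, "sharedlib.so")
        child_code = "import sys; open(sys.argv[1], 'wb').write(b'ok')"
        result = subprocess.run([sys.executable, "-c", child_code, target])
        # The parent must see the file the child created
        return result.returncode == 0 and os.path.exists(target)


if __name__ == "__main__":
    print("child can write into parent's temp dir:", child_can_write_to_parent_tmpdir())
```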

thempp66 commented 2 months ago

Thanks again, @dimakuv. With that change, it reports FileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/']. But we do mount /tmp in the manifest. manifest: python.manifest.template.txt log: log.txt

dimakuv commented 2 months ago

@thempp66 You added /tmp to sgx.trusted_files. These Trusted Files are read-only. I think that's what "FileNotFoundError" complains about.

You actually need to add /tmp to sgx.allowed_files instead. Allowed Files are read-write.
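
For context on that error message: Python's tempfile module picks its default directory by trying to create and write a small file in each candidate, so a read-only /tmp is rejected even though it exists. The probe below is a simplified sketch of that idea, not the tempfile implementation itself; is_usable_tmpdir is a name invented for this example.

```python
import os
import tempfile
import uuid


def is_usable_tmpdir(directory: str) -> bool:
    """Return True if a file can be created and written in `directory`."""
    probe = os.path.join(directory, "probe-" + uuid.uuid4().hex)
    try:
        fd = os.open(probe, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    except OSError:
        # Missing directory, read-only filesystem, or access denied
        return False
    try:
        os.write(fd, b"probe")
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        try:
            os.unlink(probe)
        except OSError:
            pass


# Candidate list taken from the error message above
for candidate in ("/tmp", "/var/tmp", "/usr/tmp"):
    print(candidate, is_usable_tmpdir(candidate))
```

Running this inside the enclave should show which candidates are actually writable with the current manifest.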

thempp66 commented 2 months ago

@thempp66 You added /tmp to sgx.trusted_files. These Trusted Files are read-only. I think that's what "FileNotFoundError" complains about.

You actually need to add /tmp to sgx.allowed_files instead. Allowed Files are read-write.

Thank you all so much! It works now. That's really important to our project. By the way, sorry about my unfamiliarity with Gramine and its config. I'm going to learn more about how to use Gramine and about its limitations. Thank you!