gramineproject / gsc

Gramine Shielded Containers (Docker integration)
BSD 3-Clause "New" or "Revised" License
46 stars 37 forks source link

Allow to have `LD_LIBRARY_PATH` in the manifest file #166

Closed dimakuv closed 11 months ago

dimakuv commented 1 year ago

Description of the problem

Currently GSC doesn't allow to specify an LD_LIBRARY_PATH in the manifest file. This is because GSC automatically creates an LD_LIBRARY_PATH from two sources:

See these relevant parts:

The correct solution would probably be:

  1. gsc.py should rename LD_LIBRARY_PATH envvar in the user-supplied manifest to some temporary name (so that it doesn't conflict with the entrypoint manifest template)
  2. finalize_manifest.py should take this temporary name and concat it together with {{ library_paths }}, removing duplicates along the way.
  3. The order of entries is important in LD_LIBRARY_PATH, so we must think carefully what is concatenated first.
jkr0103 commented 1 year ago

Should we also support concatination of PATH and any ENV retrieved from original docker image in addition to LD_LIBRARY_PATH?

I have seen requirement for concatination of PATH env in java workload case.

We don't define LD_PRELOAD in any GSC template but retrieved from original docker image if exists, in such a case it's not possible to add LD_PRELOAD in application manifest. I have seen this case as well in PyTorch workload (DLRM, RESNET50 etc..)

jkr0103 commented 1 year ago

Please assign the issue to me.

dimakuv commented 1 year ago

I agree with concatenating LD_LIBRARY_PATH, PATH and LD_PRELOAD from the three sources, in this order:

  1. First goes the manifest string
  2. Then goes the GSC-generated string
  3. Then goes the original-Docker-image string

I think this order is the most correct one.

Also, I would like some kind of warning from GSC about this behavior, so that it is visible to casual users (not only to those who browse GSC documentation and sources).

@mkow @woju Would be good to hear your opinion. Also, should this behavior be gated by a new GSC command-line option?

woju commented 1 year ago

I don't understand the problem that is being solved. Why paths extracted from the original image (second bullet point) do not work? Can you point to an example container that fails, with an explanation why this happens?

mkow commented 1 year ago

I'm not opposed to this change, as long as I'll get the need for it explained better :) (same as woju)

envvar in the user-supplied manifest to some temporary name (so that it doesn't conflict with the entrypoint manifest template)

Btw. Why a temporary name? Can't we just merge the arrays whenever we merge some configs?

dimakuv commented 1 year ago

Btw. Why a temporary name? Can't we just merge the arrays whenever we merge some configs?

Yes, I think we can just merge. I don't remember why I came up with this temporary-rename approach, probably because the two TOML root dicts are just merged together at some point... Anyway, yes, we could merge the configs in a more careful and clean way, this was just my quick evaluation of the approach.

jkr0103 commented 1 year ago

Explanation on the need of concatenation for:

  1. LD_LIBRARY_PATH: GSC template entrypoint.manifest.template add LD_LIBRARY_PATH. In openvino case we wanted to add /ovms/lib path to LD_LIBRARY_PATH which we could not do as GSC template above already have LD_LIBRARY_PATH and we cannot have two keys with same name in the manifest, then we had to add all the libraries to LD_PRELOAD.
  2. PATH : We had to add java bin path to PATH env in application manifest for JDK8 to work but could not do as PATH is already added in entrypoint.common.template.manifest and we cannot have two keys in the manifest.
  3. LD_PRELOAD or any XYZ env in original docker image: We added libgomp to LD_PRELOAD in application manifest for PyTorch DLRM workload and GSC build failed with Duplicate key error. We were surprised by this error as we could not see LD_PRELOAD defined anywhere in GSC templates. After some debugging figured out LD_PRELOAD is also fetched from original docker image. Now we had LD_PRELOAD coming from two places one from original docker image and other from application manifest. Similar case can happen with any other XYZ env also defined in original docker image and manifest want to concat or override.

These are some examples but there could be more which customer might be handling by modifying GSC code or might face in future. In some cases, there could be need to concat or override. To handle this, application manifest env value should be added first than GSC template values and at the end values from original docker image with some warning during GSC build as suggested by Dmitrii.

dimakuv commented 1 year ago

I can add a couple more Gramine-specific examples:

  1. LD_LIBRARY_PATH -- applications may rely on the OpenMP library. There are two flavors of the OpenMP library: libgomp.so (classic GNU gcc implementation) and its drop-in replacement libiomp.so (Intel-optimized implementation). Some apps -- I think OpenVINO -- prefer to load libiomp.so over libgomp.so if they find the former in the library paths. Also, for Gramine, libgomp.so is bad because it uses raw SYSCALL instructions, whereas libiomp.so is good because it uses Glibc syscall wrappers. So we have a situation where the original Docker image wants to use libgomp.so, and our Gramine Docker image wants to use libiomp.so -- for perf reasons. The way to allow both scenarios in the same Docker image is to specify loader.env.LD_LIBRARY_PATH = "/path/to/libiomp/" in the GSC manifest. So the final Docker image will contain loader.env.LD_LIBRARY_PATH = "/path/to/libiomp/:/path/gramine/runtime/:classic-paths". This is currently not allowed.

  2. LD_PRELOAD -- the same use case as above. Imagine that the original Docker image already uses LD_PRELOAD=/some/irrelevant/lib, and we also want to add LD_PRELOAD=/libiomp.so for Gramine purposes. Ideally we want in the final Gramine Docker image this: loader.env.LD_PRELOAD=/libiomp.so:/some/irrelevant/lib. This is currently not allowed.

  3. LD_PRELOAD again -- some memory allocators are better for Gramine than others. E.g. mimalloc and jemalloc are known to provide better performance under Gramine than the classic dlmalloc. A typical way to use these drop-in replacement libs is to preload them. But again, the original Docker image may already use LD_PRELOAD=/some/irrelevant/lib, and we also want to add LD_PRELOAD=/libmimalloc.so for Gramine purposes. Ideally we want in the final Gramine Docker image this: loader.env.LD_PRELOAD=/libmimalloc.so:/some/irrelevant/lib. This is currently not allowed.

  4. PATH -- I don't know any good examples, why Gramine would want an additional path to what is already in the original Docker image... But I could imagine that Gramine would e.g. want to use some tools/binaries that are tailored for SGX environments. For example, maybe busybox instead of dash as your shell: the final Docker image would use dash for normal apps and busybox for the Gramine app. This could be achieved by doing loader.env.PATH=/path/to/busybox:classic-paths.

mkow commented 1 year ago

@jkr0103: But why these envs weren't set in the source Docker image? How did this even work?

From the issue description:

GSC automatically creates an LD_LIBRARY_PATH from [...] The library paths extracted from the original Docker image

So, if the Docker image works then it should also work in Gramine without changes.

Only @dimakuv's examples make sense to me and I think they justify adding this feature.