gramineproject / gramine

A library OS for Linux multi-process applications, with Intel SGX support
GNU Lesser General Public License v3.0

GPU support #1214

Open arkish opened 1 year ago

arkish commented 1 year ago

I would like to know if Gramine provides GPU support, i.e. whether I can partition the layers of a model to run inference within the SGX enclave and offload the rest to the GPU. I found your publication "Computation offloading to hardware accelerators in Intel® SGX and Gramine Library OS". However, are there any documentation examples regarding this? I couldn't find any in the Gramine docs. Thanks.

kailun-qin commented 1 year ago

Hi @arkish, thanks for your interest!

Yes, there is an on-going effort around:

- #671
- #827

which are both targeting use cases like communication with hardware accelerators (e.g. GPUs) and are related to the paper you mentioned (https://arxiv.org/abs/2203.01813).

We intend to have this support integrated into our future releases (see the Gramine roadmap https://github.com/orgs/gramineproject/projects/1 for details), so stay tuned!

arkish commented 1 year ago

Can I still perform GPU offloading using the details provided in the above published paper with the current latest Gramine release?

kailun-qin commented 1 year ago

> Can I still perform GPU offloading using the details provided in the above published paper with the current latest Gramine release?

Sorry, I don't completely get this question.

If you're asking whether the details described in the preprint still apply to the latest Gramine (note that the latest release, v1.4, does not include those changes, i.e. they're still WIP): yes, I think a major part of them should still apply, though there have been some updates/changes discussed in the above issues/PR reviews. cc @dimakuv, who should definitely have more details.

dimakuv commented 1 year ago

> Can I still perform GPU offloading using the details provided in the above published paper with the current latest Gramine release?

I also don't understand the question exactly.

I would say yes, but with several non-trivial caveats:

  1. You will need to rebase the two PRs (#671, #827) onto the current master branch of Gramine and apply them on top of each other.
  2. The manifest syntax described in the paper is slightly outdated (some fields were renamed); you can find the new manifest syntax described in #671's documentation (a rough sketch follows below).
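
For a rough picture, a hypothetical manifest snippet in the spirit of #671's documented syntax might look like the following. Everything here is illustrative: the device path, the struct layout, and the request code are made up, and the exact field names (`sgx.ioctl_structs`, `sgx.allowed_ioctls`) should be checked against #671's documentation, since they may still change before the PRs are merged.

```toml
# Hypothetical snippet following #671's manifest syntax; the device,
# struct layout and request code below are made up for illustration.
fs.mounts = [
  { path = "/dev/accel0", uri = "dev:/dev/accel0" },
]

# Describe the memory layout of the (made-up) ioctl argument so Gramine
# knows how to copy it across the enclave boundary.
sgx.ioctl_structs.accel_arg = [
  { size = 8, direction = "out", name = "input_addr" },
  { size = 8, direction = "in",  name = "output_addr" },
]

# Allow only this one ioctl on the device, with the layout declared above.
sgx.allowed_ioctls = [
  { request_code = 0xc0104100, struct = "accel_arg" },
]
```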

Generally, if you're not a seasoned C developer, I wouldn't experiment with this currently; instead, I would wait until this functionality is available in Gramine proper.

arkish commented 1 year ago

Apologies if the question wasn't clear. I wanted to know if I should wait for the future release, or if I could use the details mentioned here (https://arxiv.org/abs/2203.01813) and apply those techniques to the current Gramine release to conduct GPU offloading. Thanks for your response. May I know a rough estimate of when the next release with this functionality will be available?

dimakuv commented 1 year ago

@arkish I would recommend waiting for a future release.

Applying the PRs is non-trivial.

arkish commented 1 year ago

@dimakuv May I know if there is any recommended way to test the GPU offload after rebasing and applying the two PRs to the master branch?

dimakuv commented 1 year ago

@arkish Well, the short answer is no.

arkish commented 1 year ago

@dimakuv Would it be possible to test with the example under libos/test/regression? It would be great to know how to test the GPU offloading.

monavij commented 1 year ago

@arkish - What GPU are you looking at, and what is your interest? Gramine will have device communication support after #671 and #827 are merged. The test is just a dummy device and does not have much to do with GPUs.

This support will allow you to run a typical workload that offloads to accelerators on top of Gramine with minor modifications: software running on Gramine will need to be slightly changed to allocate untrusted shared memory for communication with the GPU/accelerator. With generic IOCTL support we can run most of the stack unmodified, but it's still quite non-trivial.

This idea of encrypting in SW before offloading will need to be used until HW/TEEs support secure device DMA for efficient secure communication with devices (and devices themselves gain TEE support). Stay tuned for our future publication on the use of Gramine with an Intel GPU.
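
To make the "allocate untrusted shared memory, encrypt before offloading" idea concrete, here is a minimal C sketch. It is not Gramine's actual API: the `/dev/accel0` device, the `ACCEL_SUBMIT` ioctl, and the `encrypt_in_enclave()` helper are hypothetical (and left commented out); only the generic POSIX calls are real.

```c
/* Minimal sketch (not Gramine's actual API): stage encrypted data in
 * untrusted shared memory before handing it to a device. The device path,
 * ACCEL_SUBMIT ioctl and encrypt_in_enclave() are hypothetical. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_SIZE 4096

int main(void) {
    /* File under an untrusted shared-memory mount: the mapping below is
     * backed by memory *outside* the enclave, visible to the host/device. */
    int shm_fd = open("/dev/shm/accel_buf", O_RDWR | O_CREAT, 0600);
    if (shm_fd < 0 || ftruncate(shm_fd, BUF_SIZE) < 0)
        return 1;

    unsigned char* shared = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, shm_fd, 0);
    if (shared == MAP_FAILED)
        return 1;

    /* Plaintext lives in trusted enclave memory; encrypt it (e.g. with
     * AES-GCM) before it ever touches the untrusted mapping.
     * encrypt_in_enclave() is a placeholder for the app's crypto. */
    unsigned char ciphertext[BUF_SIZE] = {0};
    /* encrypt_in_enclave(plaintext, ciphertext, BUF_SIZE); */
    memcpy(shared, ciphertext, BUF_SIZE);

    /* Submit the buffer to the (hypothetical) accelerator; the ioctl
     * would have to be listed in sgx.allowed_ioctls in the manifest. */
    /* int dev_fd = open("/dev/accel0", O_RDWR); */
    /* ioctl(dev_fd, ACCEL_SUBMIT, shared);      */

    munmap(shared, BUF_SIZE);
    close(shm_fd);
    return 0;
}
```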

gammelgaard52 commented 1 year ago

> This idea of encrypting in SW before offloading will need to be used until HW/TEEs support secure device DMA for efficient secure communication with devices (and devices themselves gain TEE support).

Sorry for jumping in here. Do you have some study material on this topic that will allow me to get started with the right keywords for further studying?

mkow commented 1 year ago

What do you mean exactly? Communication with the GPU goes through a bunch of untrusted components and can thus be sniffed / spoofed. So, until you are able to get an encrypted and authenticated channel to the GPU, it will be insecure.
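
For anyone looking for keywords: the software half of such a channel is just authenticated encryption (e.g. AES-GCM) applied to every buffer that crosses to the device; the hard, hardware-dependent half is establishing a shared key with the GPU via an attested key exchange, which is exactly what needs hardware support. A minimal sketch of the software half using OpenSSL's EVP API (the function name `seal_for_device` is made up; key/IV management is not shown):

```c
/* Illustrative only: what "encrypted and authenticated" means in practice.
 * AES-256-GCM via OpenSSL's EVP API; key distribution (e.g. an attested
 * key exchange with the device) is the hard part and is not shown. */
#include <openssl/evp.h>

/* Encrypts `pt` into `ct` and writes a 16-byte auth tag; returns the
 * ciphertext length or -1 on failure. `key` is 32 bytes, `iv` 12 bytes. */
int seal_for_device(const unsigned char* key, const unsigned char* iv,
                    const unsigned char* pt, int pt_len,
                    unsigned char* ct, unsigned char tag[16]) {
    EVP_CIPHER_CTX* ctx = EVP_CIPHER_CTX_new();
    int len = 0, ct_len = -1;
    if (ctx
        && EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, key, iv) == 1
        && EVP_EncryptUpdate(ctx, ct, &len, pt, pt_len) == 1) {
        ct_len = len;
        if (EVP_EncryptFinal_ex(ctx, ct + len, &len) == 1
            && EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag) == 1)
            ct_len += len;  /* tag authenticates the ciphertext */
        else
            ct_len = -1;
    }
    EVP_CIPHER_CTX_free(ctx);
    return ct_len;
}
```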

gammelgaard52 commented 1 year ago

That I understand. I'm simply unaware of where to get started - e.g. a research paper that describes your statement, or some keywords to search for in a search engine.

mkow commented 1 year ago

You mean the statement that GPUs are usually plugged into PCIe? Why would you need a paper reference for this? :upside_down_face:

gammelgaard52 commented 1 year ago

I'm trying to understand where the current gap is, other than what I can find in this GitHub issue. I understand that Nvidia has Confidential Computing (https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/), where you can schedule workloads to run in a secure environment. To me, it seems possible to do. However, I clearly see there is a challenge doing it with Gramine, based on what I read here. Is the challenge that you cannot establish a secure connection between the CPU (Intel SGX, for instance) and the GPU (Nvidia Confidential Computing), or something else?

mkow commented 1 year ago

I don't know their technology, but yeah, this is technically possible with hardware support (and impossible without it). The GPU has to either support running code in SGX-like enclaves or be able to establish a secure channel to the CPU.

> Is the challenge that you cannot establish a secure connection between the CPU (Intel SGX, for instance) and the GPU (Nvidia Confidential Computing), or something else?

I don't know if these two technologies can cooperate with each other, sorry. The link you posted is just a marketing leaflet and the link to the whitepaper doesn't work for me (the PDF seems to be malformed). But maybe others from Gramine know more.

gammelgaard52 commented 1 year ago

Ok, thank you for your input.

dimakuv commented 9 hours ago

I will keep this issue open, but note that there are currently no specific plans for GPU support with Gramine-SGX.