run-ai / genv

GPU environment and cluster management with LLM support
https://www.genv.dev
GNU Affero General Public License v3.0

[BUG] GPU memory error when re-entering the container #63

Closed: incomingflyingbrick closed this issue 1 month ago

incomingflyingbrick commented 1 month ago

Hi, I noticed a bug in the container toolkit. I started a container with 512 MiB of GPU memory using genv-docker and the python:3 image. When I enter the container using docker exec -it <container> bash and run nvidia-smi, the reported memory is not 512 MiB; it shows the GPU's full memory instead.

Steps to reproduce the bug

  1. genv-docker run -it -d --rm --gpus 1 --gpu-memory 512mi --entrypoint bash python:3
  2. docker exec -it <container> bash
  3. Inside the container, run nvidia-smi
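The check in step 3 can also be scripted, which makes it easier to compare the docker run and docker exec cases. The following is only a sketch: it assumes the nvidia-smi inside the container prints the usual "used MiB / total MiB" column in its default table, and the helper names are made up for illustration.

```python
import re
import subprocess

# Matches the "used / total" memory column of the default nvidia-smi table,
# for example "1MiB / 512MiB". Power readings such as "10W / 250W" do not match.
MEMORY_RE = re.compile(r"(\d+)\s*MiB\s*/\s*(\d+)\s*MiB")


def parse_total_memory_mib(smi_output: str) -> list[int]:
    """Return the total memory in MiB for each GPU row found in the output."""
    return [int(total) for _used, total in MEMORY_RE.findall(smi_output)]


def limit_is_enforced(smi_output: str, limit_mib: int) -> bool:
    """True only if at least one GPU is listed and every total equals the limit."""
    totals = parse_total_memory_mib(smi_output)
    return bool(totals) and all(total == limit_mib for total in totals)


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run nvidia-smi: {exc}")
    else:
        print("totals (MiB):", parse_total_memory_mib(result.stdout))
        print("512 MiB limit visible:", limit_is_enforced(result.stdout, 512))
```

Running it once in the original shell and once in a docker exec shell should show whether the two sessions see the same limit.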

[Image: Screenshot 2024-05-15 at 20 05 31]

razrotenberg commented 1 month ago

hi @incomingflyingbrick! thanks again for this great feedback.

I was able to reproduce it easily thanks to your detailed instructions.

in short, the Genv container runtime does not set the environment variables properly for docker exec (it does for docker run).
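for anyone who wants to see this kind of mismatch themselves, here is a rough, generic diagnostic. it compares the environment of PID 1 (the process started by docker run) with the environment of the current process (started by docker exec). it is a sketch only: it does not assume any particular Genv variable names, the function names are made up for illustration, and it needs permission to read /proc/1/environ.

```python
import os


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated KEY=VALUE format used by /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty chunk after the trailing NUL
        key, _, value = entry.decode(errors="replace").partition("=")
        env[key] = value
    return env


def missing_or_changed(reference: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Return reference variables that are absent from, or differ in, current."""
    return {k: v for k, v in reference.items() if current.get(k) != v}


if __name__ == "__main__":
    try:
        with open("/proc/1/environ", "rb") as f:
            pid1_env = parse_environ(f.read())
    except OSError as exc:
        print(f"could not read the environment of PID 1: {exc}")
    else:
        diff = missing_or_changed(pid1_env, dict(os.environ))
        for key, value in sorted(diff.items()):
            print(f"{key}={value}")
```

run inside a docker exec session, any variable printed here is one that the docker run process has but the exec'd shell is missing or sees with a different value.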

I will look into it and update!

thanks again!

razrotenberg commented 1 month ago

hi @incomingflyingbrick! fortunately, it was a quick fix. I fixed it and released 1.4.3.

closing the issue, pls reopen it if this does not solve the issue for you! and thanks again

razrotenberg commented 1 month ago

(@incomingflyingbrick make sure to reinstall the container toolkit)

incomingflyingbrick commented 1 month ago

I just tested it out with the new release 1.4.3, and everything works perfectly now! Thank you so much for the quick fix and your hard work.

Really appreciate it! 🙌