Problem starting the environment

SilviaZirino commented 4 years ago

Hello!

I'm trying to run the code in Ubuntu 18.04, but I'm not able to open vrep.

In particular, when I run the following:

sudo docker run --env="DISPLAY" --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" vrep_ee_reach

The vrep logo appears for one second, then it disappears and in the terminal I get:

QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
No XVisualInfo for format QSurfaceFormat(version 2.0, options QFlags<QSurfaceFormat::FormatOption>(), depthBufferSize -1, redBufferSize 1, greenBufferSize 1, blueBufferSize 1, alphaBufferSize -1, stencilBufferSize -1, samples -1, swapBehavior QSurfaceFormat::SwapBehavior(SingleBuffer), swapInterval 1, profile QSurfaceFormat::OpenGLContextProfile(NoProfile))
Falling back to using screens root_visual.
No XVisualInfo for format QSurfaceFormat(version 2.0, options QFlags<QSurfaceFormat::FormatOption>(), depthBufferSize 0, redBufferSize 1, greenBufferSize 1, blueBufferSize 1, alphaBufferSize -1, stencilBufferSize 0, samples -1, swapBehavior QSurfaceFormat::SwapBehavior(SingleBuffer), swapInterval -1, profile QSurfaceFormat::OpenGLContextProfile(NoProfile))
Falling back to using screens root_visual.
Could not initialize GLX
Using the default Lua library.
Loaded the video compression library.
Add-on script 'vrepAddOnScript-addOnScriptDemo.lua' was loaded.
/app/V-REP/vrep.sh: line 33: 13 Aborted (core dumped) "$dirname/$appname" "${PARAMETERS[@]}"

I followed the steps in the README file, including the GUI one ("using X server --> the simple way", then running xhost +local:root), but maybe because I'm just a beginner I'm not able to solve this problem. Do you know why this is happening?

Many thanks!

bango123 commented 4 years ago

Hmm.. This does look like an error caused by the GUI not being enabled. "The simple way" needs to be redone every time you restart your computer. I know it's silly, but can you run the xhost +local:root command from whatever terminal you are running the python code from?
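
Before granting access with xhost, it can help to confirm that the X11 socket behind DISPLAY actually exists on the host, since that is what the --volume option shares with the container. The sketch below assumes the conventional layout in which display :N is served from /tmp/.X11-unix/XN; x11_socket_path is a hypothetical helper name, not something from the repository.

```shell
# Sketch: map a DISPLAY value to the X11 socket the container must reach.
# x11_socket_path is a hypothetical helper; the default directory is an
# assumption based on the volume mounted in the docker run command.
x11_socket_path() {
    display=${1:-$DISPLAY}
    number=${display##*:}   # ":1" or "host:1.0" -> "1" or "1.0"
    number=${number%%.*}    # drop the optional screen suffix
    printf '%s/X%s\n' "${2:-/tmp/.X11-unix}" "$number"
}

# Usage on the host, in the same terminal that will start the container:
#   [ -S "$(x11_socket_path)" ] && xhost +local:root
```

If the socket test fails, DISPLAY probably points at a display that is not running in that session, and xhost alone is unlikely to help.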

SilviaZirino commented 4 years ago

Hello, thank you for answering me!

So, I have been running xhost +local:root every time I need to, because otherwise I get another error:

QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
No protocol specified
QXcbConnection: Could not connect to display :1
/app/V-REP/vrep.sh: line 33: 13 Aborted (core dumped) "$dirname/$appname" "${PARAMETERS[@]}"

As for your question, for now I'm just running the command that creates the container from the Ubuntu terminal. Should I also run particular python files to make it all work? Or are you asking whether I am authorized to run the command xhost +local:root?

Also, I tried to run the file sample_env_initialization.ipynb using jupyter notebook, but it gives me some errors (I attach the screenshots).

[Attached screenshots: screen 1, screen 2]

Thanks again for your help.

bango123 commented 4 years ago

That error looks like nvidia-docker was not properly installed. Could you check whether it is by running the following in a separate terminal:

docker run --runtime nvidia nvidia/cuda:10.0-base nvidia-smi

This is from step 2 of the README
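
A lighter check that does not need to pull an image is to ask the Docker daemon which runtimes it knows about. The following is a minimal sketch, assuming docker info is available to the current user; has_nvidia_runtime is a hypothetical helper name that simply looks for an "nvidia" key in the JSON it reads on stdin.

```shell
# Sketch: report whether a runtime named "nvidia" is registered with Docker.
# has_nvidia_runtime is a hypothetical helper; it reads runtime info on stdin.
has_nvidia_runtime() {
    # Matches the key in JSON such as {"nvidia":{"path":"..."},"runc":{...}}
    grep -q '"nvidia"'
}

# Usage on a machine with Docker installed:
#   docker info --format '{{json .Runtimes}}' | has_nvidia_runtime \
#       && echo "nvidia runtime registered" \
#       || echo "nvidia runtime missing"
```

If the runtime is missing, docker run --runtime nvidia is expected to fail before any container starts.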

SilviaZirino commented 4 years ago

Yes, you are right, I also get an error regarding the nvidia runtime:

docker: Error response from daemon: Unknown runtime specified nvidia. See 'docker run --help'.

That is why I was trying to run vrep without specifying the runtime (I thought that maybe the two errors were unrelated).

I don't really know what I did wrong during the nvidia-docker installation, so I will list the steps:

I cloned the nvidia-docker git repository to my PC.
I installed the nvidia driver following the package manager installation for Ubuntu (section 3.6) and performed the mandatory post-installation actions.
I followed the steps in the Quickstart for Ubuntu 18.04.

I tried to solve this problem (without success) with the following attempts:

restart the Docker daemon with: sudo systemctl daemon-reload followed by sudo systemctl restart docker
register the new runtime to Docker daemon: sudo dockerd --add-runtime nvidia=nvidia-container-runtime (but I'm not sure that the nvidia path is correct)
another possible solution I found was to install nvidia-docker2, but I didn't do that because I read that the use of its packages is deprecated in Docker version 19.03.

I'm so sorry for the long message, thanks again.

bango123 commented 4 years ago

You shouldn't need to clone the repo. The quickstart guide in their README should be sufficient. It consists of the following steps:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Outside of these steps + updating your nvidia drivers, I am not sure how to help debug. This seems to be an issue with the installation of their software. Can you post there asking for help?

SilviaZirino commented 4 years ago

Just to be sure I understood your request correctly, should I post here the steps I used to install the nvidia packages with the package manager, so that we can get a better overview and solve the issue?

bango123 commented 4 years ago

Yes! It would be great if you could post your findings here.

SilviaZirino commented 4 years ago

Ok, so first I ran the following commands:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin

sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600

wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb

sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb

sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub

sudo apt-get update

sudo apt-get -y install cuda

Then I ran the post-installation actions:

export PATH=/usr/local/cuda-10.2/bin:/usr/local/cuda-10.2/NsightCompute-2019.1${PATH:+:${PATH}}

systemctl status nvidia-persistenced --> I verified that NVIDIA Persistence Daemon is active

sudo cp /lib/udev/rules.d/40-vm-hotadd.rules /etc/udev/rules.d

sudo sed -i '/SUBSYSTEM=="memory", ACTION=="add"/d' /etc/udev/rules.d/40-vm-hotadd.rules

After that I rebooted the system and followed the Quickstart instructions.

nndei commented 4 years ago

Dear @bango123, it seems like the website pages at https://nvidia.github.io/nvidia-docker/$distribution do not contain any resources, whatever the distribution you try. Might that be the issue?

E.g., on a remote Debian machine I am working on, $distribution outputs debian. https://nvidia.github.io/nvidia-docker/debian outputs:

Unsupported distribution! # Check https://nvidia.github.io/nvidia-docker
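
For context, the quickstart builds $distribution by concatenating ID and VERSION_ID from /etc/os-release, so a value of plain debian suggests that VERSION_ID is empty or missing on that machine. This is only one possible explanation. The sketch below reproduces the computation against an arbitrary os-release file; distribution_id is a hypothetical helper name.

```shell
# Sketch: build the repository "distribution" string from an os-release file.
# distribution_id is a hypothetical helper; the default path is the standard one.
distribution_id() {
    (
        # Work in a subshell so ID and VERSION_ID do not leak into the caller.
        unset ID VERSION_ID
        . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "${VERSION_ID:-}"
    )
}

# Usage:
#   distribution=$(distribution_id)
#   echo "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list"
```

On Ubuntu 18.04 this should yield ubuntu18.04. When it yields an ID with no version, the resulting URL has no version component, which would match the "Unsupported distribution" response quoted above.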

bango123 commented 4 years ago

I do not know how to debug these issues. I have verified that the nvidia environment still works on my system after an uninstall/reinstall using the commands from their GitHub quickstart guide. Can you post on their GitHub to ask for help?

SilviaZirino commented 4 years ago

Ok, thank you very much for your help!

SilviaZirino commented 4 years ago

Hello,

I solved the problem with running the following command:

sudo docker run --env="DISPLAY" --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" vrep_ee_reach

using a two-step approach:

I installed nvidia-container-runtime
then I registered the nvidia container runtime with the following commands:

sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
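
An alternative to the systemd drop-in, for readers who prefer it, is to register the runtime in Docker's daemon configuration file. The following is a sketch only: it assumes the runtime binary lives at /usr/bin/nvidia-container-runtime as in the commands above, and nvidia_runtime_json is a hypothetical helper name. Use one method or the other, because registering the same runtime both as a dockerd flag and in the configuration file may prevent the daemon from starting.

```shell
# Sketch: print a daemon.json fragment that registers the nvidia runtime.
# nvidia_runtime_json is a hypothetical helper; the default binary path is
# the one used in the commands above and may differ on other systems.
nvidia_runtime_json() {
    cat <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "${1:-/usr/bin/nvidia-container-runtime}",
            "runtimeArgs": []
        }
    }
}
EOF
}

# Usage (overwrites any existing daemon.json, so merge by hand if one exists):
#   nvidia_runtime_json | sudo tee /etc/docker/daemon.json
#   sudo systemctl restart docker
```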

However, now I'm facing problems running sample_env_initialization.ipynb (it gives permission errors) and nothing on the internet seems to help me. I attach the screenshots.

Thanks in advance!

[Attached screenshots: screen 1, screen 2, screen 3, screen 4]

SilviaZirino commented 4 years ago

Hello again, I solved the issue by enabling the root account.
