openai / spinningup

An educational resource to help anyone learn deep reinforcement learning.
https://spinningup.openai.com/
MIT License

ExperimentGrid fail, FileNotFound error #343

Closed alanballard closed 3 years ago

alanballard commented 3 years ago

Hello. I'm just getting started with SpinningUp and have encountered an issue when I try to run ExperimentGrid. Full disclosure: I'm running Windows, and I followed the instructions linked from the SpinningUp installation page (installed Ubuntu, Miniconda, WSL, etc.), but there seem to be many places to make mistakes in that process.

I'm using Anaconda, and this is how I created the environment that I'm currently working in:

conda create -n NRP_DRL python=3.7
conda activate NRP_DRL
conda install tensorflow=1.15.0
conda install -c conda-forge ipykernel openai gym pyglet swig pytorch=1.3.1

I thought I had installed Miniconda3 Linux 64-bit for Python 3.7 in Ubuntu, but <$ python --version> returns "command not found" and <$ python3 --version> reports 3.8.10. I'm not sure whether that's important.
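A conda environment created with Python 3.7 alongside an Ubuntu `python3` that reports 3.8.10 suggests that two different interpreters are in play. A minimal sketch for checking which interpreter and platform a given shell is actually using; the helper name `describe_interpreter` is hypothetical and not part of SpinningUp:

```python
import platform
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter running this code."""
    return {
        "executable": sys.executable,          # full path to the running python binary
        "version": platform.python_version(),  # version of this interpreter, as a string
        "system": platform.system(),           # "Windows" or "Linux" (WSL reports "Linux")
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this once from the Windows Anaconda prompt and once from the Ubuntu shell should make it obvious whether the two are separate installations.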

My problem, short version: I can run SpinningUp's test program without issue but cannot run their test ExperimentGrid code. It seems the code can't find something, but I have no idea what or why.

My problem, longer version: I can run SpinningUp's test program without issue:

(NRP_DRL) C:\...\python -m spinup.run ppo --hid "[32,32]" --env LunarLander-v2 --exp_name installtest --gamma 0.999

I can also use the playback tools to view the text output as well as the visual replay of the experiment

However, when I try to run the sample ExperimentGrid script, I get an error:

(NRP_DRL) C:\...\python bench_ppo_cartpole.py

Traceback (most recent call last):
  File "c:\users\prime\spinningup\spinup\utils\run_entrypoint.py", line 11, in <module>
    thunk()
  File "c:\users\prime\spinningup\spinup\utils\run_utils.py", line 159, in thunk_plus
    mpi_fork(num_cpu)
  File "c:\users\prime\spinningup\spinup\utils\mpi_tools.py", line 35, in mpi_fork
    subprocess.check_call(args, env=env)
  File "C:\Users\Prime\anaconda3\envs\NRP_DRL\lib\subprocess.py", line 358, in check_call
    retcode = call(*popenargs, **kwargs)
  File "C:\Users\Prime\anaconda3\envs\NRP_DRL\lib\subprocess.py", line 339, in call
    with Popen(*popenargs, **kwargs) as p:
  File "C:\Users\Prime\anaconda3\envs\NRP_DRL\lib\subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "C:\Users\Prime\anaconda3\envs\NRP_DRL\lib\subprocess.py", line 1207, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

There appears to have been an error in your experiment. Check the traceback above to see what actually went wrong. The traceback below, included for completeness (but probably not useful for diagnosing the error), shows the stack leading up to the experiment launch.

Traceback (most recent call last):
  File "c:\users\prime\spinningup\spinup\examples\pytorch\bench_ppo_cartpole.py", line 19, in <module>
    eg.run(ppo_pytorch, num_cpu=args.cpu)
  File "c:\users\prime\spinningup\spinup\utils\run_utils.py", line 546, in run
    data_dir=data_dir, datestamp=datestamp, **var)
  File "c:\users\prime\spinningup\spinup\utils\run_utils.py", line 171, in call_experiment
    subprocess.check_call(cmd, env=os.environ)
  File "C:\Users\Prime\anaconda3\envs\NRP_DRL\lib\subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['C:\Users\Prime\anaconda3\envs\NRP_DRL\python.exe', 'c:\users\prime\spinningup\spinup\utils\run_entrypoint.py', 'eJydVM9rFDEUznTGOm61FQuKPw6CHraHDkXxItXLgpfFIupxJWQn2Z1hskmcJNUKguCPrvDw0ujFi/5F/ku+zBa7CEJxhplN3st773vf+2bfZV9/ZaS74EopteemLhspiqV1gHU6qaWkE69KV2sVjkK/IrBBZ6wR1DZi4UIzWi/Qsa+lqxV1B0agDfKB5uJ53HwMT0N/SLo7HW4OHvFkvtKstjt8hRNJPuOOJ1fJZ3JIDpPJCk959iFH25ldEj0PiEvm6fskIfNsgpYvuOLkGQn9PciF2qeKzUQYkurccpHePPlB5uRnggf3PgbIImqEsgV3y/sjb0VrR6atZ2JkTa1UrabedEv88a6WdtR6RbtVYQ6wnV3JZmPOHobh9wEJ1Rqk04NZ+BS2HLYH90qGVInXRsScyhW7UqPJPixc5VVDjfT2xPYnF6zGBiYquFBtQj4zNZ3otolpq4vQO4kNw2+DdbKSJ/G+lKymeYrBzSvWTm2As8rPaGl8gDNdSDhEgNXmpxCxDdO3oQ9rlBpWNmwqKA1wftFp0fWHzdGOxOi5vOwp/nDQnUE5dGf+i0F/iFgikrNTqcdIA26qa3/ViwQ4rdFZXTsKFnIuJsxLZ8MeZLwuHQbBOqrU+lbQfSa9sOFF6GNenMjagLXuiZZie38nUmt0WSFzPdiwThhLcTi0M4bHPzbgHCvpMYER1vmq5lwoaus3mHN4EzXTYyj8fdaJH6473ZZVoVQx09xLYYtlb/acqQpl7yGzQnCUI1yQejrFiks1eto74x3ldRug+TeLnDk2MkZvmwO3PRaqrGiJrZlFaxSR3r1DHVY81SFqkY0ctbn4UuD2aYKC98MMbhxPh8mpjlNccIDx8QmQxvdRELC64CRUW5B1RaobkHJd4tiu0KU/FWr9+Jg+HBusMaW06yiMBEH+0jO5AHnrFB8UTqh5daIQ754WvwF9otOQ']' returned non-zero exit status 1.

If I run the contents of bench_ppo_cartpole.py directly in Jupyter Notebook, I get a slightly different error:

The code:

from spinup.utils.run_utils import ExperimentGrid
from spinup import ppo_pytorch
import torch

eg = ExperimentGrid(name='ppo-pyt-bench')
eg.add('env_name', 'CartPole-v0', '', True)
eg.add('seed', [10*i for i in range(3)])  # default args.num_runs=3
eg.add('epochs', 10)
eg.add('steps_per_epoch', 4000)
eg.add('ac_kwargs:hidden_sizes', [(32,), (64,64)], 'hid')
eg.add('ac_kwargs:activation', [torch.nn.Tanh, torch.nn.ReLU], '')
eg.run(ppo_pytorch, num_cpu=4)  # default args.cpu=4
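For orientation, an ExperimentGrid launches one run per combination of the values it is given. The sketch below does not use SpinningUp at all; it only reproduces the counting for this grid with `itertools.product`. The activation classes are replaced by plain strings (an assumption made so the example runs without PyTorch), and the variable names are illustrative.

```python
from itertools import product

# Parameters from the grid that take more than one value.
# Activations are written as strings here (an assumption) to avoid importing torch.
grid = {
    "seed": [10 * i for i in range(3)],           # 0, 10, 20
    "ac_kwargs:hidden_sizes": [(32,), (64, 64)],
    "ac_kwargs:activation": ["Tanh", "ReLU"],
}

# One variant per element of the Cartesian product of the value lists.
variants = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(variants))   # 3 seeds * 2 sizes * 2 activations = 12
print(variants[0])
```

Judging by the traceback, each variant is started through `subprocess.check_call`, so a failure inside any launched run surfaces in the parent script as a `CalledProcessError`.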

The error:

CalledProcessError                        Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_12820/1988011519.py in <module>
     17 eg.add('ac_kwargs:hidden_sizes', [(32,), (64,64)], 'hid')
     18 eg.add('ac_kwargs:activation', [torch.nn.Tanh, torch.nn.ReLU], '')
---> 19 eg.run(ppo_pytorch, num_cpu=4)

c:\users\prime\spinningup\spinup\utils\run_utils.py in run(self, thunk, num_cpu, data_dir, datestamp)
    544
    545             call_experiment(exp_name, thunk, num_cpu=num_cpu,
--> 546                             data_dir=data_dir, datestamp=datestamp, **var)
    547
    548

c:\users\prime\spinningup\spinup\utils\run_utils.py in call_experiment(exp_name, thunk, seed, num_cpu, data_dir, datestamp, **kwargs)
    169     cmd = [sys.executable if sys.executable else 'python', entrypoint, encoded_thunk]
    170     try:
--> 171         subprocess.check_call(cmd, env=os.environ)
    172     except CalledProcessError:
    173         err_msg = '\n'*3 + '='*DIV_LINE_WIDTH + '\n' + dedent("""

~\anaconda3\envs\NRP_DRL\lib\subprocess.py in check_call(*popenargs, **kwargs)
    361         if cmd is None:
    362             cmd = popenargs[0]
--> 363         raise CalledProcessError(retcode, cmd)
    364     return 0
    365

CalledProcessError: Command '['C:\Users\Prime\anaconda3\envs\NRP_DRL\python.exe', 'c:\users\prime\spinningup\spinup\utils\run_entrypoint.py', 'eJydVM9rFDEUznTGOm61FQuKPw6CHraHDkXxItXLgpfFIupxJWQn2Z1hskmcJNUKguCPrvDw0ujFi/5F/ku+zBa7CEJxhplN3st773vf+2bfZV9/ZaS74EopteemLhspiqV1gHU6qaWkE69KV2sVjkK/IrBBZ6wR1DZi4UIzWi/Qsa+lqxV1B0agDfKB5uJ53HwMT0N/SLo7HW4OHvFkvtKstjt8hRNJPuOOJ1fJZ3JIDpPJCk959iFH25ldEj0PiEvm6fskIfNsgpYvuOLkGQn9PciF2qeKzUQYkurccpHePPlB5uRnggf3PgbIImqEsgV3y/sjb0VrR6atZ2JkTa1UrabedEv88a6WdtR6RbtVYQ6wnV3JZmPOHobh9wEJ1Rqk04NZ+BS2HLYH90qGVInXRsScyhW7UqPJPixc5VVDjfT2xPYnF6zGBiYquFBtQj4zNZ3otolpq4vQO4kNw2+DdbKSJ/G+lKymeYrBzSvWTm2As8rPaGl8gDNdSDhEgNXmpxCxDdO3oQ9rlBpWNmwqKA1wftFp0fWHzdGOxOi5vOwp/nDQnUE5dGf+i0F/iFgikrNTqcdIA26qa3/ViwQ4rdFZXTsKFnIuJsxLZ8MeZLwuHQbBOqrU+lbQfSa9sOFF6GNenMjagLXuiZZie38nUmt0WSFzPdiwThhLcTi0M4bHPzbgHCvpMYER1vmq5lwoaus3mHN4EzXTYyj8fdaJH6473ZZVoVQx09xLYYtlb/acqQpl7yGzQnCUI1yQejrFiks1eto74x3ldRug+TeLnDk2MkZvmwO3PRaqrGiJrZlFaxSR3r1DHVY81SFqkY0ctbn4UuD2aYKC98MMbhxPh8mpjlNccIDx8QmQxvdRELC64CRUW5B1RaobkHJd4tiu0KU/FWr9+Jg+HBusMaW06yiMBEH+0jO5AHnrFB8UTqh5daIQ754WvwF9otOQ']' returned non-zero exit status 1.

I have no idea what any of this means or how I might address it. I'm a little stumped, as SpinningUp's test program works but their sample ExperimentGrid does not. I'm hoping that someone can help find and fix whatever is causing this. Thank you.

alanballard commented 3 years ago

I found the problem. I misunderstood what we were trying to accomplish by linking Windows and Ubuntu. I thought we were supposed to use Windows-installed python/Anaconda on the front-end and somehow - magically - the miniconda installed on Ubuntu would make spinningup work. So I had some stuff installed on Windows and some in Ubuntu, when all of it needed to be in Ubuntu. Naturally, since everything was not where it was supposed to be, I was getting "file not found"-type messages.
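The `FileNotFoundError: [WinError 2]` in the first traceback is what `subprocess` raises when the program it is asked to launch cannot be found. A minimal sketch of checking for that ahead of time with `shutil.which`; the helper name is hypothetical, and `mpirun` is an assumption about which launcher `mpi_fork` needs, since the traceback does not show the contents of `args`:

```python
import shutil


def find_missing_executables(names):
    """Return the subset of `names` that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # "mpirun" is an assumed requirement; adjust the list to whatever your setup launches.
    missing = find_missing_executables(["python", "mpirun"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All executables found.")
```

Running it in the same shell that launches the experiment shows whether that shell can see everything the experiment needs.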

I wasn't able to make the instructions for linking Windows and Ubuntu work properly, though, so I followed these instructions to create a virtual Ubuntu desktop that I can start from my Windows installation.

  1. Steps 1 (WSL2) and 2 (Xming X) from here: https://github.com/openai/spinningup/issues/23 I don't know if Xming X was actually necessary, but I installed it so I include it for completeness. I installed Ubuntu for Windows as part of the WSL2 instructions

  2. Everything from "WSL2 installation window interface display" downward here: https://blog.csdn.net/bornfree5511/article/details/108632513 Let Chrome translate the entire page for you

  3. And everything up to "sound realization" here: https://zhuanlan.zhihu.com/p/150555651 Again, let Chrome translate the entire page for you

Item 3 is actually referenced as part of the process in item 2; it is not a separate thing that has to be done.

Now I have an xlaunch shortcut on my Windows 10 desktop that creates a blank virtual desktop, and running $ startxfce4 in Ubuntu for Windows displays Ubuntu in that virtual desktop. From within Ubuntu, I followed SpinningUp's instructions on their webpage for installing on Linux distributions and was able to install Anaconda and all of the required packages for SpinningUp.

I can now run the test program (python -m spinup.run ppo --hid "[32,32]" --env LunarLander-v2 --exp_name installtest --gamma 0.999) and the test ExperimentGrid (python spinup/examples/pytorch/bench_ppo_cartpole.py) provided by SpinningUp without issue, and I can view the training replay.

So, basically, everything (Anaconda for Linux, etc.) has to be installed and run within Ubuntu for Windows; the WSL2/Xming X (step 1)/VcXsrv (step 2) setup is just a way of providing a GUI within Windows for that Ubuntu installation.
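To confirm that a shell really is running inside the Ubuntu-on-WSL installation rather than native Windows, one common heuristic is to look for "microsoft" in the kernel release string. This is a sketch under that assumption, not an official detection method, and the function name is hypothetical. It also prints the `DISPLAY` variable, which X clients use to locate an X server such as Xming or VcXsrv.

```python
import os
import platform


def looks_like_wsl(system, release):
    """Heuristic: WSL reports system 'Linux' and a kernel release containing 'microsoft'."""
    return system == "Linux" and "microsoft" in release.lower()


if __name__ == "__main__":
    uname = platform.uname()
    print("Running under WSL:", looks_like_wsl(uname.system, uname.release))
    # GUI windows (e.g. training replays) need DISPLAY to point at a running X server.
    print("DISPLAY =", os.environ.get("DISPLAY", "<not set>"))
```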

I hope this helps someone in the future.