hnyu / seditor

Code release for the paper "Towards Safe Reinforcement Learning with a Safety Editor Policy", Yu et al., arXiv 2022

Failed to Run Installation Pipeline #2

Closed chrismartel closed 1 year ago

chrismartel commented 1 year ago

Hi, I tried to run the SEditor installation pipeline on Ubuntu 20.04. I went through the following steps:

Install Python 3.7

According to the ALF SEditor branch, Python3.7 seems to be required. Here are the steps I followed to install Python 3.7.

  1. Start by updating the packages list and installing the prerequisites:

    sudo apt update
    sudo apt upgrade
    sudo apt install software-properties-common
  2. Next, add the deadsnakes PPA to your sources list:

    sudo add-apt-repository ppa:deadsnakes/ppa
  3. Once the repository is enabled, install Python 3.7 with:

    sudo apt install python3.7

Python3.7 is currently supported by ALF. Note that some pip packages (e.g., pybullet) need python dev files, so make sure python3.7-dev is installed:

sudo apt install -y python3.7-dev
  1. Install pip, which is compatible with Python 3.7, 3.8, 3.9, and 3.10 on Linux, Windows, and macOS, by running the following commands:

    sudo apt install python3-pip
    pip install --upgrade pip
  2. Install virtualenv package for Python3.7

    sudo apt install python3.7-venv

SEditor Installation

  1. Create and activate a Python3.7 virtual environment for SEditor

    python3.7 -m venv ~/venv/seditor
    source ~/venv/seditor/bin/activate
  2. Install ALF

    git clone https://github.com/HorizonRobotics/alf
    cd alf
    git checkout origin/seditor_alf -B seditor
    pip install -e . --use-pep517

Notes:

  1. Install MuJoCo version 2.1+

    pip install mujoco
  2. Install the customized Safety Gym environment

    git clone https://github.com/hnyu/safety-gym.git
    pip install -e safety-gym
    python -c "import safety_gym" # test if correctly installed

    Error when running pip install -e safety-gym:

    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    alf 0.0.6 requires gym==0.12.5, but you have gym 0.15.4 which is incompatible.
    alf 0.0.6 requires pillow==7.2.0, but you have pillow 9.5.0 which is incompatible.

I ignored those errors and went on with the installation
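For anyone who wants to see which pins are actually violated before deciding to ignore the warning, here is a minimal sketch. It assumes the only relevant pins are the two quoted in the pip message (gym==0.12.5 and pillow==7.2.0); the helper names `installed_versions` and `find_conflicts` are made up for illustration and are not part of ALF or pip.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Assumption: on Python 3.7 the importlib_metadata backport is installed.
    from importlib_metadata import PackageNotFoundError, version

# Exact pins copied from the pip message above.
ALF_PINS = {"gym": "0.12.5", "pillow": "7.2.0"}


def installed_versions(names):
    """Return {name: installed version, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def find_conflicts(pins, installed):
    """List (name, wanted, found) for every exact pin that is not satisfied."""
    return [
        (name, wanted, installed.get(name))
        for name, wanted in sorted(pins.items())
        if installed.get(name) != wanted
    ]


if __name__ == "__main__":
    current = installed_versions(ALF_PINS)
    for name, wanted, found in find_conflicts(ALF_PINS, current):
        print(f"{name}: alf pins {wanted}, environment has {found}")
```

Run inside the activated venv, this prints one line per mismatched package and nothing when both pins are satisfied.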

  1. After the installations, clone this repo under ALF:
    cd ~/research/safe_rl/seditor/alf/alf/examples/safety
    git clone https://github.com/hnyu/seditor

Error when running git clone https://github.com/hnyu/seditor:

fatal: destination path 'seditor' already exists and is not an empty directory.

I assumed that the seditor directory must be overwritten by the clone so I removed it and cloned the repository afterwards.

  1. And move the file seditor_algorithm.py under ALF
    cp ~/research/safe_rl/seditor/alf//alf/examples/safety/seditor/seditor_algorithm.py ~/research/safe_rl/seditor/alf//alf/algorithms/

Training SEditor

  1. Create an empty directory to store the training results

  2. Run a test on PointGoal1-v0

    cd ~/research/safe_rl/seditor/alf//alf/examples
    python3.7 -m alf.bin.train --root_dir=~/research/safe_rl/seditor/results --conf safety/seditor/seditor_safety_gym_conf.py --conf_param="create_environment.env_name='Safexp-PointGoal1-v0'"

Error:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.7/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/__init__.py", line 17, in <module>
    from . import metrics
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/metrics/__init__.py", line 16, in <module>
    from .metrics import *
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/metrics/metrics.py", line 22, in <module>
    import alf.utils.data_buffer as db
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/utils/data_buffer.py", line 24, in <module>
    from alf.nest import get_nest_batch_size
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/nest/__init__.py", line 15, in <module>
    from .nest import *
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/nest/nest.py", line 18, in <module>
    import cnest
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/cnest/__init__.py", line 1, in <module>
    from _cnest import *
ModuleNotFoundError: No module named '_cnest'

The cnest package is in my venv site-packages so I am not sure what is going on.
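The traceback suggests that the Python wrapper `cnest` is importable while its compiled companion `_cnest` is not. A small sketch like the one below can confirm that split by asking the import system where each module would come from. The helper name `module_status` is hypothetical, and this only diagnoses the problem; it does not repair the package.

```python
import importlib.util


def module_status(name):
    """Return the location `name` would be imported from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised for a missing parent package or a module without a usable spec.
        return None
    if spec is None:
        return None
    return spec.origin or "namespace package"


if __name__ == "__main__":
    # 'cnest' is the Python wrapper; '_cnest' is the extension module it imports.
    for name in ("cnest", "_cnest"):
        print(f"{name}: {module_status(name)}")
```

If `cnest` resolves to a path in site-packages but `_cnest` prints `None`, the wrapper was installed without a loadable extension module for this interpreter.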

Did you ever observe the above errors when running the installation pipeline? Am I doing something wrong?

Thank you for your help!

hnyu commented 1 year ago

Hi @chrismartel , thanks for your detailed installation report!

Error when running git clone https://github.com/hnyu/seditor: fatal: destination path 'seditor' already exists and is not an empty directory. I assumed that the seditor directory must be overwritten by the clone so I removed it and cloned the repository afterwards.

Apologies for forgetting to delete the old files under 'seditor'. Yes, you can remove that directory directly, as it's outdated. I've pushed a change to the branch 'seditor_alf' to delete it. https://github.com/HorizonRobotics/alf/commit/46aba1e7c54c9fcd2ae5c7cea307cfa8a309e875

ModuleNotFoundError: No module named 'cnest'

This is due to cnest's outdated version. I've pushed a change to upgrade it from 1.0.4 to 1.1.1. https://github.com/HorizonRobotics/alf/commit/b37aa2c3122f4b420c35ecd988a364ccb5f27c62 Can you give it another try? The cnest error appears to be the only error when running the example, so it should run successfully after cnest is upgraded.

If you still encounter other errors and cannot run the example, let me know and I will consider building a docker image for convenience. Thanks!

chrismartel commented 1 year ago

Hi @hnyu , thank you for the quick reply! I tried the same steps, but I am now facing a new error:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.7/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/__init__.py", line 23, in <module>
    from . import summary
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/summary/__init__.py", line 15, in <module>
    from .summary_ops import *
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/summary/summary_ops.py", line 19, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/torch/utils/tensorboard/__init__.py", line 8, in <module>
    from .writer import FileWriter, SummaryWriter  # noqa F401
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 9, in <module>
    from tensorboard.compat.proto.event_pb2 import SessionLog
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/tensorboard/compat/proto/event_pb2.py", line 17, in <module>
    from tensorboard.compat.proto import summary_pb2 as tensorboard_dot_compat_dot_proto_dot_summary__pb2
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/tensorboard/compat/proto/summary_pb2.py", line 17, in <module>
    from tensorboard.compat.proto import tensor_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__pb2
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/tensorboard/compat/proto/tensor_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import resource_handle_pb2 as tensorboard_dot_compat_dot_proto_dot_resource__handle__pb2
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/tensorboard/compat/proto/resource_handle_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import tensor_shape_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__shape__pb2
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/tensorboard/compat/proto/tensor_shape_pb2.py", line 42, in <module>
    serialized_options=None, file=DESCRIPTOR),
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/google/protobuf/descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

I tried downgrading the protobuf plugin as mentioned here.
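The two workarounds listed in the error message can be sketched roughly as follows. The function names are hypothetical. The version check only encodes the "3.20.x or lower" wording from the message, and the environment-variable helper has to run before anything imports `google.protobuf`, otherwise it has no effect.

```python
import os


def parse_major_minor(version_string):
    """Turn '3.20.3' into (3, 20); only the first two fields are used."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def needs_downgrade(version_string):
    """Workaround 1: True when the version is newer than the 3.20.x line."""
    return parse_major_minor(version_string) > (3, 20)


def force_pure_python_protobuf(environ=None):
    """Workaround 2: select the (slower) pure-Python protobuf implementation."""
    environ = os.environ if environ is None else environ
    environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
    return environ


if __name__ == "__main__":
    # Example version string only; pass the value reported by pip for your venv.
    print(needs_downgrade("4.21.0"))
```

Downgrading is usually the better choice for training runs, since the message itself warns that pure-Python parsing is much slower.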

Then I am facing this error

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.7/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/__init__.py", line 30, in <module>
    from .config_helpers import *
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/config_helpers.py", line 27, in <module>
    from alf.environments.utils import create_environment
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/environments/utils.py", line 20, in <module>
    from alf.environments import suite_gym
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/environments/suite_gym.py", line 20, in <module>
    from alf.environments import gym_wrappers, alf_wrappers, alf_gym_wrapper
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/environments/gym_wrappers.py", line 19, in <module>
    import cv2
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/cv2/__init__.py", line 153, in bootstrap
    native_module = importlib.import_module("cv2")
  File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory

As you proposed, I think building a docker image would maybe be more convenient!

hnyu commented 1 year ago

@chrismartel , it seems that your opengl dependency is missing. According to this link, you can try installing

apt-get install libgl1

But meanwhile, I will try building a docker image. Stay tuned.
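To check for missing system libraries such as libGL before launching training, a sketch along these lines may help. It relies on `ctypes.util.find_library`, which on Linux consults tools like ldconfig, so a `None` result can also mean those lookup tools are unavailable rather than that the library is absent. The helper name and the list of library names are illustrative assumptions.

```python
import ctypes.util


def missing_shared_libraries(names):
    """Return the subset of `names` that the system linker lookup cannot locate."""
    return [name for name in names if ctypes.util.find_library(name) is None]


if __name__ == "__main__":
    # "GL" corresponds to libGL.so.*, the library named in the ImportError above.
    for name in missing_shared_libraries(["GL"]):
        print(f"lib{name} not found; install the matching system package")
```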

chrismartel commented 1 year ago

@hnyu , I think I was able to successfully start the training. Is it expected that it is stuck at this point?

running build_ext
W0615 19:40:07.553694 140619589908288 sac_safety_gym_conf.py:51] The config 'create_environment.env_name' has been configured to an immutable value of Safexp-PointGoal1-v0. The new value Safexp-PointGoal2-v0 will be ignored
W0615 19:40:07.554479 140619589908288 sac_safety_gym_conf.py:96] The value of config 'Agent.rl_algorithm_cls' has been configured to <class 'alf.algorithms.sac_algorithm.SacAlgorithm'>. It is replaced by the new value <class 'alf.algorithms.sac_algorithm.SacAlgorithm'>
W0615 19:40:07.554863 140619589908288 seditor_safety_gym_conf.py:26] The value of config 'Agent.rl_algorithm_cls' has been configured to <class 'alf.algorithms.sac_algorithm.SacAlgorithm'>. It is replaced by the new value <class 'alf.algorithms.seditor_algorithm.SEditorAlgorithm'>
I0615 19:40:07.556214 140619589908288 parallel_environment.py:94] Spawning all processes.

I had to install the following packages to get to this point:

    sudo apt-get install libgl1
    sudo apt install rsync
    sudo apt-get install libosmesa6-dev
    sudo apt-get install patchelf

(The last two are related to MuJoCo.)

hnyu commented 1 year ago

It seems to be spawning the 32 env processes. You can first decrease 'num_envs' in this file https://github.com/HorizonRobotics/alf/blob/seditor_alf/alf/examples/safety/sac/sac_safety_gym_conf.py and see whether the hang is caused by your CPU memory.

Sometimes it might take several minutes to spawn all env processes, but it won't take too long from my experience.
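As a rough way to pick a starting value for 'num_envs' on a memory-limited machine, one could estimate how many env processes fit in RAM. This is only a sketch: the per-process memory figure is a placeholder assumption that has to be measured on the actual machine, `total_memory_bytes` relies on Linux `sysconf` names, and neither function is part of ALF.

```python
import os


def total_memory_bytes():
    """Physical RAM in bytes via POSIX sysconf (assumes Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def max_envs(total_bytes, per_env_bytes, reserve_fraction=0.25):
    """Number of env processes that fit after reserving a share of RAM for the trainer."""
    usable = total_bytes * (1.0 - reserve_fraction)
    return max(1, int(usable // per_env_bytes))


if __name__ == "__main__":
    # Hypothetical figure: assume each env process needs about 1 GiB.
    per_env = 1 * 2**30
    print(max_envs(total_memory_bytes(), per_env))
```

The result is a ceiling, not a recommendation; starting lower and increasing 'num_envs' while watching memory usage is the safer route.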

chrismartel commented 1 year ago

@hnyu it was indeed due to CPU mem. I ran it with 1 environment, and I was able to train the model. Thank you for your help!

hnyu commented 1 year ago

As you proposed, I think building a docker image would maybe be more convenient!

Hi @chrismartel , I've built a docker image. See https://github.com/hnyu/seditor#docker