Closed chrismartel closed 1 year ago
Hi @chrismartel , thanks for your detailed installation report!
Error when running git clone https://github.com/hnyu/seditor: fatal: destination path 'seditor' already exists and is not an empty directory. I assumed that the seditor directory must be overwritten by the clone so I removed it and cloned the repository afterwards.
Apologies for forgetting to delete the old files under 'seditor'. Yes, you can remove that directory directly, as it's outdated. I've pushed a change to the branch 'seditor_alf' to delete it. https://github.com/HorizonRobotics/alf/commit/46aba1e7c54c9fcd2ae5c7cea307cfa8a309e875
ModuleNotFoundError: No module named 'cnest'
This is due to cnest's outdated version. I've pushed a change to upgrade its version from 1.0.4 to 1.1.1. https://github.com/HorizonRobotics/alf/commit/b37aa2c3122f4b420c35ecd988a364ccb5f27c62 Can you give it another try? It seems that the cnest error is the only error when running the example, so it should run successfully after cnest is upgraded.
If you still encounter other errors and cannot run the example, let me know and I will consider building a docker image for convenience. Thanks!
Hi @hnyu , thank you for the quick reply! I tried the same steps, but I am now facing a new error:
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.7/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/__init__.py", line 23, in <module>
    from . import summary
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/summary/__init__.py", line 15, in <module>
    from .summary_ops import *
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/summary/summary_ops.py", line 19, in <module>
    from torch.utils.tensorboard import SummaryWriter
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/torch/utils/tensorboard/__init__.py", line 8, in <module>
    from .writer import FileWriter, SummaryWriter  # noqa F401
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 9, in <module>
    from tensorboard.compat.proto.event_pb2 import SessionLog
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/tensorboard/compat/proto/event_pb2.py", line 17, in <module>
    from tensorboard.compat.proto import summary_pb2 as tensorboard_dot_compat_dot_proto_dot_summary__pb2
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/tensorboard/compat/proto/summary_pb2.py", line 17, in <module>
    from tensorboard.compat.proto import tensor_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__pb2
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/tensorboard/compat/proto/tensor_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import resource_handle_pb2 as tensorboard_dot_compat_dot_proto_dot_resource__handle__pb2
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/tensorboard/compat/proto/resource_handle_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import tensor_shape_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__shape__pb2
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/tensorboard/compat/proto/tensor_shape_pb2.py", line 42, in <module>
    serialized_options=None, file=DESCRIPTOR),
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/google/protobuf/descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0. If you cannot immediately regenerate your protos, some other possible workarounds are:
- Downgrade the protobuf package to 3.20.x or lower.
- Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
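Before choosing between the two workarounds in the message above, it can help to confirm which protobuf version the virtual environment actually loads. The following is a minimal sketch, not part of ALF or SEditor: the helper names parse_version and protobuf_needs_downgrade are hypothetical, and the only threshold used is the "3.20.x or lower" bound quoted in the error message.

```python
import re


def parse_version(version):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def protobuf_needs_downgrade(version):
    """True if `version` is newer than the 3.20.x series named in the error."""
    return parse_version(version)[:2] > (3, 20)


if __name__ == "__main__":
    try:
        import google.protobuf
    except ImportError:
        print("protobuf is not installed in this environment")
    else:
        installed = google.protobuf.__version__
        if protobuf_needs_downgrade(installed):
            print(f"protobuf {installed} is newer than 3.20.x")
        else:
            print(f"protobuf {installed} is 3.20.x or lower")
```

If the script reports a version newer than 3.20.x, either workaround from the error message applies; the sketch only reads the version and does not change the environment.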
I tried downgrading the protobuf package as mentioned here.
Then I ran into this error:
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.7/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/__init__.py", line 30, in <module>
    from .config_helpers import *
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/config_helpers.py", line 27, in <module>
    from alf.environments.utils import create_environment
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/environments/utils.py", line 20, in <module>
    from alf.environments import suite_gym
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/environments/suite_gym.py", line 20, in <module>
    from alf.environments import gym_wrappers, alf_wrappers, alf_gym_wrapper
  File "/home/azureuser/research/safe_rl/seditor/alf/alf/environments/gym_wrappers.py", line 19, in <module>
    import cv2
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/home/azureuser/venv/seditor/lib/python3.7/site-packages/cv2/__init__.py", line 153, in bootstrap
    native_module = importlib.import_module("cv2")
  File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
As you proposed, I think building a docker image would maybe be more convenient!
@chrismartel , it seems that your OpenGL dependency is missing. According to this link, you can try installing
apt-get install libgl1
But meanwhile, I will try building a docker image. Stay tuned.
@hnyu , I think I was able to start the training successfully. Is it expected that it gets stuck at this point?
running build_ext
W0615 19:40:07.553694 140619589908288 sac_safety_gym_conf.py:51] The config 'create_environment.env_name' has been configured to an immutable value of Safexp-PointGoal1-v0. The new value Safexp-PointGoal2-v0 will be ignored
W0615 19:40:07.554479 140619589908288 sac_safety_gym_conf.py:96] The value of config 'Agent.rl_algorithm_cls' has been configured to <class 'alf.algorithms.sac_algorithm.SacAlgorithm'>. It is replaced by the new value <class 'alf.algorithms.sac_algorithm.SacAlgorithm'>
W0615 19:40:07.554863 140619589908288 seditor_safety_gym_conf.py:26] The value of config 'Agent.rl_algorithm_cls' has been configured to <class 'alf.algorithms.sac_algorithm.SacAlgorithm'>. It is replaced by the new value <class 'alf.algorithms.seditor_algorithm.SEditorAlgorithm'>
I0615 19:40:07.556214 140619589908288 parallel_environment.py:94] Spawning all processes.
I had to install the following packages to get to this point:
sudo apt-get install libgl1
sudo apt install rsync
sudo apt-get install libosmesa6-dev
sudo apt-get install patchelf
(the last two are related to Mujoco)
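A quick way to see which of these system packages are still missing on a fresh machine is to probe for the shared libraries and tools they provide. This is only a sketch: the mapping from apt package to library or tool name (libgl1 to "GL", libosmesa6-dev to "OSMesa") is an assumption, the helper names are hypothetical, and ctypes.util.find_library can return None on minimal systems even when a library is present.

```python
import ctypes.util
import shutil

# Assumed mapping from the apt packages listed above to what they provide.
SHARED_LIBRARIES = ["GL", "OSMesa"]          # libgl1, libosmesa6-dev
COMMAND_LINE_TOOLS = ["rsync", "patchelf"]   # rsync, patchelf


def missing_libraries(names, find=ctypes.util.find_library):
    """Return the library names that the lookup function cannot locate."""
    return [name for name in names if find(name) is None]


def missing_tools(names, which=shutil.which):
    """Return the executables that are not on the current PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    print("missing shared libraries:", missing_libraries(SHARED_LIBRARIES))
    print("missing command-line tools:", missing_tools(COMMAND_LINE_TOOLS))
```

The lookup functions are passed as parameters so the checks can be exercised without touching the real system; with the defaults, two empty lists suggest the packages above are in place.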
It seems to be spawning the 32 env processes. You can first decrease 'num_envs' in this file https://github.com/HorizonRobotics/alf/blob/seditor_alf/alf/examples/safety/sac/sac_safety_gym_conf.py and see whether the hang is caused by running out of CPU memory.
Sometimes it might take several minutes to spawn all env processes, but in my experience it won't take too long.
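To pick a starting value for 'num_envs', one rough approach is to divide the machine's memory by the footprint of a single environment process. The sketch below assumes a Linux host; the per-environment footprint (1 GiB here) and the 25% reserve are placeholder assumptions that should be replaced with measurements, and max_parallel_envs is a hypothetical helper, not an ALF function.

```python
import os

GIB = 1024 ** 3


def total_memory_bytes():
    """Physical memory reported by the OS (available on Linux/Unix)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def max_parallel_envs(memory_bytes, per_env_bytes, requested=32,
                      reserve_fraction=0.25):
    """Largest env count up to `requested` that fits in the usable memory.

    `reserve_fraction` of the memory is left for the trainer and the OS.
    At least 1 is always returned so that training can still be attempted.
    """
    if per_env_bytes <= 0:
        raise ValueError("per_env_bytes must be positive")
    usable = memory_bytes * (1.0 - reserve_fraction)
    return max(1, min(requested, int(usable // per_env_bytes)))


if __name__ == "__main__":
    try:
        memory = total_memory_bytes()
    except (AttributeError, ValueError, OSError):
        print("could not query physical memory on this platform")
    else:
        # 1 GiB per environment is a placeholder, not a measured value.
        print("suggested num_envs:", max_parallel_envs(memory, 1 * GIB))
```

The resulting number is only a starting point; it still has to be set by hand in the config file linked above.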
@hnyu it was indeed due to CPU mem. I ran it with 1 environment, and I was able to train the model. Thank you for your help!
As you proposed, I think building a docker image would maybe be more convenient!
Hi @chrismartel , I've built a docker image. See https://github.com/hnyu/seditor#docker
Hi, I tried to run the SEditor installation pipeline on Ubuntu 20.04. I went through the following steps:
Install Python 3.7
According to the ALF SEditor branch, Python3.7 seems to be required. Here are the steps I followed to install Python 3.7.
Start by updating the packages list and installing the prerequisites:
Next, add the deadsnakes PPA to your sources list:
Once the repository is enabled, install Python 3.7 with:
Python3.7 is currently supported by ALF. Note that some pip packages (e.g., pybullet) need python dev files, so make sure python3.7-dev is installed:
Install pip which is compatible with Python 3.7, 3.8, 3.9, 3.10 on Linux, Windows and MacOS by running the following script
Install virtualenv package for Python3.7
SEditor Installation
Create and activate a Python3.7 virtual environment for SEditor
Install ALF
Notes:
- The --use-pep517 flag in pip install -e . --use-pep517 is used to avoid a warning.
- Building wheel for pybullet (pyproject.toml) ... takes forever to run.
Install MuJoCo version 2.1+
Install the customized Safety Gym environment
Running pip install -e safety-gym reported errors. I ignored those errors and went on with the installation.
Error when running git clone https://github.com/hnyu/seditor: fatal: destination path 'seditor' already exists and is not an empty directory.
I assumed that the seditor directory must be overwritten by the clone so I removed it and cloned the repository afterwards.
Training SEditor
Create an empty directory to store the training results
Run a test on PointGoal1-v0
Error: ModuleNotFoundError: No module named 'cnest'
The cnest package is in my venv site-packages so I am not sure what is going on.
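One way to narrow this kind of mismatch down is to ask the interpreter that runs the training where it would load the module from, and whether it is the virtualenv's interpreter at all. This is a generic diagnostic sketch with a hypothetical helper name (describe_module); it does not check package versions, which the maintainer's reply identifies as the cause here.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether the running interpreter can locate top-level module `name`."""
    spec = importlib.util.find_spec(name)
    in_virtualenv = (
        hasattr(sys, "real_prefix")  # set by older virtualenv releases
        or sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    )
    return {
        "module": name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "interpreter": sys.executable,
        "in_virtualenv": in_virtualenv,
    }


if __name__ == "__main__":
    for key, value in describe_module("cnest").items():
        print(f"{key}: {value}")
```

Running it with the same python executable used to launch training shows whether that interpreter sees the package; if "found" is True and the import still fails, the problem lies inside the installed package rather than in the search path.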
Did you ever observe the above errors when running the installation pipeline? Am I doing something wrong?
Thank you for your help!