utiasDSL / gym-pybullet-drones

PyBullet Gymnasium environments for single and multi-agent reinforcement learning of quadcopter control
https://utiasDSL.github.io/gym-pybullet-drones/
MIT License

LeaderFollower, leader only has height axis freedom. #126

Open. jackvice opened this issue 1 year ago

jackvice commented 1 year ago

Hi, I am playing with the leader behavior and I can get it to hover at different heights, but it doesn't want to leave x = 0, y = 0. I'm guessing x and y may have been fixed for faster demo training? I'm changing the reward in line 84 of LeaderFollowerAviary.py to get new behaviors, e.g. `rewards[0] = -1 * np.linalg.norm(np.array([0, .25, 0.5]) - states[0, 0:3])**2`.
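
For context, this change sits in `_computeReward()` of LeaderFollowerAviary.py. A minimal sketch of the surrounding method (only the `rewards[0]` line is the actual modification; the follower term here is illustrative, not copied from the file):

```python
import numpy as np

def _computeReward(self):
    """Sketch of LeaderFollowerAviary._computeReward() with the modified leader term."""
    rewards = {}
    states = np.array([self._getDroneStateVector(i) for i in range(self.NUM_DRONES)])
    # Modified leader reward: squared distance to a fixed 3D target at
    # (0, 0.25, 0.5) instead of the original height-only objective.
    rewards[0] = -1 * np.linalg.norm(np.array([0, .25, 0.5]) - states[0, 0:3])**2
    # Follower reward (illustrative): squared distance to the leader's position.
    for i in range(1, self.NUM_DRONES):
        rewards[i] = -1 * np.linalg.norm(states[0, 0:3] - states[i, 0:3])**2
    return rewards
```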

JacopoPan commented 1 year ago

Can you tell me what scripts/commands you are running? The default examples are, as you say, going to use the "1-D" action space (i.e., forcing the commanded force on all propellers to be the same), which greatly simplifies and accelerates the solution of the problem.
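
Roughly, a "1-D" action type broadcasts a single scalar per drone to all four motors, so no differential thrust is possible. A sketch (the hover RPM value is illustrative; the real one is derived in BaseAviary):

```python
import numpy as np

HOVER_RPM = 14468.0  # illustrative value only

# "1-D" action type: one scalar per drone, applied identically to all four
# motors, so the drone can only accelerate straight up or down.
v = 0.02
rpm = np.repeat(HOVER_RPM * (1 + 0.05 * v), 4)
print(rpm)  # four identical RPMs -> no roll/pitch torque, no x/y motion
```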

jackvice commented 1 year ago

I made this change in LeaderFollowerAviary.py line 84: `rewards[0] = -1 * np.linalg.norm(np.array([0, .25, 0.5]) - states[0, 0:3])**2` and then ran gym-pybullet-drones/experiments/learning/multiagent.py. I thought I tried setting ACT to ActionType.RPM; I'll have to double check. Do ActionType.RPM, ActionType.DYN, and ActionType.VEL all allow 3D flight? And thank you!

JacopoPan commented 1 year ago

That would be the intended behaviour; double check that your code is going through lines 197-198 of BaseMultiAgentAviary.py:

```python
if self.ACT_TYPE == ActionType.RPM:
    rpm[int(k),:] = np.array(self.HOVER_RPM * (1+0.05*v))
```

and lines 174-175 of multiagent.py:

```python
elif ARGS.act in [ActionType.RPM, ActionType.DYN, ActionType.VEL]:
    ACTION_VEC_SIZE = 4
```

These should lead to different actions on different propellers and movement in 3D space.
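
For example (with an illustrative hover RPM value):

```python
import numpy as np

HOVER_RPM = 14468.0  # illustrative; the actual constant is computed in BaseAviary
v = np.array([0.05, -0.05, 0.05, -0.05])    # 4-component normalized action
rpm = np.array(HOVER_RPM * (1 + 0.05 * v))  # distinct per-propeller RPMs
print(rpm)  # [14504.17 14431.83 14504.17 14431.83] -> differential thrust, 3D motion
```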

jackvice commented 1 year ago

Setting the action type to RPM with `"timesteps_total": 500000`, both drones crashed when I ran the policy using test_multiagent.py, so I set `"timesteps_total": 5000000` and got the following error, running on either CPU or GPU:

```
2022-11-09 15:50:29,427 ERROR trial_runner.py:958 -- Trial PPO_this-aviary-v0_924f1_00000: Error processing event.
Traceback (most recent call last):
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 924, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 787, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/worker.py", line 1713, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::PPO.train() (pid=926055, ip=192.168.1.110, repr=PPO)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/rllib/agents/ppo/ppo_torch_policy.py", line 82, in loss
    curr_action_dist = dist_class(logits, model)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/rllib/models/torch/torch_action_dist.py", line 185, in __init__
    self.dist = torch.distributions.normal.Normal(mean, torch.exp(log_std))
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/torch/distributions/normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (128, 4)) of distribution Normal(loc: torch.Size([128, 4]), scale: torch.Size([128, 4])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan],
        [nan, nan, nan, nan],
        ...
```

I am trying action types `vel` and `dyn` now. And I checked that the code goes through both `if self.ACT_TYPE == ActionType.RPM: rpm[int(k),:] = np.array(self.HOVER_RPM * (1+0.05*v))` and `elif ARGS.act in [ActionType.RPM, ActionType.DYN, ActionType.VEL]: ACTION_VEC_SIZE = 4`.

JacopoPan commented 1 year ago

@jackvice it is easier to read terminal output if you put it in a code block with some formatting highlighting. It seems to me that all issues are coming directly from ray/torch (and resulting in a structure full of NaNs); what versions of those packages do you have installed?

```
2022-11-09 15:50:29,427 ERROR trial_runner.py:958 -- Trial PPO_this-aviary-v0_924f1_00000: Error processing event.
Traceback (most recent call last):
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 924, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 787, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/worker.py", line 1713, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::PPO.train() (pid=926055, ip=192.168.1.110, repr=PPO)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/rllib/agents/ppo/ppo_torch_policy.py", line 82, in loss
    curr_action_dist = dist_class(logits, model)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/rllib/models/torch/torch_action_dist.py", line 185, in __init__
    self.dist = torch.distributions.normal.Normal(mean, torch.exp(log_std))
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/torch/distributions/normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (128, 4)) of distribution Normal(loc: torch.Size([128, 4]), scale: torch.Size([128, 4])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
```
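
For reference, the final ValueError can be reproduced in isolation whenever a policy head emits NaNs for the distribution mean, independent of this repo (a sketch):

```python
import torch

mean = torch.full((128, 4), float('nan'))  # what a diverged policy head emits
log_std = torch.zeros(128, 4)

# With argument validation on, Normal rejects a NaN loc with the same
# "constraint Real()" ValueError seen in the traceback above.
dist = torch.distributions.normal.Normal(mean, torch.exp(log_std), validate_args=True)
```
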
jackvice commented 1 year ago

I've tested on one machine with ray 1.9.0 and torch 1.10.1, and on another with ray 1.9.0 and torch 1.11.0, and the policy is still diverging. :( Thanks for any help.

JacopoPan commented 1 year ago

I'd need to try and replicate the issue: can you list OS, CPU, (GPU/driver if appropriate), packages in your conda environment and a minimal example (e.g. by creating a PR) that would consistently lead to that error on your machine?

jackvice commented 1 year ago

To produce the error, I set `"timesteps_total": 10000000` at line 283 of multiagent.py and then run `python multiagent.py --act 'rpm'`. The setups of the two machines I have tried are listed below; on both machines I tried GPU and CPU-only runs. (I tried adding the conda packages as a code block but it lost all whitespace.)

Machine 1: Ubuntu 20.04.5 on Intel i7-10700F with GTX 1080 and RTX 2080 GPUs, Nvidia driver 460.106.00, CUDA 11.2. Conda packages:

```
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
async-timeout 4.0.2 pypi_0 pypi
attrs 22.1.0 pypi_0 pypi
blas 1.0 mkl
brotlipy 0.7.0 py38h0a891b7_1004 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2022.9.24 ha878542_0 conda-forge
certifi 2022.9.24 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py38h4a40e3a_1 conda-forge
charset-normalizer 2.1.1 pyhd8ed1ab_0 conda-forge
click 8.1.3 pypi_0 pypi
cloudpickle 2.2.0 pypi_0 pypi
contourpy 1.0.5 pypi_0 pypi
cryptography 37.0.1 py38h9ce1e76_0
cudatoolkit 10.2.89 h713d32c_10 conda-forge
cycler 0.10.0 pypi_0 pypi
deprecated 1.2.13 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.8.0 pypi_0 pypi
fonttools 4.38.0 pypi_0 pypi
freetype 2.12.1 hca18f0e_0 conda-forge
gmp 6.2.1 h58526e2_0 conda-forge
gnutls 3.6.13 h85f3911_1 conda-forge
gputil 1.4.0 pyh9f0ad1d_0 conda-forge
gym 0.21.0 pypi_0 pypi
gym-pybullet-drones 0.0.3 pypi_0 pypi
idna 3.4 pyhd8ed1ab_0 conda-forge
importlib-metadata 4.13.0 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561
jpeg 9e h166bdaf_2 conda-forge
kiwisolver 1.4.4 pypi_0 pypi
lame 3.100 h166bdaf_1003 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.39 hc81fddc_0 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libdeflate 1.14 h166bdaf_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.2.0 h65d4601_18 conda-forge
libgomp 12.2.0 h65d4601_18 conda-forge
libiconv 1.17 h166bdaf_0 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libpng 1.6.38 h753d276_0 conda-forge
libsqlite 3.39.4 h753d276_0 conda-forge
libstdcxx-ng 12.2.0 h46fd767_18 conda-forge
libtiff 4.4.0 h55922b4_4 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libuv 1.44.2 h166bdaf_0 conda-forge
libwebp-base 1.2.4 h166bdaf_0 conda-forge
libxcb 1.13 h7f98852_1004 conda-forge
libzlib 1.2.13 h166bdaf_4 conda-forge
lz4 4.0.2 pypi_0 pypi
matplotlib 3.6.1 pypi_0 pypi
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py38h95df7f1_0 conda-forge
mkl_fft 1.3.1 py38h8666266_1 conda-forge
mkl_random 1.2.2 py38h1abd341_0 conda-forge
ncurses 6.3 h27087fc_1 conda-forge
nettle 3.6 he412f7d_0 conda-forge
numpy 1.23.3 py38h14f4228_0
numpy-base 1.23.3 py38h31eccc5_0
oauthlib 3.2.2 pypi_0 pypi
openh264 2.1.1 h780b84a_0 conda-forge
openjpeg 2.5.0 h7d73246_1 conda-forge
openssl 3.0.5 h166bdaf_2 conda-forge
pillow 9.2.0 py38ha3b2c9c_2 conda-forge
pip 22.3 pypi_0 pypi
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pybullet 3.2.5 pypi_0 pypi
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pyopenssl 22.0.0 pyhd8ed1ab_1 conda-forge
pyparsing 3.0.9 pypi_0 pypi
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.8.13 ha86cf86_0_cpython conda-forge
python_abi 3.8 2_cp38 conda-forge
pytorch 1.10.1 py3.8_cuda10.2_cudnn7.6.5_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pyyaml 6.0 pypi_0 pypi
ray 1.9.0 pypi_0 pypi
readline 8.1.2 h0f457ee_0 conda-forge
redis 4.3.4 pypi_0 pypi
requests 2.28.1 pyhd8ed1ab_1 conda-forge
setuptools 65.5.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sqlite 3.39.4 h4ff8645_0 conda-forge
stable-baselines3 1.6.2 pypi_0 pypi
tk 8.6.12 h27826a3_0 conda-forge
torchaudio 0.10.1 py38_cu102 pytorch
torchvision 0.11.2 py38_cu102 pytorch
typing_extensions 4.4.0 pyha770c72_0 conda-forge
urllib3 1.26.11 pyhd8ed1ab_0 conda-forge
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zlib 1.2.13 h166bdaf_4 conda-forge
zstd 1.5.2 h6239696_4 conda-forge
```

Machine 2: Ubuntu 20.04.5 on Intel Xeon W-1195M with RTX A5000 GPU, Nvidia driver 495.29.05, CUDA 11.5. Conda packages:

```
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 1.2.0 pypi_0 pypi
async-timeout 4.0.2 pypi_0 pypi
attrs 22.1.0 pypi_0 pypi
blas 1.0 mkl
brotlipy 0.7.0 py38h27cfd23_1003
bzip2 1.0.8 h7b6447c_0
ca-certificates 2022.07.19 h06a4308_0
cachetools 5.2.0 pypi_0 pypi
certifi 2022.9.24 py38h06a4308_0
cffi 1.15.1 py38h74dc2b5_0
charset-normalizer 2.1.1 pypi_0 pypi
click 8.1.3 pypi_0 pypi
cloudpickle 2.2.0 pypi_0 pypi
contourpy 1.0.5 pypi_0 pypi
cryptography 37.0.1 py38h9ce1e76_0
cudatoolkit 11.3.1 h2bc3f7f_2
cycler 0.10.0 pypi_0 pypi
deprecated 1.2.13 pypi_0 pypi
dm-tree 0.1.7 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
fonttools 4.37.4 pypi_0 pypi
freetype 2.11.0 h70c0345_0
giflib 5.2.1 h7b6447c_0
gmp 6.2.1 h295c915_3
gnutls 3.6.15 he1e5248_0
google-auth 2.12.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
grpcio 1.49.1 pypi_0 pypi
gym 0.21.0 pypi_0 pypi
gym-pybullet-drones 0.0.3 pypi_0 pypi
idna 3.4 py38h06a4308_0
imageio 2.22.1 pypi_0 pypi
importlib-metadata 5.0.0 pypi_0 pypi
importlib-resources 5.9.0 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561
jpeg 9e h7f8727e_0
jsonschema 4.16.0 pypi_0 pypi
kiwisolver 1.4.4 pypi_0 pypi
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libdeflate 1.8 h7f8727e_5
libffi 3.3 he6710b0_2
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libiconv 1.16 h7f8727e_2
libidn2 2.3.2 h7f8727e_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 11.2.0 h1234567_1
libtasn1 4.16.0 h27cfd23_0
libtiff 4.4.0 hecacb30_0
libunistring 0.9.10 h27cfd23_0
libuv 1.40.0 h7b6447c_0
libwebp 1.2.4 h11a3e52_0
libwebp-base 1.2.4 h5eee18b_0
lz4 4.0.2 pypi_0 pypi
lz4-c 1.9.3 h295c915_1
markdown 3.4.1 pypi_0 pypi
markupsafe 2.1.1 pypi_0 pypi
matplotlib 3.6.0 pypi_0 pypi
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py38h7f8727e_0
mkl_fft 1.3.1 py38hd3c417c_0
mkl_random 1.2.2 py38h51133e4_0
msgpack 1.0.4 pypi_0 pypi
ncurses 6.3 h5eee18b_3
nettle 3.7.3 hbbd107a_1
networkx 2.8.7 pypi_0 pypi
numpy 1.23.3 py38h14f4228_0
numpy-base 1.23.3 py38h31eccc5_0
oauthlib 3.2.1 pypi_0 pypi
openh264 2.1.1 h4ff587b_0
openssl 1.1.1q h7f8727e_0
packaging 21.3 pypi_0 pypi
pandas 1.5.0 pypi_0 pypi
pillow 9.2.0 pypi_0 pypi
pip 22.2.2 py38h06a4308_0
pkgutil-resolve-name 1.3.10 pypi_0 pypi
protobuf 3.19.6 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pybullet 3.2.5 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0
pyopenssl 22.0.0 pyhd3eb1b0_0
pyparsing 3.0.9 pypi_0 pypi
pyrsistent 0.18.1 pypi_0 pypi
pysocks 1.7.1 py38h06a4308_0
python 3.8.13 h12debd9_0
python-dateutil 2.8.2 pypi_0 pypi
pytorch 1.11.0 py3.8_cuda11.3_cudnn8.2.0_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2022.4 pypi_0 pypi
pywavelets 1.4.1 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
ray 1.9.0 pypi_0 pypi
readline 8.1.2 h7f8727e_1
redis 4.3.4 pypi_0 pypi
requests 2.28.1 py38h06a4308_0
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scikit-image 0.19.3 pypi_0 pypi
scipy 1.9.1 pypi_0 pypi
setuptools 65.4.1 pypi_0 pypi
six 1.16.0 pyhd3eb1b0_1
sqlite 3.39.3 h5082296_0
stable-baselines3 1.6.1 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tensorboard 2.10.1 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorboardx 2.5.1 pypi_0 pypi
tifffile 2022.8.12 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
torch 1.12.1 pypi_0 pypi
torchaudio 0.11.0 py38_cu113 pytorch
torchvision 0.12.0 py38_cu113 pytorch
typing_extensions 4.3.0 py38h06a4308_0
urllib3 1.26.12 py38h06a4308_0
werkzeug 2.2.2 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0
wrapt 1.14.1 pypi_0 pypi
xz 5.2.6 h5eee18b_0
zipp 3.8.1 pypi_0 pypi
zlib 1.2.12 h5eee18b_3
zstd 1.5.2 ha4553b6_0
```

jackvice commented 1 year ago

I also tested with Nvidia driver 515.65.01 and CUDA 11.7, and the policy still diverges. Is this a bug?

JacopoPan commented 1 year ago

I'm sorry, I haven't had the time to work on this: does the default example (as mentioned in the paper) work as expected, or does that also result in NaNs with the package versions you currently have installed?

jackvice commented 1 year ago

The default multi-agent example from the paper does work, including with an increase to `"timesteps_total": 1200000`. I did a fresh pull from the repo today, and when I pass `--act 'rpm'` it still fails (NaN) after about an hour of training. The only change I made to the code is `"timesteps_total": 1200000` on line 283 of multiagent.py.

JacopoPan commented 1 year ago

Can you take the learning agent out of the loop and run that same multi-agent environment (e.g. at line 315 of test_multiagent.py, where the action from the policies is set) with a constant action (it should be a dictionary of size-4 arrays; with all zeros, the drones should hover), to help me verify whether it's a problem with the environment or the agent?
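
Something along these lines (a sketch; `test_env` and `num_steps` stand for the environment instance and loop length already set up in that script):

```python
import numpy as np

# Inside the evaluation loop of test_multiagent.py: ignore the trained
# policies and feed a constant action instead. For ActionType.RPM, an
# all-zeros normalized action maps to HOVER_RPM on every motor, so both
# drones should hover in place if the environment itself is healthy.
obs = test_env.reset()
for _ in range(num_steps):
    action = {0: np.zeros(4), 1: np.zeros(4)}  # constant hover action
    obs, reward, done, info = test_env.step(action)
```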

jackvice commented 1 year ago

After training with `--act rpm`, I set line 315 to `action = {0: np.zeros(4), 1: np.zeros(4)}` and both drones appear to hover correctly.

JacopoPan commented 1 year ago

I see. Could you, at the same time, print out what goes into `np.hstack([action[1], obs[1], obs[0]])` and what is returned by `..compute_single_action` in `temp[0]`, `temp[1]`, to understand if there is something odd with the inputs or outputs of the networks (although I suspect the NaNs appear in one of the internal layers)?
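
That is, something like this (a sketch; `policy0`/`policy1` stand for the restored policies in the script, and the ordering of the second hstack is my assumption, symmetric to the first):

```python
import numpy as np

# Around line 315 of test_multiagent.py: log what goes into and comes out
# of each policy network.
temp = {}
input0 = np.hstack([action[1], obs[1], obs[0]])
input1 = np.hstack([action[0], obs[0], obs[1]])  # assumed symmetric ordering
temp[0] = policy0.compute_single_action(input0)
temp[1] = policy1.compute_single_action(input1)
print("policy0 in/out:", input0, temp[0])
print("policy1 in/out:", input1, temp[1])
```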

jackvice commented 1 year ago

Hi, the NaN divergence happens during training, and running test_multiagent.py on the save directory fails with:

```
Traceback (most recent call last):
  File "test_multiagent.py", line 254, in <module>
    with open(ARGS.exp+'/checkpoint.txt', 'r+') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'results/save-leaderfollower-2-cc-kin-rpm 11.26.2022_16.29.04//checkpoint.txt'
```

The training NaNs now show up in a 64 x 4 tensor, as below:

```
Failure # 1 (occurred at 2022-11-26_19-52-28)
Traceback (most recent call last):
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 924, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 787, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/worker.py", line 1713, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::PPO.train() (pid=828921, ip=192.168.1.110, repr=PPO)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/rllib/agents/ppo/ppo_torch_policy.py", line 82, in loss
    curr_action_dist = dist_class(logits, model)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/ray/rllib/models/torch/torch_action_dist.py", line 185, in __init__
    self.dist = torch.distributions.normal.Normal(mean, torch.exp(log_std))
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/torch/distributions/normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/jack/anaconda3/envs/drones/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (64, 4)) of distribution Normal(loc: torch.Size([64, 4]), scale: torch.Size([64, 4])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan],
        [nan, nan, nan, nan],
        ...
```

Thanks, all help is much appreciated.

JacopoPan commented 1 year ago

This seems to come from torch when it finds a tensor with lists of 4 NaNs, not from the code in this repo itself. But I wonder if these NaNs are wrongly logged actions coming from the gym-pybullet-drones environment.

Can you try to print out (or use a debugger to inspect) what action goes in and what obs is returned by the .step() method (in BaseAviary) as training proceeds?
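
A lightweight way to do this without a debugger is a temporary NaN guard (a sketch, assuming dict-valued actions and observations as in the multi-agent aviaries):

```python
import numpy as np

# Temporary instrumentation at the top of BaseAviary.step(action):
if any(np.isnan(np.asarray(v)).any() for v in action.values()):
    print("NaN in incoming action:", action)

# ... and just before step() returns its observation:
if any(np.isnan(np.asarray(v)).any() for v in obs.values()):
    print("NaN in outgoing obs:", obs)
```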

jackvice commented 1 year ago

Running `python3 multiagent.py --env 'leaderfollower'`, the policy diverges and spits out [nan] after 3 hours. Below are the last ten actions and observations returned by .step() in BaseAviary:

```
{0: array([-0.21322441], dtype=float32), 1: array([-1.], dtype=float32)}, {0: array([0., 0., 0.10552978, 0., -0., 0., 0., 0., -0.00090858, 0., 0., 0.]), 1: array([0.01058667, 0.01058667, 0.10189757, 0., -0., 0., 0., 0., -0.00389826, 0., 0., 0.])}

{0: array([0.39939213], dtype=float32), 1: array([1.], dtype=float32)}, {0: array([0., 0., 0.10553902, 0., -0., 0., 0., 0., 0.00183648, 0., 0., 0.]), 1: array([0.01058667, 0.01058667, 0.10190118, 0., -0., 0., 0., 0., 0.00307836, 0., 0., 0.])}

{0: array([-0.10729766], dtype=float32), 1: array([-1.], dtype=float32)}, {0: array([0., 0., 0.1055565, 0., -0., 0., 0., 0., 0.00110692, 0., 0., 0.]), 1: array([0.01058667, 0.01058667, 0.10188988, 0., -0., 0., 0., 0., -0.00355743, 0., 0., 0.])}

{0: array([0.06007123], dtype=float32), 1: array([1.], dtype=float32)}, {0: array([0., 0., 0.1055734, 0., -0., 0., 0., 0., 0.00151529, 0., 0., 0.]), 1: array([0.01058667, 0.01058667, 0.10189774, 0., -0., 0., 0., 0., 0.0034189, 0., 0., 0.])}

{0: array([-0.17268103], dtype=float32), 1: array([-1.], dtype=float32)}, {0: array([0., 0., 0.10558356, 0., -0., 0., 0., 0., 0.00034429, 0., 0., 0.]), 1: array([0.01058667, 0.01058667, 0.1018907, 0., -0., 0., 0., 0., -0.00321716, 0., 0., 0.])}

{0: array([-0.04746491], dtype=float32), 1: array([1.], dtype=float32)}, {0: array([0.00000000e+00, 0.00000000e+00, 1.05585441e-01, 0.00000000e+00, -0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 2.14683282e-05, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]), 1: array([0.01058667, 0.01058667, 0.10190281, 0., -0., 0., 0., 0., 0.00375888, 0., 0., 0.])}

{0: array([-0.063905], dtype=float32), 1: array([-1.], dtype=float32)}, {0: array([0., 0., 0.10558245, 0., -0., 0., 0., 0., -0.00041262, 0., 0., 0.]), 1: array([0.01058667, 0.01058667, 0.10190002, 0., -0., 0., 0., 0., -0.00287747, 0., 0., 0.])}

{0: array([0.11206794], dtype=float32), 1: array([1.], dtype=float32)}, {0: array([0., 0., 0.10558303, 0., -0., 0., 0., 0., 0.0003523, 0., 0., 0.]), 1: array([0.01058667, 0.01058667, 0.10191638, 0., -0., 0., 0., 0., 0.00409828, 0., 0., 0.])}

{0: array([-0.21531594], dtype=float32), 1: array([-1.], dtype=float32)}, {0: array([0., 0., 0.10557651, 0., -0., 0., 0., 0., -0.00110497, 0., 0., 0.]), 1: array([0.01058667, 0.01058667, 0.10191783, 0., -0., 0., 0., 0., -0.00253836, 0., 0., 0.])}

{0: array([-0.00944871], dtype=float32), 1: array([1.], dtype=float32)}, {0: array([0., 0., 0.10556222, 0., -0., 0., 0., 0., -0.00116831, 0., 0., 0.]), 1: array([0.01058667, 0.01058667, 0.10193842, 0., -0., 0., 0., 0., 0.00443711, 0., 0., 0.])}
```

zhaohubo commented 1 year ago

Hi, did you solve this problem?

jackvice commented 1 year ago

No, I didn't. Is it still a problem in newer versions? I switched to Isaac Gym.