keiohta / tf2rl

TensorFlow2 Reinforcement Learning
MIT License
467 stars 103 forks

Implement AIRL #36

Open keiohta opened 5 years ago

keiohta commented 5 years ago

Learning Robust Rewards with Adversarial Inverse Reinforcement Learning

keiohta commented 4 years ago

Test code

# Generate trajectories
$ python examples/run_sac.py --env-name HalfCheetah-v2 --save-test-path --test-interval 50000 --gpu -1
$ ls results
20191220T185529.974847_SAC_

$ python examples/run_airl_sac.py --env-name HalfCheetah-v2 --test-interval 10000 --gpu -1 --expert-path-dir results/20191220T185529.974847_SAC_

haoyu-x commented 4 years ago

Hi @keiohta, when I run

$ python ~/tf2rl-master/examples/run_gaifo_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir ~/GAIL/results/20200619T013740.036943_SAC_ --gpu -1 --dir-suffix GAIfO

I get:

run_gaifo_ddpg.py: error: unrecognized arguments: --gpu -1

Can you help me? Thank you!
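
For readers who hit the same message: its format matches what Python's argparse prints when a flag on the command line was never registered with the parser. The following is a minimal sketch, assuming the example script builds its options with argparse. The `build_parser` helper and the default value for `--gpu` are hypothetical and are not taken from tf2rl's code.

```python
import argparse


def build_parser(with_gpu: bool) -> argparse.ArgumentParser:
    """Hypothetical stand-in for an example script's parser (not tf2rl's real code)."""
    parser = argparse.ArgumentParser(prog="run_gaifo_ddpg.py")
    parser.add_argument("--env-name", type=str, default="HalfCheetah-v2")
    if with_gpu:
        # Registering the flag is what makes `--gpu -1` acceptable.
        # The default of 0 is an assumption made for this sketch.
        parser.add_argument("--gpu", type=int, default=0)
    return parser


argv = ["--env-name=HalfCheetah-v2", "--gpu", "-1"]

# parse_known_args() collects unregistered arguments instead of exiting,
# which makes the failure easy to inspect.
_, unknown = build_parser(with_gpu=False).parse_known_args(argv)
print("unrecognized:", unknown)

# Once the flag is registered, the same command line parses cleanly.
args = build_parser(with_gpu=True).parse_args(argv)
print("gpu:", args.gpu)
```

With `parse_args()` instead of `parse_known_args()`, the first parser would exit with an "unrecognized arguments" error like the one reported above.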

keiohta commented 4 years ago

@haoyu-x Hi! Thanks for reporting the bug. I fixed the error in this commit (https://github.com/keiohta/tf2rl/commit/ab675d0e8f7061910e8f44d00daf72c69c72db6a), so can you try again on the latest master branch?

haoyu-x commented 4 years ago

Should I still use the same command suggested in issue 67 (https://github.com/keiohta/tf2rl/issues/67)?

When I run

$ python ~/tf2rl-master/examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir ~/GAIL/results/20200619T013740.036943_SAC_ --gpu -1 --dir-suffix GAIL

I get the same error.

keiohta commented 4 years ago

Yeah, did you update the code?

haoyu-x commented 4 years ago

Yes, I updated. Can you run GAIL and GAIfO on your computer?

keiohta commented 4 years ago

At least I resolved the --gpu error. Let me check whether the full code runs.

haoyu-x commented 4 years ago

Is there any way to run GAIL and GAIfO other than from the command line?

keiohta commented 4 years ago

I confirmed the script runs on my machine. Can you provide me with the full error message?

$ python examples/run_sac.py --env-name=HalfCheetah-v2 --save-test-path --test-interval=50000 --max-steps 300000
$ ls results
20200627T221712.423081_SAC_
$ find results/20200627T221712.423081_SAC_/ -name '*.pkl'
results/20200627T221712.423081_SAC_/step_00050000_epi_02_return_02744.1677.pkl
results/20200627T221712.423081_SAC_/step_00050000_epi_04_return_02701.9388.pkl
results/20200627T221712.423081_SAC_/step_00050000_epi_00_return_03121.5797.pkl
results/20200627T221712.423081_SAC_/step_00050000_epi_01_return_02784.6256.pkl
results/20200627T221712.423081_SAC_/step_00050000_epi_03_return_02752.4279.pkl

$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir results/20200627T221712.423081_SAC_/ --gpu -1
...
22:23:48.107 [INFO] (irl_trainer.py:74) Total Epi:    19 Steps:   19000 Episode Steps:  1000 Return:  1174.4017 FPS: 118.79
22:23:56.162 [INFO] (irl_trainer.py:74) Total Epi:    20 Steps:   20000 Episode Steps:  1000 Return:  1889.9691 FPS: 124.15
22:23:57.861 [INFO] (irl_trainer.py:118) Evaluation Total Steps:   20000 Average Reward  2278.0820 over  5 episodes

haoyu-x commented 4 years ago

[image: Screenshot from 2020-06-27 21-38-20.png]

keiohta commented 4 years ago

Oh, I assumed you had installed tf2rl in developer mode... I have not published my change to PyPI yet, so I will do that now.

haoyu-x commented 4 years ago

Sure. Please let me know what I should do after your change. Thanks a lot!

keiohta commented 4 years ago

Now you can get the latest code through PyPI. Can you try the following?

# Update tf2rl
$ pip install -U tf2rl
# Make sure the version is 0.1.14
$ pip list | grep tf2rl

# Run your script
$ python ~/tf2rl-master/examples/run_gaifo_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir ~/GAIL/results/20200619T013740.036943_SAC_ --gpu -1 --dir-suffix GAIfO

By the way, your path ~/tf2rl-master suggests that you did not install tf2rl with git clone but just downloaded the zip file, didn't you? Anyway, the command above shows the installed version, so please let me know if you still encounter the same problem.
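
The same version check can also be done from Python. This is only a sketch: `version_tuple` and `is_at_least` are hypothetical helpers written for illustration, `importlib.metadata` requires Python 3.8 or later, and treating any release from 0.1.14 onward as containing the fix is an assumption.

```python
def version_tuple(version: str):
    """Turn '0.1.14' into (0, 1, 14) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def is_at_least(installed: str, required: str = "0.1.14") -> bool:
    """True when the installed version is the required one or newer."""
    return version_tuple(installed) >= version_tuple(required)


try:
    # importlib.metadata is in the standard library from Python 3.8 onward.
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    print("importlib.metadata is unavailable; use `pip list` instead")
else:
    try:
        installed = version("tf2rl")
        print("tf2rl", installed, "new enough:", is_at_least(installed))
    except PackageNotFoundError:
        print("tf2rl is not installed in this environment")
```

Comparing tuples rather than raw strings matters here, because as strings "0.1.9" sorts after "0.1.14".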

haoyu-x commented 4 years ago

Problem fixed, but I'm encountering another issue. :(

haoyu-x commented 4 years ago

[image: Screenshot from 2020-06-27 21-54-05.png]

keiohta commented 4 years ago

I cannot see your screenshot. Can you copy the message or retry uploading the picture?

haoyu-x commented 4 years ago

sure.

21:56:03.468 [INFO] (irl_trainer.py:74) Total Epi: 7 Steps: 7000 Episode Steps: 1000 Return: -327.7823 FPS: 4416.74
21:56:03.713 [INFO] (irl_trainer.py:74) Total Epi: 8 Steps: 8000 Episode Steps: 1000 Return: -262.8208 FPS: 4088.41
21:56:03.955 [INFO] (irl_trainer.py:74) Total Epi: 9 Steps: 9000 Episode Steps: 1000 Return: -325.9061 FPS: 4149.77
21:56:04.268 [INFO] (irl_trainer.py:74) Total Epi: 10 Steps: 10000 Episode Steps: 1000 Return: -278.5830 FPS: 4176.82
Traceback (most recent call last):
  File "/home/haoyux/tf2rl-master/examples/run_gaifo_ddpg.py", line 43, in <module>
    trainer()
  File "/home/haoyux/venv/lib/python3.6/site-packages/tf2rl/experiments/irl_trainer.py", line 113, in __call__
    expert_next_states=self._expert_next_obs[indices])
  File "/home/haoyux/venv/lib/python3.6/site-packages/tf2rl/algos/gaifo.py", line 48, in train
    agent_states, agent_next_states, expert_states, expert_next_states)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 627, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 506, in _initialize
    *args, **kwds))
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2446, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2777, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2667, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 441, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3299, in bound_method_wrapper
    return wrapped_fn(*args, **kwargs)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /home/haoyux/venv/lib/python3.6/site-packages/tf2rl/algos/gaifo.py:58 _train_body  *
        real_logits = self.disc([expert_states, expert_next_states])
    /home/haoyux/venv/lib/python3.6/site-packages/tf2rl/algos/gail.py:29 call  *
        features = self.l1(features)
    /home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py:886 __call__  **
        self.name)
    /home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/input_spec.py:216 assert_input_compatibility
        ' but received input with shape ' + str(shape))

    ValueError: Input 0 of layer L1 is incompatible with the layer: expected axis -1 of input shape to have value 34 but received input with shape [32, 6]

(venv) haoyux@haoyux-ThinkPad:~$

keiohta commented 4 years ago

I guess you collected the expert transitions in a different environment (such as Pendulum-v0, since the state dimension of Pendulum-v0 is 3). Are you sure the expert data were collected on HalfCheetah-v2?
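
The numbers in the error message are consistent with that guess. The sketch below assumes, based on the `self.disc([expert_states, expert_next_states])` call in the traceback, that the GAIfO discriminator concatenates the state and next state along the last axis. The helper names are hypothetical and are not part of tf2rl. The observation sizes are the standard Gym values for the two environments.

```python
def gaifo_disc_input_dim(obs_dim: int) -> int:
    """Width of a [state, next_state] pair concatenated on the last axis."""
    return 2 * obs_dim


def check_expert_obs_dim(env_obs_dim: int, expert_obs_dim: int) -> None:
    """Fail early with a readable message instead of a Keras shape error."""
    if env_obs_dim != expert_obs_dim:
        raise ValueError(
            f"expert observations have dimension {expert_obs_dim}, "
            f"but the environment expects {env_obs_dim}; "
            "were the expert trajectories recorded in another environment?"
        )


HALFCHEETAH_OBS_DIM = 17  # HalfCheetah-v2 observation size
PENDULUM_OBS_DIM = 3      # Pendulum-v0 observation size

# Width the first layer was built for vs. the width it received.
print(gaifo_disc_input_dim(HALFCHEETAH_OBS_DIM))
print(gaifo_disc_input_dim(PENDULUM_OBS_DIM))

try:
    check_expert_obs_dim(HALFCHEETAH_OBS_DIM, PENDULUM_OBS_DIM)
except ValueError as err:
    print(err)
```

Under that assumption, the expected width of 34 corresponds to two 17-dimensional HalfCheetah-v2 observations, and the received width of 6 corresponds to two 3-dimensional observations, which fits Pendulum-v0.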

haoyu-x commented 4 years ago

Oh! I made a stupid mistake. Thank you Kei, everything is fine now!

haoyu-x commented 4 years ago

One last question: how can I make a TensorBoard figure like yours from the command line? [image: Screenshot from 2020-06-27 22-28-13.png]

keiohta commented 4 years ago

It's great that your script runs successfully! I still cannot see your picture... I just run:

$ tensorboard --logdir results

Does this answer your question?

haoyu-x commented 4 years ago

I mean, how can I visualize the training process using TensorBoard? The figure I am referring to is in https://github.com/keiohta/tf2rl/issues/67

keiohta commented 4 years ago

You can add a suffix to the results directory with the --dir-suffix option. #67 uses it as follows:

$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir results/20191213T203858.508559_SAC_ --gpu -1 --dir-suffix GAIL
$ python examples/run_gaifo_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir results/20191213T203858.508559_SAC_ --gpu -1 --dir-suffix GAIfO
$ python examples/run_vail_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir results/20191213T203858.508559_SAC_ --gpu -1 --dir-suffix VAIL

haoyu-x commented 4 years ago

Yes! Thank you!

keiohta commented 4 years ago

My pleasure! Please don't hesitate to open an issue if you run into any difficulty or have a question. I will close this issue. Thanks for the report!

keiohta commented 4 years ago

OMG, this issue is not related to your question, so I have to reopen it. It would be better to open a new issue when a question is not related to the original one ;)

haoyu-x commented 4 years ago

Thank you again!

haoyu-x commented 4 years ago

Hi Kei,

I'm using tf2rl's GAIfO on robosuite (https://github.com/gal-leibovich/robosuite), but there is an error:

mujoco_py.builder.MujocoException: Unknown warning type Time = 1.3900.Check for NaN in simulation.

I found that my policy network outputs the action [nan nan nan nan nan nan nan nan] after several episodes of training. It happens on robosuite all the time, but works well on gym. I'm wondering if you can offer me some help. Thank you!

ymd-h commented 4 years ago

Hi, @haoyu-x

Could you open a new issue?

This is the issue where developers track and discuss the AIRL implementation.

To me, your problem does not seem related to the main topic of this issue.

keiohta commented 4 years ago

Thanks @yamada-github-account, @haoyu-x, and yes, I also think it would be better to open a new issue for this.

Aadit-Ambadkar commented 2 years ago

@keiohta I can't seem to find the run-airl-****.py files anywhere. Is this a commit issue? Am I missing something?

keiohta commented 2 years ago

Hi @Aadit-Ambadkar, we haven't fully tested AIRL yet, but you can try it on a different branch: https://github.com/keiohta/tf2rl/blob/airl/examples/run_airl_sac.py