rickstaa / stable-learning-control

A framework for training theoretically stable (and robust) Reinforcement Learning control algorithms.
https://rickstaa.dev/stable-learning-control
MIT License

Implement LPG #122

Closed by rickstaa 2 years ago

rickstaa commented 3 years ago

User story

As discussed in the meeting, we want to implement the LPG agent. @panweihit @dds0117, I will track the progress of this new algorithm in this issue.

Steps

rickstaa commented 3 years ago

Test training Performance (CartPoleCost)

Let's first test the training performance of the following LAC versions in the CartPoleCost environment:

Let's also quickly investigate the following SAC versions:

Regular SAC and LAC performance

LAC

Experiment file: experiments/gpl_2021/lac_cart_pole_cost.yml.

As we already know, LAC works.

Open the report [lac_cart_pole_cost_s0.zip](https://github.com/rickstaa/bayesian-learning-control/files/6277292/lac_cart_pole_cost_s0.zip) ![image](https://user-images.githubusercontent.com/17570430/113995596-ad7a1280-9856-11eb-911f-b8bf0e809996.png) ![Lac_performance](https://user-images.githubusercontent.com/17570430/113995835-e5815580-9856-11eb-94fe-d12819548f7e.png) ![lac_performance](https://user-images.githubusercontent.com/17570430/113996002-0d70b900-9857-11eb-8c86-31b4c6762a06.gif)

SAC

Experiment file: experiments/gpl_2021/sac_cart_pole_cost.yml.

As we already know, SAC also performs well in the CartPoleCost environment.

Open the report [sac_cart_pole_cost_s1250.zip](https://github.com/rickstaa/bayesian-learning-control/files/6277623/sac_cart_pole_cost_s1250.zip) ![image](https://user-images.githubusercontent.com/17570430/114003415-026d5700-985e-11eb-8b37-071fbc812b5f.png) ![sac_performance](https://user-images.githubusercontent.com/17570430/114003795-58da9580-985e-11eb-8421-07db98d89f1a.png) ![sac_performance](https://user-images.githubusercontent.com/17570430/114003659-38aad680-985e-11eb-895c-f13599d201ca.gif)

SAC 2

Experiment file: experiments/gpl_2021/sac2_cart_pole_cost.yml.

Seems to work fine.

Open the report [sac2_cart_pole_cost_s1250.zip](https://github.com/rickstaa/bayesian-learning-control/files/6277728/sac2_cart_pole_cost_s1250.zip) ![image](https://user-images.githubusercontent.com/17570430/114006409-b40d8780-9860-11eb-8e04-46f2619bd6bb.png) ![sac2](https://user-images.githubusercontent.com/17570430/114006598-ddc6ae80-9860-11eb-9741-21ead573e03a.png) ![sac2](https://user-images.githubusercontent.com/17570430/114006765-00f15e00-9861-11eb-94f9-0c8e9d32ce86.gif)

LAC 2

Experiment file: experiments/gpl_2021/lac2_cart_pole_cost.yml.

Also works.

Open the report [lac2_cart_pole_cost_s1250.zip](https://github.com/rickstaa/bayesian-learning-control/files/6277837/lac2_cart_pole_cost_s1250.zip) ![image](https://user-images.githubusercontent.com/17570430/114009720-ae657100-9863-11eb-9754-480ff2cfc40f.png) ![lac2](https://user-images.githubusercontent.com/17570430/114009930-e5d41d80-9863-11eb-8492-a8aad62aca35.png) ![lac2](https://user-images.githubusercontent.com/17570430/114010036-ff756500-9863-11eb-8fd3-ba52ae750e6d.gif)

LAC3

Experiment file: experiments/gpl_2021/lac3_cart_pole_cost.yml.

Also works.

Open the report [lac3_cart_pole_cost_s1250.zip](https://github.com/rickstaa/bayesian-learning-control/files/6278450/lac3_cart_pole_cost_s1250.zip) ![image](https://user-images.githubusercontent.com/17570430/114018851-fa1d1800-986d-11eb-841a-f01e659c1c41.png) ![lac3](https://user-images.githubusercontent.com/17570430/114018941-1b7e0400-986e-11eb-802f-0f7f60e5fc60.png) ![lac3](https://user-images.githubusercontent.com/17570430/114023149-cc869d80-9872-11eb-9984-64aad8a14157.gif)

LAC4

Experiment file: experiments/gpl_2021/lac4_cart_pole_cost.yml.

Also works but after this first test, it looks like performance is worse. This could also be due to random factors.

Open the report [lac4_cart_pole_cost_s1250.zip](https://github.com/rickstaa/bayesian-learning-control/files/6278564/lac4_cart_pole_cost_s1250.zip) ![image](https://user-images.githubusercontent.com/17570430/114025772-c9d97780-9875-11eb-86e6-3d5ba396df3a.png) ![lac4_performance](https://user-images.githubusercontent.com/17570430/114025901-f097ae00-9875-11eb-83fa-d2ee31660121.png) ![lac4_performance](https://user-images.githubusercontent.com/17570430/114026095-2472d380-9876-11eb-9539-b1e2cc4028e7.gif)

LAC5

Experiment file: experiments/gpl_2021/lac5_cart_pole_cost.yml.

Works as expected.

Open the report [lac5_cart_pole_cost_s1250.zip](https://github.com/rickstaa/bayesian-learning-control/files/6279545/lac5_cart_pole_cost_s1250.zip) ![image](https://user-images.githubusercontent.com/17570430/114048680-14b1ba00-988b-11eb-913d-04d1ddc47416.png) ![lac5_performance](https://user-images.githubusercontent.com/17570430/114048894-3dd24a80-988b-11eb-8fb8-fbc631282925.png) ![lac5_performance](https://user-images.githubusercontent.com/17570430/114055421-0e264100-9891-11eb-8541-cc9d2c5c665e.gif)

LAC6

Experiment file: experiments/gpl_2021/lac6_cart_pole_cost.yml.

Works as expected.

Open the report [lac6_cart_pole_cost_s1250.zip](https://github.com/rickstaa/bayesian-learning-control/files/6280253/lac6_cart_pole_cost_s1250.zip) ![image](https://user-images.githubusercontent.com/17570430/114066890-7dedf900-989c-11eb-91ab-78fb1f0af62e.png) ![lac6_performance](https://user-images.githubusercontent.com/17570430/114067138-c4435800-989c-11eb-881f-71b298f375a2.gif)

Conclusion

All algorithms are able to train. For simplicity, let's first work with LAC4; we can make the other changes later. For this algorithm, we should compare its robustness against disturbances with that of the original LAC algorithm.

rickstaa commented 3 years ago

Disturbance robustness evaluation (CartPoleCost)

LAC original results

Seems to work fine.

Open the report ## Seed 0 [lac_cart_pole_cost_s0.zip](https://github.com/rickstaa/bayesian-learning-control/files/6312399/lac_cart_pole_cost_s0.zip) ### Performance ![Lac_performance](https://user-images.githubusercontent.com/17570430/114747380-10851100-9d51-11eb-8948-a2b1b17b672b.png) ![lac_performance](https://user-images.githubusercontent.com/17570430/114748945-b38a5a80-9d52-11eb-9e3a-368fc7253ba2.gif) ### Robustness eval ![image](https://user-images.githubusercontent.com/17570430/114751036-f3524180-9d54-11eb-8417-a8a264a9be6f.png) ![image](https://user-images.githubusercontent.com/17570430/114751171-1846b480-9d55-11eb-8ef7-eb6ad2b4e6cd.png) ![image](https://user-images.githubusercontent.com/17570430/114751187-1e3c9580-9d55-11eb-8c45-76e6f22653a3.png) ![image](https://user-images.githubusercontent.com/17570430/114751204-23014980-9d55-11eb-9360-6d53eb21f6b0.png) ![image](https://user-images.githubusercontent.com/17570430/114751218-27c5fd80-9d55-11eb-8d2c-fade7591f9f4.png) ![image](https://user-images.githubusercontent.com/17570430/114751351-58a63280-9d55-11eb-9a11-075198183cf5.png) ![image](https://user-images.githubusercontent.com/17570430/114751247-33192900-9d55-11eb-84fa-10f2ee3aa60e.png) ![lac_soi](https://user-images.githubusercontent.com/17570430/114751874-fc8fde00-9d55-11eb-9c39-69b861d54b64.png) ## Seed 1250

LAC4 results

Seems to give the same results as the original LAC.

Open the report ## Seed 0 [lac4_cart_pole_cost_s0.zip](https://github.com/rickstaa/bayesian-learning-control/files/6312714/lac4_cart_pole_cost_s0.zip) ### Performance ![lac4_performance_0](https://user-images.githubusercontent.com/17570430/114755379-da985a80-9d59-11eb-979c-0e831d61ce32.png) ![lac4_performance_0](https://user-images.githubusercontent.com/17570430/114755372-d9672d80-9d59-11eb-9209-7ef87e6c19f3.gif) ### Robustness eval ![image](https://user-images.githubusercontent.com/17570430/114755716-437fd280-9d5a-11eb-90b1-3c3b5c8c9bc4.png) ![image](https://user-images.githubusercontent.com/17570430/114755742-4975b380-9d5a-11eb-83fb-d4be0d826f65.png) ![image](https://user-images.githubusercontent.com/17570430/114755758-4da1d100-9d5a-11eb-966c-dff33b9a1f37.png) ![image](https://user-images.githubusercontent.com/17570430/114755895-71fdad80-9d5a-11eb-9622-4c8f3c972b69.png) ![image](https://user-images.githubusercontent.com/17570430/114755923-77f38e80-9d5a-11eb-8ce1-46f68756bbc7.png) ![image](https://user-images.githubusercontent.com/17570430/114755942-7de96f80-9d5a-11eb-9e56-d255d1b5fdf6.png) ![image](https://user-images.githubusercontent.com/17570430/114755776-52ff1b80-9d5a-11eb-991c-baeecff21f0b.png) ![lac4_soi_0](https://user-images.githubusercontent.com/17570430/114756189-c739bf00-9d5a-11eb-89cb-96eaceba50fe.png) ## Seed 1250

SAC original results

As in Han et al. 2020, the robustness is lower than that of the LAC algorithm. Related to this, the algorithm also has a higher death rate.

Open the report ## Seed 0 ### Performance [sac_cart_pole_cost_s0.zip](https://github.com/rickstaa/bayesian-learning-control/files/6313040/sac_cart_pole_cost_s0.zip) ![sac_performance_0](https://user-images.githubusercontent.com/17570430/114762931-8ba2f300-9d62-11eb-871b-ff86310e03fb.png) ![sac_performance_0](https://user-images.githubusercontent.com/17570430/114762933-8cd42000-9d62-11eb-8b85-f91369e51a95.gif) ### Robustness eval ![image](https://user-images.githubusercontent.com/17570430/114763070-b3925680-9d62-11eb-8f20-d69d3cdc017e.png) ![image](https://user-images.githubusercontent.com/17570430/114763089-b8570a80-9d62-11eb-9611-33ac07beea94.png) ![image](https://user-images.githubusercontent.com/17570430/114763107-bd1bbe80-9d62-11eb-80b2-06d21b9ca2b8.png) ![image](https://user-images.githubusercontent.com/17570430/114763132-c1e07280-9d62-11eb-9c5f-751aa81e3a6c.png) ![image](https://user-images.githubusercontent.com/17570430/114763150-c60c9000-9d62-11eb-9c79-b2045d4f58fe.png) ![image](https://user-images.githubusercontent.com/17570430/114763165-cc027100-9d62-11eb-97ba-6cafcd5a59e9.png) ![soi_sac](https://user-images.githubusercontent.com/17570430/114763220-dae92380-9d62-11eb-925b-7da8f9a16d77.png)
rickstaa commented 3 years ago

Disturbance robustness evaluation (Oscillator)

LAC original results

Open the report ## Seed 0 ## Seed 1250 ### Performance ![lac_perofrmance](https://user-images.githubusercontent.com/17570430/115113734-0b5fd600-9f8c-11eb-91de-a58275471d4b.png) ![image](https://user-images.githubusercontent.com/17570430/115113764-2df1ef00-9f8c-11eb-97ee-7ae6dc4e9c97.png) ### Robustness eval #### Look at K ![image](https://user-images.githubusercontent.com/17570430/115113917-02bbcf80-9f8d-11eb-8494-ca432a032582.png) ![image](https://user-images.githubusercontent.com/17570430/115113922-06e7ed00-9f8d-11eb-8406-aa9394d959fb.png) ![image](https://user-images.githubusercontent.com/17570430/115113929-09e2dd80-9f8d-11eb-88af-f792b20751b0.png) ![lac_soi2](https://user-images.githubusercontent.com/17570430/115113914-f899d100-9f8c-11eb-8781-4d189c520326.png) #### Look at a1 (c1) ![image](https://user-images.githubusercontent.com/17570430/115113792-57127f80-9f8c-11eb-86d2-e7614f8b4056.png) ![image](https://user-images.githubusercontent.com/17570430/115113807-62fe4180-9f8c-11eb-96b3-4cef9688af20.png) ![image](https://user-images.githubusercontent.com/17570430/115113825-77423e80-9f8c-11eb-904d-d689b103b044.png) ![lac_soi](https://user-images.githubusercontent.com/17570430/115113852-95a83a00-9f8c-11eb-874f-c75fe8bff026.png)

LAC4 results

Seems to give the same results as the original LAC.

Open the report ## Seed 0 ## Seed 1250 ### Performance ![osc_lac4_performance](https://user-images.githubusercontent.com/17570430/115112964-29c3d280-9f88-11eb-8e42-2cb48b681dcb.png) ![image](https://user-images.githubusercontent.com/17570430/115112958-26304b80-9f88-11eb-910a-e6f40f435b18.png) ### Robustness eval #### Look at K ![image](https://user-images.githubusercontent.com/17570430/115113289-e0748280-9f89-11eb-9f39-c5b11cc1f411.png) ![image](https://user-images.githubusercontent.com/17570430/115113303-f124f880-9f89-11eb-82cc-3236167bb407.png) ![image](https://user-images.githubusercontent.com/17570430/115113311-03069b80-9f8a-11eb-829a-ce4d1709cbfd.png) ![lac4_soi](https://user-images.githubusercontent.com/17570430/115113476-e61e9800-9f8a-11eb-905c-38c2af80d3c0.png) #### Look at a1 (c1) ![image](https://user-images.githubusercontent.com/17570430/115113518-1fef9e80-9f8b-11eb-821d-766fe6c3dae3.png) ![image](https://user-images.githubusercontent.com/17570430/115113536-2a119d00-9f8b-11eb-8b5d-64a389c2ee87.png) ![image](https://user-images.githubusercontent.com/17570430/115113556-40b7f400-9f8b-11eb-8ce4-29e417611715.png) ![lac4_soi2](https://user-images.githubusercontent.com/17570430/115113571-50373d00-9f8b-11eb-9fcb-27b3b37b636a.png)

SAC original results

Open the report ## Seed 1250 ### Performance ![sac_performance](https://user-images.githubusercontent.com/17570430/115113889-d1db9a80-9f8c-11eb-861c-a8809b63c833.png) ![image](https://user-images.githubusercontent.com/17570430/115113895-ddc75c80-9f8c-11eb-94a3-f188d9e087e2.png) ### Robustness eval #### Look at K ![image](https://user-images.githubusercontent.com/17570430/115114013-60e8b280-9f8d-11eb-820f-1bb949162b13.png) ![image](https://user-images.githubusercontent.com/17570430/115114020-6940ed80-9f8d-11eb-8139-2d7342f9c90b.png) ![image](https://user-images.githubusercontent.com/17570430/115114025-6c3bde00-9f8d-11eb-8092-25305709df65.png) ![sac_soi](https://user-images.githubusercontent.com/17570430/115114045-82499e80-9f8d-11eb-8e72-51c35430f34f.png) #### Look at a1 (c1) ![image](https://user-images.githubusercontent.com/17570430/115114096-d05ea200-9f8d-11eb-95cf-a8f5f7bde415.png) ![image](https://user-images.githubusercontent.com/17570430/115114100-d5bbec80-9f8d-11eb-8953-09634c14a770.png) ![image](https://user-images.githubusercontent.com/17570430/115114105-d9e80a00-9f8d-11eb-9b34-2a52f8479e3a.png) ![sac_soi2](https://user-images.githubusercontent.com/17570430/115114120-ebc9ad00-9f8d-11eb-8dbb-3e05a1feb749.png)
rickstaa commented 3 years ago

Meeting notes 17-04-2021

rickstaa commented 3 years ago

Evaluate LAC robustness

@panweihit Let's evaluate the new LAC4 and compare it with SAC in multiple environments, but now train for 1e6 steps:

Oscillator-v1

LAC

Good performance; looks better than SAC but worse than LAC4.

Open Report #### Seed 1250 ##### Performance ![lac_oscillator_long_performance](https://user-images.githubusercontent.com/17570430/115148073-4599a800-a05e-11eb-955d-65d1cdb80d9f.png) ![image](https://user-images.githubusercontent.com/17570430/115148104-64983a00-a05e-11eb-9019-d0b6027f6806.png) ##### Robustness ###### Change K ![image](https://user-images.githubusercontent.com/17570430/115148272-1c2d4c00-a05f-11eb-8632-871b8cf62407.png) ![image](https://user-images.githubusercontent.com/17570430/115148275-20f20000-a05f-11eb-9286-d274048f7ea3.png) ![image](https://user-images.githubusercontent.com/17570430/115148283-2d765880-a05f-11eb-92d9-c5ccfbc1567f.png) ![lac_oscillator_long_soi_k](https://user-images.githubusercontent.com/17570430/115148301-3b2bde00-a05f-11eb-947f-3d691fb78b0e.png) ###### Change c1 ![image](https://user-images.githubusercontent.com/17570430/115148181-bd67d280-a05e-11eb-8f5b-29426e342098.png) ![image](https://user-images.githubusercontent.com/17570430/115148187-c193f000-a05e-11eb-8dc4-2a87cfbca17e.png) ![image](https://user-images.githubusercontent.com/17570430/115148194-c5277700-a05e-11eb-8ac3-2b069a1a2d2b.png) ![lac_oscillator_long_soi](https://user-images.githubusercontent.com/17570430/115148197-c8226780-a05e-11eb-8237-09e9699afee1.png)

LAC 4

Performance and robustness look better than LAC (though this could still be a seed effect). It also looks better than SAC.

Open Report #### Seed 0 ##### Performance ![lac4_long_performance](https://user-images.githubusercontent.com/17570430/115148415-bdb49d80-a05f-11eb-9342-42566050ea40.png) ![image](https://user-images.githubusercontent.com/17570430/115148448-e89ef180-a05f-11eb-95a8-254f777c93d2.png) ##### Robustness ###### Change K ![image](https://user-images.githubusercontent.com/17570430/115148467-08361a00-a060-11eb-912d-63645d6f5655.png) ![image](https://user-images.githubusercontent.com/17570430/115148479-1421dc00-a060-11eb-95db-6b66c0cac36a.png) ![image](https://user-images.githubusercontent.com/17570430/115148491-23088e80-a060-11eb-83fb-c93c2fb374be.png) ![lac4_long_performance_soi_k](https://user-images.githubusercontent.com/17570430/115148577-80044480-a060-11eb-8be3-4727cd3ec83b.png) ###### Change c1 ## Conclusion ![image](https://user-images.githubusercontent.com/17570430/115148749-54ce2500-a061-11eb-9d74-c26ef7d64ed7.png) ![image](https://user-images.githubusercontent.com/17570430/115148753-5861ac00-a061-11eb-8a77-cfc8431c14cb.png) ![image](https://user-images.githubusercontent.com/17570430/115148759-5bf53300-a061-11eb-9b34-0688121d28fc.png) ![lac4_long_soi_c](https://user-images.githubusercontent.com/17570430/115148745-50a20780-a061-11eb-8462-00d4f054cca2.png)

SAC

Performance and robustness look worse than both LAC versions.

Open Report #### Seed 1250 ##### Performance ![sac_performance](https://user-images.githubusercontent.com/17570430/115148919-e9d11e00-a061-11eb-9a8e-04981c1c0078.png) ![image](https://user-images.githubusercontent.com/17570430/115148951-ff464800-a061-11eb-8c33-af153b432125.png) ##### Robustness ###### Change K ![image](https://user-images.githubusercontent.com/17570430/115149192-47199f00-a063-11eb-9947-65f319c2e36b.png) ![image](https://user-images.githubusercontent.com/17570430/115149194-4b45bc80-a063-11eb-9273-8f5fa59c037b.png) ![image](https://user-images.githubusercontent.com/17570430/115149204-5567bb00-a063-11eb-9ad8-9ebe432c3bee.png) ![sac_sio_K](https://user-images.githubusercontent.com/17570430/115149187-4123be00-a063-11eb-9c92-aa1a0b76b902.png) ###### Change c1 ![image](https://user-images.githubusercontent.com/17570430/115148969-21d86100-a062-11eb-9283-8016951543fc.png) ![image](https://user-images.githubusercontent.com/17570430/115148972-29980580-a062-11eb-9166-e70135383b2e.png) ![image](https://user-images.githubusercontent.com/17570430/115148978-36b4f480-a062-11eb-9d93-6df188439130.png) ![sac_soi_c1](https://user-images.githubusercontent.com/17570430/115148995-47fe0100-a062-11eb-9108-d9e5c7c488fd.png)

Conclusion

rickstaa commented 3 years ago

Meeting notes (18-04-2021)

@dds0117 I had a meeting with @panweihit yesterday to discuss the results of the tests above and the continuation of our research. Below you will find the notes from the meeting.

Results discussion

Other discussion points

@panweihit pointed me to a very insightful MIT course given by Dr. Russ Tedrake. This course explains that as long as your reward (cost) behaves like a Lyapunov function (its value keeps decreasing along trajectories), the system also learns stable and robust behaviour. I haven't watched the full lecture yet, so I will update the explanation below later. But here is my current understanding:

This conclusion implies that we don't need to design very complicated stability measures for our robot tasks. A reward that makes sure the robot doesn't fall is good enough to ensure stability and robustness. Take Boston Dynamics' Spot robot as an example. In this case, we don't need a cost function that exploits complicated theoretical stability measures, like the zero-moment point or the COM staying vertically inside the convex hull of its contact points, to achieve stable behaviour. According to Dr. Russ Tedrake, using such knowledge is merely a bonus. A simpler cost function, like the perpendicular distance of the robot's COM from the reference path, already implicitly encodes stability: if the robot cannot track this path, it has fallen, so as long as our Lyapunov values keep decreasing the agent is learning stable behaviour. This greatly increases how practical our algorithm is, since we can now learn stable/robust behaviour even when theoretical knowledge about the system's stability is not available. For systems where we do have such knowledge, we can use it to get an additional bonus.
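As a compact reference, this is how I would write down the condition referred to above; it is my own paraphrase of the constraint from Han et al. 2020 that our code implements, not a formula taken from the lecture:

```latex
% Lyapunov decrease condition on the learned candidate L (c is the cost, c >= 0):
\mathbb{E}\big[\, L(s_{t+1}, a_{t+1}) - L(s_t, a_t) + \alpha_3\, c(s_t, a_t) \,\big] \le 0,
\qquad \alpha_3 > 0 .
```

As long as the cost encodes "the robot has not fallen", enforcing this decrease along trajectories is what yields the stable/robust behaviour discussed above.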

What do we need to do now

Currently, I'm finishing several experiments to:

I am further adding a value network to the LAC algorithm so that we can later replace it with a Gaussian process. Replacing it with a Gaussian process makes sense since it allows some stochasticity in the value function, making it easier for the agent to learn stable behaviour. The reasoning is similar to why SAC uses a Gaussian actor instead of a deterministic one; here we use a stochastic value function instead of a deterministic one. We use a Gaussian process rather than a Gaussian network since the value function is convex in nature. @panweihit and I agreed that, because of this, a Gaussian process should be well able to capture the behaviour while keeping the algorithm simple. Your Gaussian process will replace the value network of the new LAC algorithm (I will create this algorithm based on the second version of SAC).

*(new_structure: diagram of the proposed new algorithm structure)*
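To make the idea concrete, here is a minimal sketch of a GP value function; using GPyTorch is my assumption, and all names and shapes are illustrative, so the actual GPL implementation may look different. It fits V(s) on (state, return) pairs instead of using a value network:

```python
import torch
import gpytorch


class GPValueFunction(gpytorch.models.ExactGP):
    """Sketch of a Gaussian-process value function V(s)."""

    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        # Return a distribution over values -> stochastic value function.
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


# Fit on (state, return) pairs; placeholder data with a 4-dim observation
# (e.g. CartPoleCost-like) is used here purely for illustration.
states = torch.randn(64, 4)
returns = torch.randn(64)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = GPValueFunction(states, returns, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

model.train()
likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
for _ in range(50):
    optimizer.zero_grad()
    loss = -mll(model(states), returns)  # maximise the marginal log likelihood
    loss.backward()
    optimizer.step()
```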

The next steps for creating the GPL algorithm, therefore, are as follows:

rickstaa commented 3 years ago

LAC4 Improvements

Take the minimum Lyapunov target value

@panweihit slightly modified the Lyapunov constraint such that the minimum Lyapunov value is now used in the constraint:

l_delta = torch.mean(lya_l_.min() - l1.detach() + self._alpha3 * r)  # See Han eq. 11

Remove stricter Lyapunov stability

We removed the `alpha_3 * r` term from the Lyapunov constraint.

self._alpha3 = 0.000001 # Small quadratic regulator term to ensure negative definiteness. Without it the derivative can be negative semi-definite.
l_delta = torch.mean(lya_l_.min() - l1.detach() + self._alpha3)  # See Han eq. 11

The LAC algorithm trains fine without this.
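For clarity, here is a small self-contained sketch of the three constraint variants discussed above (tensor names follow the snippets above; the rest of the LAC update is omitted):

```python
import torch


def lyapunov_constraint(lya_l_, l1, r, alpha3=0.2, variant="original"):
    """Lyapunov decrease term used in the actor loss (see Han et al. eq. 11).

    lya_l_ : Lyapunov critic values at the next state (target network).
    l1     : Lyapunov critic values at the current state.
    r      : batch of costs.
    """
    if variant == "original":
        # Han et al. eq. 11: mean(L' - L + alpha3 * c).
        return torch.mean(lya_l_ - l1.detach() + alpha3 * r)
    if variant == "min_target":
        # LAC4 change 1: use the minimum Lyapunov target value in the constraint.
        return torch.mean(lya_l_.min() - l1.detach() + alpha3 * r)
    if variant == "no_alpha3_r":
        # LAC4 change 2: drop the alpha3 * r term; keep only a tiny constant so
        # the decrease stays negative definite instead of semi-definite.
        return torch.mean(lya_l_.min() - l1.detach() + 1e-6)
    raise ValueError(f"Unknown variant: {variant}")
```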

dds0117 commented 3 years ago

Yes, I agree with you. The Gaussian process value function is finished, but I ran into a problem with using the GP value function directly in place of the regular value function. Because a Gaussian process depends on the temporal sequence seen during training, it has to be trained with Monte-Carlo updates instead of temporal-difference (TD) updates. I am stuck on this; we can talk about it tomorrow.
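To illustrate the difference described here (a sketch only, not our implementation): a TD update bootstraps from the value at the next state, whereas the GP would have to be fitted on full Monte-Carlo returns:

```python
import torch


def monte_carlo_returns(costs, gamma=0.99):
    """Discounted returns G_t for one finished episode (what the GP is fitted on)."""
    returns, g = [], 0.0
    for c in reversed(costs):
        g = c + gamma * g
        returns.append(g)
    return torch.tensor(list(reversed(returns)))


def td_target(cost, v_next, gamma=0.99):
    """One-step temporal-difference target: c_t + gamma * V(s_{t+1})."""
    return cost + gamma * v_next
```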

rickstaa commented 3 years ago

@dds0117 Good point. I wasn't aware that it was a Monte-Carlo method.

rickstaa commented 3 years ago

@panweihit, @dds0117 Here is the new model that was trained for the robustness eval of the cart_pole.

lac4_cart_pole_cost_s1250.zip

Robustness eval Instructions

See also https://rickstaa.github.io/bayesian-learning-control/control/eval_robustness.html.

  1. Create the conda environment.
  2. Activate the conda environment.
  3. Install the packages: `pip install -e .`
  4. Put the model inside the `data` folder.
  5. Run the following command:
     `python -m bayesian_learning_control.run eval_robustness ~/Development/work/bayesian-learning-control/data/lac4_cart_pole_cost/lac4_cart_pole_cost_s1250 --disturbance_type=input`
  6. See the results.

Change the disturbance

To change the disturbance, change the magnitude inside the `DISTURBER_CFG` variable in the https://github.com/rickstaa/simzoo/blob/c0f32230f68b7f0353412a848d8b8598cd82d21c/simzoo/common/disturber.py#L61 file.
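As an illustration only (the real keys in `DISTURBER_CFG` may differ; check the linked file for the actual structure), adjusting the magnitude could look something like this:

```python
# Hypothetical excerpt of simzoo/common/disturber.py; the actual DISTURBER_CFG
# layout may differ. Only the magnitude-like entries need to be edited.
DISTURBER_CFG = {
    "input_disturbance": {
        "impulse": {
            "magnitude": 150,  # increase/decrease to change the disturbance strength
        },
    },
}
```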

rickstaa commented 3 years ago

Discussion 11-06-2021

@panweihit, @dds0117 For future reference, here is a small summary of what we found in our experiments yesterday:

As we discussed, I think the main takeaway is that when we implement the Gaussian version of the LAC algorithm, it should work as long as the function approximator (a (deep) Gaussian process) is expressive enough to capture the complexity of the system.

rickstaa commented 2 years ago

Closed since there are more important things to do first.