google / deluca

Performant, differentiable reinforcement learning
https://deluca.fyi
Apache License 2.0

Implementation of drc #47

Open FarnazAdib opened 3 years ago

FarnazAdib commented 3 years ago

Hi

Thanks for providing this interesting package.

I am trying to test drc on a simple setup, and I noticed that the current implementation of drc does not work. When I try it on a simple partially observable linear system with A = np.array([[1.0, 0.95], [0.0, -0.9]]), B = np.array([[0.0], [1.0]]), C = np.array([[1.0, 0.0]]), Q = R = I, Gaussian process noise, and zero observation noise (A is upper triangular with eigenvalues 1.0 and -0.9, so the system is at most marginally stable in open loop), the controller acts like a zero controller. I tried to get a different response by changing the hyperparameters, but the results are mostly the same.

I then looked at the implementation on the deluca GitHub and noticed that the counterfactual cost is not defined correctly (if I am not wrong). According to Algorithm 1 in [1], we need to use M_t to compute y_t, which depends on the previous controls u recomputed with the same M_t; in the implementation, however, the previous controls based on M_{t-i} are used. In any case, I implemented the algorithm using M_t, but what I get after the simulation is either a control close to zero or an unstable one.
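For reference, here is roughly the setup I am testing. This is a minimal sketch with a zero controller as the baseline; the horizon, noise scale, and seed are arbitrary choices on my part:

```python
import numpy as np

rng = np.random.default_rng(0)

# Partially observable LTI system described above.
# A is upper triangular, so its eigenvalues are 1.0 and -0.9.
A = np.array([[1.0, 0.95],
              [0.0, -0.9]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
Q = np.eye(1)  # cost weight on the observation y
R = np.eye(1)  # cost weight on the control u

T = 500
x = np.zeros((2, 1))
total_cost = 0.0
for t in range(T):
    y = C @ x                    # zero observation noise
    u = np.zeros((1, 1))         # replace with the controller under test
    total_cost += float(y.T @ Q @ y + u.T @ R @ u)
    w = rng.normal(size=(2, 1))  # Gaussian process noise
    x = A @ x + B @ u + w

print("average cost:", total_cost / T)
```

With u = 0 this is exactly the zero-controller baseline that the drc controller ends up reproducing.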
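And this is how I read the counterfactual cost in Algorithm 1 of [1]: all past controls are recomputed with the current iterate M_t, not with the iterates M_{t-i} that were actually played. Below is a sketch under that reading, reusing A, B, C, Q, R from the snippet above; the horizon lengths m and h are illustrative choices of mine:

```python
import numpy as np

def markov_operator(A, B, C, h):
    # G[i] = C @ A^(i-1) @ B for i >= 1; G[0] = 0 since there is no feedthrough.
    G = [np.zeros((C.shape[0], B.shape[1]))]
    G += [C @ np.linalg.matrix_power(A, i - 1) @ B for i in range(1, h)]
    return G

def counterfactual_cost(M, y_nat, t, G, Q, R):
    """Counterfactual cost c(y_t(M), u_t(M)) as I read Algorithm 1 in [1].

    M:     list of m matrices, the *current* iterate M_t
    y_nat: natural outputs y^nat_s (outputs of the system under zero control)
    Assumes t >= len(M) + len(G) so all indices below are valid.
    """
    m, h = len(M), len(G)

    def u(s):  # control at time s, recomputed with the current M
        return sum(M[j] @ y_nat[s - j] for j in range(m))

    # Counterfactual observation: y_t(M) = y^nat_t + sum_{i>=1} G[i] u_{t-i}(M)
    y = y_nat[t] + sum(G[i] @ u(t - i) for i in range(1, h))
    u_t = u(t)
    return float(y.T @ Q @ y + u_t.T @ R @ u_t)
```

My point is that the deluca implementation plugs the stored controls u_{t-i} (computed with M_{t-i}) into this sum, whereas the algorithm differentiates through controls recomputed with M_t.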

I was wondering if you have any working code example for the DRC algorithm?

[1] Simchowitz, Max, Singh, Karan, and Hazan, Elad. "Improper Learning for Non-Stochastic Control." COLT 2020.

Thanks a lot,
Sincerely,
Farnaz

danielsuo commented 3 years ago

Farnaz:

Hello! Thanks for checking in. The implementation in deluca is a sample implementation and is not in line with the paper you mentioned. I asked one of the authors, and there is no publicly available implementation.

Apologies!

Daniel

FarnazAdib commented 3 years ago

Hi Daniel,

Thank you very much for your response.

In your paper "Deluca -- A Differentiable Control Library: Environments, Methods, and Benchmarking", it is mentioned that DRC is an implementation of [1]. Still, since you call it a "sample implementation" of DRC, you have presumably tried it on something. I was wondering if I could have that test case?

Thank you very much for your time and help!
Best regards,
Farnaz

danielsuo commented 3 years ago

Hi:

As I mentioned, I reached out to the authors and they told me they don’t have the implementation. Perhaps you can try contacting them?

FarnazAdib commented 3 years ago

OK. I won't follow up on this point any further.