jviquerat / pbo

Policy-based optimization : single-step policy gradient seen as an evolution strategy
MIT License

Some questions about 'observation' #7

Closed: jiatingdaole1109 closed this issue 2 years ago

jiatingdaole1109 commented 2 years ago

Hello, this is really nice work!

Currently, I am trying to use this improved method for some fluid optimization work. While reading the code, I found that for every example in the /envs folder, the returned observation self.obs seems to be all zeros. I also checked this in the previous repository drl_shape_optimization, but found no clear definition in either the papers or the code.

I know the observation put into the network should be the same for every generation. But is it ok to set this variable simply to zeros? Or, in other words, is it possible to feed the network some other values related to the environment so that it can optimize under other conditions (e.g. the observation could be Re or the inflow direction in drl_shape_optimization)?

Looking forward to your reply!

jviquerat commented 2 years ago

Hi @jiatingdaole1109,

Thanks for showing interest in this work. Out of curiosity, what kind of problem exactly are you considering?

As of today, the reason the method works is understood as follows: you have an optimization problem with n parameters, and instead of solving it directly, you "immerse" it in a larger problem with m parameters, m being the number of d.o.f. of the network, with most probably m > n or m >> n. The goal of the network is to embody the mapping from an initial constant state to the optimal parameters of your problem (in practice, to the optimal parameters of the distribution output by the network). My guess is that the possible aliasing in the network mapping makes it easier to find the optimal parameters in the "network problem" than in the "real problem", but I insist on the fact that this is just a guess.

Just to be clear, I also insist on the fact that this is an optimization method, and that it is not built to perform active control. The similarities with DRL mostly come from the fact that we follow a vanilla policy-gradient approach in the use and optimization of the networks.
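
To illustrate the idea, here is a minimal sketch of what such a single-step policy-gradient loop can look like (this is not the actual pbo code; the toy cost function, network sizes and hyperparameters are placeholders):

```python
# Minimal sketch: a small network maps a constant input to the mean and
# log-std of a Gaussian over the n optimization parameters; candidates are
# sampled, scored by a black-box cost, and the network is updated with a
# vanilla policy-gradient step on the normalized rewards.

import torch

n_params  = 2                   # n: parameters of the actual problem
n_samples = 8                   # candidates drawn per generation
obs       = torch.zeros(1, 1)   # constant input state (here: zeros)

# the "larger problem": the m network weights embody the mapping
# from the constant state to the distribution parameters
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 2 * n_params),   # outputs mean and log-std
)
opt = torch.optim.Adam(net.parameters(), lr=5e-3)

def cost(x):
    # hypothetical black-box objective (e.g. a drag value from a CFD run)
    return ((x - 1.5) ** 2).sum(dim=-1)

for generation in range(200):
    out = net(obs)
    mean, log_std = out[:, :n_params], out[:, n_params:]
    dist = torch.distributions.Normal(mean, log_std.exp())

    x = dist.sample((n_samples,))            # candidate parameter sets
    r = -cost(x)                             # reward = negative cost
    adv = (r - r.mean()) / (r.std() + 1e-8)  # normalize rewards

    logp = dist.log_prob(x).sum(dim=-1)      # log-likelihood of each candidate
    loss = -(adv * logp).mean()              # vanilla policy-gradient loss

    opt.zero_grad()
    loss.backward()
    opt.step()
```

The constant zero input is just an anchor: all the information the update uses comes from the sampled rewards, not from the observation.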

So in short, if you regularly modify the input state, I'm not certain that things would still work. Of course, you're welcome to try and report on your results; I would be happy to be contradicted and to learn more about the method, given that I mostly built it out of intuition.

jiatingdaole1109 commented 2 years ago

Thank you for your prompt reply!

Actually, my problem mainly focuses on drag reduction through passive control, so there is no need for active control; I just need to choose some optimal parameters at the beginning.

It seems that I misunderstood the idea of single-step RL: I thought it would work more like a black-box model that builds a relationship between the environment and the desired optimal variables. Maybe that could be achieved with some model-based RL method, but I'm not sure of the feasibility either.

Thanks for your explanation once again!

jviquerat commented 2 years ago

If you perform passive control, then yes, you just need a black-box optimizer to tune your control parameters, and you're good. PBO can do that (we have run several passive flow control cases with it; you can dig through my Scholar page to find them), but in substance so can any other optimization method, provided it is suited to your problem.
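
For instance, with a generic black-box optimizer the whole thing boils down to something like the snippet below (the drag function and bounds are placeholders for your actual case, not code from this repository):

```python
# Illustration only: any black-box optimizer can tune passive-control
# parameters; here, scipy's differential evolution on a hypothetical
# drag cost with two design parameters.

from scipy.optimize import differential_evolution

def drag(x):
    # placeholder for an expensive CFD evaluation returning a drag value
    return (x[0] - 0.3) ** 2 + (x[1] + 0.1) ** 2

bounds = [(-1.0, 1.0), (-1.0, 1.0)]   # admissible range of each parameter
result = differential_evolution(drag, bounds, maxiter=50, seed=0)
print(result.x, result.fun)
```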

jiatingdaole1109 commented 2 years ago

Thanks for your suggestions! I will give it a try.