thu-rllab / CFCQL

Code for the NeurIPS 2023 paper: Counterfactual Conservative Q Learning for Offline Multi-Agent Reinforcement Learning.

OMAR loss in SMAC #1

Closed zzq-bot closed 1 year ago

zzq-bot commented 1 year ago

Hello, I recently came across your implementation of OMAR in SMAC, and I noticed that you've commented out the "zeroth-order optimization" part in the omar_learner.py file. I was wondering if you could provide some insight into the rationale behind this decision?

Around seven months ago, I attempted to replicate the results, but unfortunately I wasn't able to achieve satisfactory outcomes. During my investigation, I observed that the variable "omar_sigma" tends to converge to zero over the course of the optimization (see the sketch below for the kind of update loop I mean). Could you please shed some light on why this might be happening?
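[Editor's note] For readers unfamiliar with the part of omar_learner.py being discussed: OMAR's zeroth-order optimizer is a CEM-style search that repeatedly samples candidate actions around a Gaussian mean, keeps the highest-Q candidates, and refits the Gaussian to them. The sketch below is not from the repository; it is a minimal illustration under assumed shapes, and names such as `q_func`, `num_samples`, and `num_elites` are hypothetical.

```python
import torch

def zeroth_order_argmax_q(q_func, mu, sigma, num_iters=3, num_samples=20, num_elites=5):
    """Hypothetical CEM-style zeroth-order search for argmax_a Q(s, a).

    q_func: callable mapping a batch of candidate actions -> Q-values.
    mu, sigma: mean / std of the Gaussian proposal, shape (batch, action_dim).
    """
    for _ in range(num_iters):
        # Sample candidate actions around the current mean.
        # candidates: (num_samples, batch, action_dim)
        candidates = mu.unsqueeze(0) + sigma.unsqueeze(0) * torch.randn(
            num_samples, *mu.shape, device=mu.device
        )
        q_vals = q_func(candidates)                    # (num_samples, batch)

        # Keep the top-k candidates under Q and refit the Gaussian to them.
        _, elite_idx = q_vals.topk(num_elites, dim=0)  # (num_elites, batch)
        elites = torch.gather(
            candidates, 0,
            elite_idx.unsqueeze(-1).expand(-1, -1, mu.shape[-1])
        )
        mu = elites.mean(dim=0)
        # As the elites concentrate near the maximizer, the refit std shrinks
        # toward zero -- the behavior of "omar_sigma" described in this issue.
        sigma = elites.std(dim=0)
    return mu, sigma
```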

cloud-qu commented 1 year ago

We think the zeroth-order optimizer is essentially an inefficient way to approximate argmax Q. In a discrete-action environment such as SMAC, however, we can obtain argmax Q directly. So the zero omar_sigma phenomenon is reasonable: the zeroth-order optimization tends to converge to the argmax of Q.
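[Editor's note] The point about discrete actions can be made concrete: with a finite action set, the maximizing action can be read off the Q-values in one call, so no sampling loop (and no sigma) is needed. A minimal sketch, with assumed tensor shapes and an availability mask in the style of SMAC:

```python
import torch

# Assumed shapes: per-agent Q-values over the discrete action set,
# and an availability mask (1 = available, 0 = unavailable) as in SMAC.
agent_qs = torch.randn(32, 5, 11)            # (batch, n_agents, n_actions)
avail_actions = torch.ones_like(agent_qs)

# Exact argmax_a Q in a discrete action space: mask out unavailable
# actions, then take the argmax directly -- no zeroth-order search needed.
masked_qs = agent_qs.masked_fill(avail_actions == 0, float("-inf"))
greedy_actions = masked_qs.argmax(dim=-1)    # (batch, n_agents)
```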

zzq-bot commented 1 year ago

Thank you for your reply!