Closed by zzq-bot 1 year ago
Essentially, we consider the zeroth-order optimizer an inefficient way to approximate argmax Q. In a discrete-action environment such as SMAC, we can obtain argmax Q directly instead. The zero omar_sigma phenomenon is therefore expected, since the zeroth-order optimization tends to converge to argmax Q.
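For illustration only, here is a minimal CEM-style sketch of zeroth-order Q maximization (the function names and hyperparameters are my own, not taken from the OMAR codebase): once the sampling distribution concentrates on the maximizing action, the fitted standard deviation, which plays the role of omar_sigma, collapses toward zero.

```python
import numpy as np

def zeroth_order_argmax_q(q_fn, mu0, sigma0, n_iters=10, n_samples=64, n_elites=6):
    """CEM-style zeroth-order search for argmax_a Q(a).

    q_fn       : callable mapping actions of shape (n_samples, act_dim) to Q values (n_samples,)
    mu0, sigma0: initial Gaussian mean / std over the action space
    Returns the final mean and std; the std corresponds to omar_sigma.
    """
    mu = np.asarray(mu0, dtype=np.float64)
    sigma = np.asarray(sigma0, dtype=np.float64)
    for _ in range(n_iters):
        # Sample candidate actions around the current mean.
        actions = mu + sigma * np.random.randn(n_samples, mu.shape[0])
        q_vals = q_fn(actions)
        # Keep the top-k actions by Q value and refit the Gaussian to them.
        elites = actions[np.argsort(q_vals)[-n_elites:]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-8
    return mu, sigma

# Toy Q-function with a single maximum at a = 0.5: as the sampling
# distribution concentrates on that maximizer, sigma shrinks toward zero.
q_fn = lambda a: -((a - 0.5) ** 2).sum(axis=-1)
mu, sigma = zeroth_order_argmax_q(q_fn, mu0=[0.0], sigma0=[1.0])
print(mu, sigma)  # mu close to 0.5, sigma close to 0
```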
Thank you for your reply!
Hello, I recently came across your implementation of OMAR in SMAC, and I noticed that you've commented out the "zeroth-order optimization" part in the omar_learner.py file. I was wondering if you could provide some insight into the rationale behind this decision?
Around seven months ago, I attempted to replicate the results, but unfortunately, I wasn't able to achieve satisfactory outcomes. During my investigation, I observed that the variable "omar_sigma" tends to converge to zero after the optimization process. Could you please shed some light on why this might be happening?