thu-rllab / CFCQL

Code for the NeurIPS 2023 paper: Counterfactual Conservative Q Learning for Offline Multi-Agent Reinforcement Learning.

OMAR loss in SMAC #1

Closed zzq-bot closed 1 year ago

zzq-bot commented 1 year ago

Hello, I recently came across your implementation of OMAR in SMAC, and I noticed that you've commented out the "zeroth-order optimization" part in the omar_learner.py file. I was wondering if you could provide some insight into the rationale behind this decision?

Around seven months ago, I attempted to replicate the results, but unfortunately I wasn't able to achieve satisfactory outcomes. During my investigation, I observed that the variable "omar_sigma" tends to converge to zero over the course of the optimization (see the sketch below for the kind of update loop I mean). Could you please shed some light on why this might be happening?
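[Editor's note] For readers unfamiliar with the part of omar_learner.py being discussed: OMAR's zeroth-order optimizer is a CEM-style search that repeatedly samples candidate actions around a Gaussian mean, keeps the highest-Q candidates, and refits the Gaussian to them. The sketch below is not from the repository; it is a minimal illustration under assumed shapes, and names such as `q_func`, `num_samples`, and `num_elites` are hypothetical.

```python
import torch

def zeroth_order_argmax_q(q_func, mu, sigma, num_iters=3, num_samples=20, num_elites=5):
    """Hypothetical CEM-style zeroth-order search for argmax_a Q(s, a).

    q_func: callable mapping a batch of candidate actions -> Q-values.
    mu, sigma: mean / std of the Gaussian proposal, shape (batch, action_dim).
    """
    for _ in range(num_iters):
        # Sample candidate actions around the current mean.
        # candidates: (num_samples, batch, action_dim)
        candidates = mu.unsqueeze(0) + sigma.unsqueeze(0) * torch.randn(
            num_samples, *mu.shape, device=mu.device
        )
        q_vals = q_func(candidates)                    # (num_samples, batch)

        # Keep the top-k candidates under Q and refit the Gaussian to them.
        _, elite_idx = q_vals.topk(num_elites, dim=0)  # (num_elites, batch)
        elites = torch.gather(
            candidates, 0,
            elite_idx.unsqueeze(-1).expand(-1, -1, mu.shape[-1])
        )
        mu = elites.mean(dim=0)
        # As the elites concentrate near the maximizer, the refit std shrinks
        # toward zero -- the behavior of "omar_sigma" described in this issue.
        sigma = elites.std(dim=0)
    return mu, sigma
```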

cloud-qu commented 1 year ago

We think the zeroth-order optimizer is essentially an inefficient way to approximate argmax Q. In a discrete-action environment such as SMAC, however, we can obtain argmax Q directly. So the zero omar_sigma phenomenon is reasonable: the zeroth-order optimization tends to converge to the argmax of Q.
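[Editor's note] The point about discrete actions can be made concrete: with a finite action set, the maximizing action can be read off the Q-values in one call, so no sampling loop (and no sigma) is needed. A minimal sketch, with assumed tensor shapes and an availability mask in the style of SMAC:

```python
import torch

# Assumed shapes: per-agent Q-values over the discrete action set,
# and an availability mask (1 = available, 0 = unavailable) as in SMAC.
agent_qs = torch.randn(32, 5, 11)            # (batch, n_agents, n_actions)
avail_actions = torch.ones_like(agent_qs)

# Exact argmax_a Q in a discrete action space: mask out unavailable
# actions, then take the argmax directly -- no zeroth-order search needed.
masked_qs = agent_qs.masked_fill(avail_actions == 0, float("-inf"))
greedy_actions = masked_qs.argmax(dim=-1)    # (batch, n_agents)
```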

zzq-bot commented 1 year ago

Thank you for your reply!