polixir / OfflineRL

A collection of offline reinforcement learning algorithms.
Apache License 2.0

Question about cql_loss calculation in COMBO #11

Open return-sleep opened 11 months ago

return-sleep commented 11 months ago

Since COMBO is derived from CQL, why is cql_loss calculated differently in the two algorithms?
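For reference, here is a minimal sketch of the two conservative penalties as I understand them from the papers, not from this repository's code. The function names and argument layouts are hypothetical. It assumes the CQL(H) variant, which pushes Q down through a log-sum-exp over actions sampled at dataset states, and the COMBO form, which pushes Q down on state-action pairs drawn from model rollouts. Both push Q up on dataset pairs. Implementations often approximate either term differently, which may be the source of the discrepancy.

```python
import math


def logsumexp(values):
    # Numerically stable log(sum(exp(v))) over a list of floats.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mean(values):
    return sum(values) / len(values)


def cql_penalty(q_sampled_actions, q_dataset):
    # q_sampled_actions: one list per dataset state, holding Q(s, a_i)
    #   for several sampled actions a_i (hypothetical layout).
    # q_dataset: Q(s, a) for the (s, a) pairs stored in the dataset.
    push_down = mean([logsumexp(row) for row in q_sampled_actions])
    push_up = mean(q_dataset)
    return push_down - push_up


def combo_penalty(q_model_rollout, q_dataset):
    # q_model_rollout: Q(s, a) for (s, a) pairs sampled from rollouts
    #   in the learned dynamics model (assumed paper-style formulation).
    # q_dataset: Q(s, a) for the (s, a) pairs stored in the dataset.
    return mean(q_model_rollout) - mean(q_dataset)


if __name__ == "__main__":
    print(cql_penalty([[0.0, 0.0], [1.0, 1.0]], [0.0, 1.0]))
    print(combo_penalty([1.0, 3.0], [1.0]))
```

In both cases the penalty is scaled by a coefficient and added to the Bellman error. The sketch covers only the regularizer term.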