Open baimengwei opened 4 years ago
The epsilon used is 1 - 1 / log(n + 2), and the Q update uses 1 / sqrt(n + 2). These seem like good choices. Is there any basis for them?
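To make the question concrete, here is a minimal sketch that tabulates both schedules. It assumes n is a non-negative visit or step count, that log is the natural logarithm, and that 1 / sqrt(n + 2) acts as the step size in the Q update. The function names `epsilon` and `step_size` are mine, not taken from the repository.

```python
import math


def epsilon(n: int) -> float:
    """Schedule 1 - 1 / log(n + 2); natural log is an assumption."""
    return 1.0 - 1.0 / math.log(n + 2)


def step_size(n: int) -> float:
    """Schedule 1 / sqrt(n + 2); assumed to be the Q-update step size."""
    return 1.0 / math.sqrt(n + 2)


if __name__ == "__main__":
    # Print both schedules for a few counts to compare how fast they change.
    for n in (0, 1, 10, 100, 1000, 10000):
        print(f"n={n:>5}  epsilon={epsilon(n):+.4f}  step_size={step_size(n):.4f}")
```

Under these assumptions, epsilon is negative at n = 0 (because ln 2 < 1) and then rises slowly toward 1, while the step size decays toward 0. If epsilon is used as a probability, the n = 0 value would need clamping, or the code may use a different log base.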