Restore overall base training performance of April 7 repo state (before refactoring)

andreaskoepf commented 2 years ago

Working commit: https://github.com/world-modelz/dreamax/commit/3e0ac35f7e44946a26430ca489e53ed415c84aa3

Pendulum: after ~30 min of training (100k env steps) the average return is well above 500.

andreaskoepf commented 2 years ago

Baseline run of the old version: TensorBoard events file, charts (screenshot).

andreaskoepf commented 2 years ago

I reset the main branch (--hard & force push) back to the Apr 7th working state. The refactoring commits have been moved to the xmaster_refactor branch. I suggest treating xmaster_refactor as a temporary, sealed branch and adding its changes back to the main branch cleanly, in multiple steps.

General notes:

We should always make sure that (refactoring) changes are incremental improvements: refactoring work must not reduce the overall training functionality or performance.
I strongly believe that removing most of the console output and progress reports during the refactoring was not a good idea. Maybe the tracing and progress reporting could be moved into a logger class (e.g. to make it configurable).
The original implementation did not have the color channel problem.
The original implementation did not have the NaN/Inf-in-input warning problem.
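
To make the logger-class idea above concrete, here is a minimal sketch, assuming Python's standard logging module. The class name ProgressReporter, the logger name "dreamax.train", and the log_every parameter are hypothetical and not taken from the repo; it only illustrates how progress output could be kept but made configurable.

```python
import logging


class ProgressReporter:
    """Hypothetical helper: routes training progress through a configurable logger."""

    def __init__(self, name="dreamax.train", level=logging.INFO, log_every=1000):
        # Verbosity is controlled via the standard logging level.
        self.logger = logging.getLogger(name)
        self.logger.setLevel(level)
        # Report only every `log_every` environment steps.
        self.log_every = log_every

    def should_report(self, env_step):
        return env_step % self.log_every == 0

    def format_line(self, env_step, metrics):
        # Sort keys so the output is stable across runs.
        parts = [f"{key}={value:.3f}" for key, value in sorted(metrics.items())]
        return f"step={env_step} " + " ".join(parts)

    def report(self, env_step, metrics):
        """Log one progress line; returns the line, or None if this step is skipped."""
        if not self.should_report(env_step):
            return None
        line = self.format_line(env_step, metrics)
        self.logger.info(line)
        return line


# Usage: console output stays available but can be silenced or redirected
# through the normal logging configuration.
logging.basicConfig(level=logging.INFO)
reporter = ProgressReporter(log_every=1000)
reporter.report(2000, {"avg_return": 512.5})
```

With something like this, silencing the output would be a matter of raising the log level rather than deleting the print statements.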


Restore overall base training performance of April 7 repo state (before refactoring) #20