The best way seems to be to explicitly define the environment changes (e.g., after 100 and 200 full games) and then create Samplers, Rewarders, etc. that are aware of this schedule and change their internal settings after that many games. The same goes, e.g., for the training memory, such that it resets, say, when the environment changes. Those objects are the least general and should therefore be adapted to the environment and picked at the top level, while the most general objects (q agents, q handlers) should not need to be adapted to such a case.
But is this really the best way, i.e. to define a rewarder that switches its behaviour from one base implementation to another (roughly: coin rewarder -> win rewarder)?
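If one does go down that road, such a switching rewarder could simply wrap two existing rewarders and delegate based on a game counter. A minimal sketch, assuming rewarders expose a `reward()` method and some per-game hook (these names are made up here, not the repo's actual interface):

```python
# Rough sketch only; class names, the reward() signature and the
# game_finished() hook are assumptions, not the repo's actual API.
class ScheduledRewarder:
    """Delegates to a coin-focused rewarder for the first `switch_after`
    games and to a win-focused rewarder afterwards."""

    def __init__(self, coin_rewarder, win_rewarder, switch_after=100):
        self._coin = coin_rewarder
        self._win = win_rewarder
        self._switch_after = switch_after
        self._games_seen = 0

    def game_finished(self):
        # Called once per completed game so the internal schedule advances.
        self._games_seen += 1

    def reward(self, old_state, action, new_state, events):
        # Pick whichever base rewarder is active for the current stage.
        active = self._coin if self._games_seen < self._switch_after else self._win
        return active.reward(old_state, action, new_state, events)
```

This keeps the existing rewarders untouched and isolates the scheduling logic in one place, but it also illustrates the concern above: the wrapper is tailored to one specific training plan.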
Thinking more about this, maybe a Bundle in bombermans.py should be redesigned as follows:
This would require small changes, such that an object is only responsible for saving/loading itself, not the objects it holds onto.
Such custom adjustments could, e.g., include:
This would also allow, e.g., saving a final agent version without the training set (e.g. for submission), reducing the file size.
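A minimal sketch of what that could look like, assuming the bundle holds named components such as a q agent, rewarder, sampler and training memory (the names and the save/load conventions below are assumptions, not the current bombermans.py API):

```python
import os
import pickle

class Bundle:
    """Sketch: the bundle only coordinates saving/loading; every held
    object persists itself. Component names (qagent, rewarder, sampler,
    memory, ...) are assumptions."""

    def __init__(self, **components):
        self._components = components  # e.g. qagent, rewarder, sampler, memory

    def save(self, directory, exclude=()):
        # Ask each component to persist itself; `exclude` lets a final
        # submission bundle drop e.g. the training memory to shrink the file.
        os.makedirs(directory, exist_ok=True)
        for name, obj in self._components.items():
            if name in exclude:
                continue
            if hasattr(obj, "save"):
                obj.save(os.path.join(directory, name))
            else:
                with open(os.path.join(directory, name + ".pkl"), "wb") as f:
                    pickle.dump(obj, f)

    @classmethod
    def load(cls, directory, factories):
        # Each component type knows how to load itself; `factories` maps
        # component names to classes exposing a matching load().
        components = {}
        for name, factory in factories.items():
            path = os.path.join(directory, name)
            if hasattr(factory, "load"):
                components[name] = factory.load(path)
            else:
                with open(path + ".pkl", "rb") as f:
                    components[name] = pickle.load(f)
        return cls(**components)
```

Saving a submission version without the training data would then be something like `bundle.save("final_agent", exclude=("memory",))`.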
For this it must be informed what the current run of main.py is intending, which requires a solution in sync with #7. One could, e.g., write a configuration file that specifies what should be done (= loaded/saved/swapped out) in main.py, and possibly even the different configurations intended for the corresponding main.py run. Maybe a custom script could then automate the main.py runs as well.
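For example, a driver along these lines could read a list of stage configurations and invoke main.py once per stage. The config file layout and the `--config` flag are purely hypothetical; main.py would have to be extended to understand them (which is where #7 comes in):

```python
import json
import subprocess

# Hypothetical training plan: one entry per main.py run.
with open("training_plan.json") as f:
    plan = json.load(f)  # e.g. [{"load": "agent_v1", "save": "agent_v2",
                         #        "rewarder": "WinRewarder", "rounds": 200}, ...]

for stage in plan:
    # Write out the configuration for this stage, then run main.py once.
    with open("current_config.json", "w") as f:
        json.dump(stage, f)
    subprocess.run(["python", "main.py", "--config", "current_config.json"],
                   check=True)
```

Each stage entry could also name the bundle to load and the one to save, so a whole multi-stage training becomes a single command.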
One should also think about the consequences of using the bombermans.py Bundles when, e.g., wanting to continue training a model but with a newly designed rewarder.
As of yet, it is unclear whether training a well-crafted agent in a single configuration is sufficient. That is, it may be necessary to first train in, say, a peaceful environment or against weaker agents, and only then in harder environments.
However, even though the code allows training for several rounds at once, to my knowledge it does not allow switching environments during that time. This means the main.py script needs to be run several times, which would have some consequences (e.g., agent state has to be saved and loaded between main.py calls). This could mean that an agent is not simply groupable as a collection of its parts (say, Regression model, rewarder, training set, sampler), but that these parts may change for an agent over its training process. In that case some thought needs to be put into how to save agents and package them up, and how to (still) allow for streamlined, automated training.
This relates to #7 in that sense.
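One way to keep an agent packageable even though its parts change over training would be to write a small manifest next to each saved version, recording which components and environment were used in each stage. This is only an illustration; the file name, keys and component/scenario names below are made up:

```python
import json
import os
from datetime import datetime

def append_stage_manifest(agent_dir, stage_info):
    """Append one training stage (components used, rounds, environment) to the
    agent's manifest so later runs can reconstruct the agent's history."""
    os.makedirs(agent_dir, exist_ok=True)
    path = os.path.join(agent_dir, "manifest.json")
    try:
        with open(path) as f:
            manifest = json.load(f)
    except FileNotFoundError:
        manifest = []
    stage_info["finished_at"] = datetime.now().isoformat(timespec="seconds")
    manifest.append(stage_info)
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)

# Example: record a coin-collecting stage (all names here are hypothetical).
append_stage_manifest("checkpoints/agent_v2", {
    "rewarder": "CoinRewarder",
    "sampler": "EpsilonGreedySampler",
    "rounds": 100,
    "environment": "coin_heaven",
})
```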