Added logs + some fixes for the test_train_model + QoL

Signed-off-by: lobotuerk <tomas.lobo.it@gmail.com>

Max Reward log: the idea is that if an agent is actually getting better, we should see the max reward and value go higher over time, even if it's a zero-sum game. QoL was also added in the form of command-line arguments. You can now call main.py with -h for help, or with --starting_episode to tell the program which checkpoint to load (if 0, it does not load anything).
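For reviewers who want a picture of the new argument, here is a minimal sketch of how a `--starting_episode` flag could be wired up with `argparse`. It is not the code from this PR: `build_parser` and `should_load_checkpoint` are hypothetical names, and the default of 0 is an assumption. `-h` comes for free from `argparse`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real main.py may define more options.
    parser = argparse.ArgumentParser(description="Train the agent.")
    parser.add_argument(
        "--starting_episode",
        type=int,
        default=0,  # assumed default: start fresh
        help="Checkpoint episode to load; 0 does not load anything.",
    )
    return parser


def should_load_checkpoint(starting_episode: int) -> bool:
    # Per the description, 0 means "do not load anything".
    return starting_episode != 0


if __name__ == "__main__":
    args = build_parser().parse_args()
    if should_load_checkpoint(args.starting_episode):
        print(f"Would load checkpoint for episode {args.starting_episode}")
    else:
        print("Starting without a checkpoint")
```

With a parser like this, `python main.py -h` prints the help text and `python main.py --starting_episode 100` selects the checkpoint for episode 100.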

silverlight6 / TFTMuZeroAgent

Added logs + some fixes for the test_train_model + QoL #22