nebw / argos-zero

An MCTS computer Go program using convolutional neural networks

MCTS Reinforcement learning #1

Open nebw opened 6 years ago

fraboeni commented 6 years ago

Goals: Day 3 (28.02.2018)

Data Format: We want to specify a data format standard that allows us to work on the tasks simultaneously. The format should specify how data is collected during the selfplay phase and how it can be transferred between machines (for parallelization) so that it can be used for training.

Subtask Assignment: Subtasks need to be well defined, and group members should be assigned to each of them.

Sub-goal definition: Goals need to be defined for each subtask, and the group as a whole needs to agree on a time plan.

Insight into topic: Each sub-group works on getting deeper insight into its task and presents its findings to the rest of the group at the end of the day.

fraboeni commented 6 years ago

Results: Day 3 (28.02.2018)

Data Format: We agreed on using Cap'n Proto as the exchange format and specified the fields of a schema.

Subtask Assignment and Goal Definition: We identified three subtasks and defined the following goals:

  1. Adapt the selfplay according to the paper. The selfplay should run as described for AlphaGo Zero, in particular with added exploration noise. (Julian, Valentin)
  2. Implement the Cap'n Proto file format, the statistics extraction from the selfplay, and a TCP socket that can send the messages generated during selfplay. The solution should be ready to run tomorrow. (Christoph, Franziska)
  3. Implement a daemon node that receives the TCP messages from all the machines running the selfplay, decodes them, and writes their content to an HDF5 file. (Florian, Dorian)

Insight and Progress:

  1. We read the paper thoroughly and identified adding noise as a problem: there is no ready-made Dirichlet noise implementation for C++, so the task is going to take more effort than expected. The exploration within the first 30 moves was partially implemented and is going to be finished tomorrow (see the sketch after this list).
  2. The capnp schema has been developed and can be found at https://github.com/nebw/argos-zero/tree/capnp/src/capnp. We started implementing the TCP socket part and the data collection during selfplay.
  3. The daemon is running. It was implemented in Python. The only part missing so far is the capnp message decoding.
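For reference, here is a minimal NumPy sketch of the root-noise scheme from the AlphaGo Zero paper (epsilon = 0.25, Dirichlet(0.03)). The actual implementation in this project is in C++; the function and parameter names below are illustrative only.

```python
import numpy as np

def add_root_noise(priors, epsilon=0.25, alpha=0.03, rng=None):
    """Mix Dirichlet noise into the root node's move priors.

    AlphaGo Zero perturbs the root priors as
    P(s, a) = (1 - epsilon) * p_a + epsilon * eta_a with eta ~ Dir(alpha),
    which encourages exploration during selfplay.
    """
    rng = rng or np.random.default_rng()
    priors = np.asarray(priors, dtype=np.float64)
    noise = rng.dirichlet([alpha] * len(priors))
    return (1.0 - epsilon) * priors + epsilon * noise

# Example: uniform priors over 362 moves (19x19 intersections + pass).
noisy = add_root_noise(np.full(362, 1.0 / 362))
assert np.isclose(noisy.sum(), 1.0)
```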
fraboeni commented 6 years ago

Goals: Day 4 (1.3.18)

Capnp Messages from Selfplay: The statistics from the selfplay should be wrapped into capnp messages for transmission over the network.

Network: Finish the work on the daemon and client sides and make them work together.

Selfplay Adaptation: Finish the adaptation of the selfplay according to the paper. Add the noise and the temperature.

HDF5: Understand the structure of HDF5 files and get the daemon to write the output of the capnp messages to HDF5 files.
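As a starting point, here is a minimal h5py sketch of appending decoded game records to resizable HDF5 datasets. The dataset names and layout are assumptions for illustration, not the project's actual schema.

```python
import h5py
import numpy as np

# Hypothetical layout: one resizable dataset for the flattened
# 19x19x8 board states and one for the game results.
with h5py.File("games.h5", "a") as f:
    if "states" not in f:
        f.create_dataset("states", shape=(0, 19 * 19 * 8),
                         maxshape=(None, 19 * 19 * 8), dtype="uint8")
        f.create_dataset("results", shape=(0,), maxshape=(None,), dtype="int8")

    def append(name, rows):
        # Grow the dataset along the first axis and write the new rows.
        ds = f[name]
        n = ds.shape[0]
        ds.resize(n + len(rows), axis=0)
        ds[n:] = rows

    state = np.zeros((1, 19 * 19 * 8), dtype=np.uint8)  # dummy record
    append("states", state)
    append("results", np.array([1], dtype=np.int8))
```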

fraboeni commented 6 years ago

Results: Day 4 (1.3.18)

Capnp Messages from Selfplay: The schema had to be adapted several times. First, the 19x19x8 array of board states was not suited to be filled directly into a capnp list of lists of lists; instead it needed to be flattened. The game result, a timestamp, and a unique ID are also generated. This took longer than expected due to difficulties in C++. The packing into messages was completed. The next goal will be to separate the selfplay and capnp classes. The change history can be found in the capnp branch.

Network: The goal of connecting the daemon and the client side was not reached. The daemon is running. On the client side, the raw structure of sockets in C++ has been understood and partially implemented. Everything took a lot longer than expected because the build broke several times for nearly every group member; restoring a working state of the project took about half an hour each time.

Selfplay Adaptation: The noise implementation and temperature adaptation are implemented and working. They can be found in the paper branch: https://github.com/nebw/argos-zero/tree/paper. The paper further describes a dynamic probability for passing; this problem is going to be approached tomorrow.
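The real noise and temperature code lives in the C++ paper branch; as an illustration of the temperature scheme from the paper, here is a Python sketch: for the first 30 moves, moves are sampled in proportion to MCTS visit counts (temperature 1), afterwards the most-visited move is played (temperature approaching 0). The cutoff and the names are assumptions based on the paper, not the project's code.

```python
import numpy as np

def select_move(visit_counts, move_number, rng=None, tau_moves=30):
    """Pick a move from MCTS visit counts, AlphaGo Zero style.

    For the first `tau_moves` moves the temperature is 1, i.e. moves
    are sampled proportionally to their visit counts (exploration);
    afterwards the temperature goes to 0 and the most-visited move
    is played (exploitation).
    """
    rng = rng or np.random.default_rng()
    counts = np.asarray(visit_counts, dtype=np.float64)
    if move_number < tau_moves:
        probs = counts / counts.sum()
        return int(rng.choice(len(counts), p=probs))
    return int(counts.argmax())
```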

HDF5: The data storage has been finished. Once test data is available, the code can be tested and, if necessary, adapted.

florian commented 6 years ago

Goals: Day 5 (2.3.18)

Capnp Code Refactoring: Currently the code for the capnp serialization sits inside the selfplay class. It should be refactored so that the selfplay and serialization logic are separated.

Selfplay: Integrate all the different parts of the selfplay code and make sure they work well together. The code for the daemon needs to be adapted to the new capnp schema. The socket part of the C++ selfplay still needs to be written. In the end we would like to put all the pieces together to start training.

Neural network training: Figure out how exactly the training works and what format the training data needs to have. This information can later be used to adapt the data storage part in the daemon.

Generating training data: We should figure out where exactly to run the data generation code: on personal machines or on the cluster? If it's the cluster, how do we make that work?

fraboeni commented 6 years ago

Results: Day 5 (2.3.18)

Code Refactoring: The code in the capnp branch has been refactored. The capnp part and the network part have been moved into a separate collector class. A test was run to check that the capnp messages contain the correct content.

Capnp Schema Adaptation: The capnp schema needed to be adapted again to fulfil the requirements of the code. The network IDs are now saved as text so that we can deal with UUIDs.

TCP Networking: The TCP part on the C++ side now works and is able to send the capnp messages. The code has been moved into the collector class on the capnp branch. A linker error with the capnp library occurred, and we spent more than an hour debugging it without result.

Neural network training: The training data was downloaded, and SquashFuse was installed to read the files. The Python training script was read and understood, and a new branch was created: https://github.com/nebw/argos-zero/tree/adaptingNN.

Selfplay into master: The adaptation of the selfplay was merged into the master branch.

Generating training data: During the weekend we are going to run the selfplay code on a single machine and store the resulting capnp messages on disk.

florian commented 6 years ago

Goals: Day 6 (5.3.18)

Neural network training: Early stopping should be added to the training of the neural network. We also want to apply several data augmentation techniques: the Go board can be flipped and rotated without changing the game, which lets us generate more training data.
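A minimal NumPy sketch of the eight board symmetries (four rotations, each optionally mirrored). Note that in real training the policy target has to be transformed the same way as the board planes; the function name is illustrative.

```python
import numpy as np

def symmetries(board):
    """Yield the eight symmetries of a Go position (4 rotations x flip).

    Works on any (19, 19, ...) array; the feature planes along the last
    axis are transformed together, so a (19, 19, 8) stack stays consistent.
    """
    for k in range(4):
        rotated = np.rot90(board, k, axes=(0, 1))
        yield rotated
        yield np.flip(rotated, axis=0)

board = np.zeros((19, 19, 8))
assert len(list(symmetries(board))) == 8
```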

TCP network connection: We need to make sure that the socket part of the project works fine. Maybe the code needs to be adapted a little bit to make sure that it integrates well with the Python daemon.

C++ build process: On Friday there were some problems with the build process on Mac. Ben made some changes to the build process, so we can try to see if it works better now.

Merge changes into master: Ideally, we would like to merge all our changes into master.

florian commented 6 years ago

Results: Day 6 (5.3.18)

Neural network training: Early stopping and the data augmentations were both implemented successfully.

TCP network connection: There was some progress on making the TCP connection work, both on the Python and C++ side. However, we weren't able to solve all of the problems.

C++ build process: There are still some problems unrelated to the build process, but the problems from Friday are all solved now. Reinstalling capnp from source and configuring some other things were the solution.

Merge changes into master: We successfully merged our changes to master.

fraboeni commented 6 years ago

Goals: Day 7 (6.3.18)

Test Data Augmentation: We want to test the data augmentation that was implemented yesterday and resolve possible errors.

Dynamic Thresholding: We want to implement dynamic thresholding.

Import training data from HDF5: The import of the data from HDF5 files needs to be implemented.

Adapt Python Daemon: We need to continue adapting the Python daemon to read the capnp messages from the stream. Yesterday we solved two important issues: first, messages were not sent in their entirety, which we resolved by no longer using capnp's packed write; second, the daemon could not receive entire messages, so we implemented a new receive method. Today the last errors should be resolved so that the selfplay can run on a cluster.
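The daemon's actual receive method is custom. As a sketch of the general approach, here is how a Python daemon could stream unpacked capnp messages off a socket using pycapnp's standard framing; the schema file, struct name, field name, and port are hypothetical.

```python
import socket
import capnp  # pycapnp

# Hypothetical schema; the real one lives in src/capnp on the capnp branch.
game_capnp = capnp.load("game.capnp")

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("0.0.0.0", 5000))
server.listen()

while True:
    conn, _addr = server.accept()
    # Treat the connection as a byte stream and let Cap'n Proto's standard
    # (unpacked) framing split it into messages; read_multiple keeps
    # yielding messages until the client closes the connection.
    with conn, conn.makefile("rb") as stream:
        for message in game_capnp.Game.read_multiple(stream):
            print("received game", message.id)  # `id` is a hypothetical field
```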

chbrock commented 6 years ago

Results: Day 7 (6.3.18)

Python daemon adaptation: The daemon runs as a script and is able to write game states to HDF5 files. We added functionality to accept multiple clients, to write several games into one file, and to roll over to a new file once a certain number of states is exceeded.

Test data augmentation: The augmentation now works properly.

Import training data from HDF5: A parser that is able to read from the files has been written.

Network Evaluation: Functionality from the evaluation group can be used to establish the playing strength of a given network.

Dynamic Thresholding: We must decide whether to allow resigning in early games, since an untrained network tends to resign early and often.

florian commented 6 years ago

Goals: Day 8 (7.3.18)

Python Daemon – Use capnp instead of JSON: The Python daemon should directly write the capnp messages into the hdf5 file instead of encoding them as JSON. This makes the file size a lot smaller.
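A sketch of one way to store serialized capnp messages directly in HDF5, using h5py's variable-length byte arrays; the dataset name is illustrative, and the daemon's actual layout may differ.

```python
import h5py
import numpy as np

# One variable-length byte array per serialized capnp message; binary
# blobs round-trip as-is and stay far smaller than a JSON encoding.
blob_dtype = h5py.vlen_dtype(np.dtype("uint8"))

with h5py.File("games.h5", "a") as f:
    if "raw_messages" not in f:
        f.create_dataset("raw_messages", shape=(0,), maxshape=(None,),
                         dtype=blob_dtype)
    ds = f["raw_messages"]
    payload = b"\x00\x01\x02"  # e.g. message.to_bytes() from pycapnp
    ds.resize(ds.shape[0] + 1, axis=0)
    ds[-1] = np.frombuffer(payload, dtype=np.uint8)
```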

Python Daemon – Refactoring: The code for the Python daemon should be refactored and properly merged into the git repository.

Small documentation tasks: The README should be changed to show all our dependencies, e.g. capnp.

Merge our code into master: First the master branch needs to be merged into our branch to get all the changes related to the config system. Then, we want to merge our capnp branch into the master branch.

Training process: The existing code needs to be changed to read from HDF5 files instead of in-memory NumPy arrays, to avoid memory problems. The script also needs to split the data into training and test sets; a sketch follows below.
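A hedged sketch of what streaming training batches from HDF5 could look like; the dataset names reuse the hypothetical layout from the earlier sketch, and the split strategy is only one simple option.

```python
import h5py

def batches(path, batch_size=32, train_fraction=0.9):
    """Yield (states, results) training batches straight from an HDF5 file.

    Slicing an h5py dataset reads only that slice from disk, so the full
    data never has to fit into memory as NumPy arrays. Records after the
    split index are reserved as test data.
    """
    with h5py.File(path, "r") as f:
        states, results = f["states"], f["results"]
        split = int(len(states) * train_fraction)
        for start in range(0, split, batch_size):
            stop = min(start + batch_size, split)
            yield states[start:stop], results[start:stop]
```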

Redistribute newly trained neural networks: If the training process produces a better neural network, the selfplay program should use it. This means it needs to periodically check whether a new neural network is available; a possible sketch follows below.
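One simple way to do the periodic check, sketched in Python; the file location, its contents, and the polling interval are assumptions, not the project's actual mechanism.

```python
import time
from pathlib import Path

# Hypothetical location where the training side publishes the
# identifier of the current best network.
BEST_NETWORK_FILE = Path("shared/best_network.txt")

def wait_for_new_network(current_id, poll_seconds=300):
    """Block until a network other than `current_id` is published."""
    while True:
        if BEST_NETWORK_FILE.exists():
            best_id = BEST_NETWORK_FILE.read_text().strip()
            if best_id and best_id != current_id:
                return best_id
        time.sleep(poll_seconds)
```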

Maybe: Integrate the training process into the Python daemon: The Python daemon should restart the training process from time to time. It's not clear whether we'll have time to do this today.

fraboeni commented 6 years ago

Results: Day 8 (7.3.18)

Python Daemon – Use capnp instead of JSON: The replacement of JSON by capnp was implemented, reducing the size of the resulting files considerably.

Python Daemon – Refactoring: The daemon has been refactored and put into a script. It can be found at https://github.com/nebw/argos-zero/tree/capnp/python/Collection-Daemon

Small documentation tasks: The documentation has been updated.

Merge our code into master: The current state of the code has been merged into the master (https://github.com/nebw/argos-zero/commit/b241eb3e16ec91cdcd560a8a4a7a8301855896de)

Redistribute newly trained neural networks: A bash script that can be run on the client side has been implemented: https://github.com/nebw/argos-zero/blob/capnp/test_networkdist/run_client.sh

Training process: The neural network training now works on the HDF5 files. It took longer than expected due to unexpected errors. We are working on getting the parser to accept multiple HDF5 files.

Integrate the training process into the Python daemon: Work on this task has started. So far the logic has been worked out, but the actual implementation still needs to be done based on the work of the evaluation group.

florian commented 6 years ago

Goals: Day 9 (8.3.18)

Training process: Debug the training script and fix some errors related to the parsing of the data.

Integrate the training process into the Python daemon: After training, the daemon should check if the new network is better than the old one using the code from the evaluation group. The training should be restarted once enough data is available.

Optional: Dynamic thresholding: Depending on how good the network is, we want to set a threshold for resigning. The idea is that we don't want to resign games we could still win, but at the same time we don't want to spend too much time playing games that are already lost. We need to decide how to do this and then implement a solution accordingly.

fraboeni commented 6 years ago

Results: Day 9 (8.3.18)

Training process: The script has been debugged and is now able to read the HDF5 files and trigger the training. It can also pick up only the new entries from selfplay: https://github.com/nebw/argos-zero/blob/master/python/parse_data.py

Integrate the training process into the Python daemon: The training script was altered so that it can be called from the supervisor script; it returns a UUID for the network parameters. The second part of the changes concerned the supervisor script itself: the evaluation code of the other group was integrated to compare two networks, and the supervisor can now write the name of the stronger one to the file from which the clients fetch it.

Dynamic thresholding: The dynamic thresholding on the daemon side has been implemented. For this, the capnp schema needed to be adapted: two fields were added, one to store whether resigning was possible during the game and a second one to store the win rates. These new fields are filled in the selfplay and collector code. On top of that, a Python script was implemented that uses the games played without the possibility of resignation and extracts the threshold for a given maximal false-negative rate; this threshold can then be sent to the clients. The integration is planned for tomorrow: https://github.com/nebw/argos-zero/blob/capnp/python/Dynamic_Thresholding/dynamic_thresholding.ipynb
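A minimal sketch of one way to extract such a threshold: given, for each game played without resignation, the lowest win-rate estimate the eventual winner saw, pick the quantile corresponding to the allowed false-negative rate. The names and the exact statistic are assumptions; the project's notebook may compute this differently.

```python
import numpy as np

def resign_threshold(winner_min_winrates, max_false_negatives=0.05):
    """Pick a resign threshold from games played without resignation.

    `winner_min_winrates` holds, per game, the lowest win-rate estimate
    the eventual winner had during that game. Resigning below the
    returned threshold would have wrongly ended at most a
    `max_false_negatives` fraction of those games.
    """
    lows = np.asarray(winner_min_winrates, dtype=np.float64)
    return float(np.quantile(lows, max_false_negatives))
```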

florian commented 6 years ago

Goals: Day 10 (9.3.18)

Supervisor debugging and refactoring: The script that checks if the retrained model is better than the old one still needs some debugging. The code should also be refactored.

Testing the supervisor and daemon connection: We need to make sure that the supervisor and daemon work well together. Depending on the outcome, some changes might be necessary.

Dynamic thresholding – integrating the Python part into the supervisor: The supervisor needs to write the result of the dynamic thresholding script to a file.

Dynamic thresholding – C++ integration: The result of the dynamic thresholding should be picked up by the C++ selfplay code.

Testing the training trigger and evaluation part: The relevant code still needs to be tested.

Convert Jupyter notebooks to Python scripts: Currently there are a bunch of Jupyter notebooks that should be converted into Python scripts.

chbrock commented 6 years ago

Results: Day 10 (9.3.18)

Supervisor debugging and refactoring: Done. Most Python code can now be imported from other scripts.

Testing the supervisor and daemon connection: The supervisor reads the files written by the daemon. To avoid conflicts, the active file (the one with the highest index, which the daemon is currently writing to) is not touched by the supervisor.
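A sketch of that selection logic, assuming zero-padded file indices so that lexicographic order matches numeric order; the path pattern is hypothetical.

```python
import glob

# All data files written by the daemon, e.g. data/games_0007.h5; the
# highest-indexed file is still being written, so the supervisor skips it.
files = sorted(glob.glob("data/games_*.h5"))
completed = files[:-1]
```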

Dynamic thresholding – integrating the Python part into the supervisor: The C++ selfplay code was adjusted to read the resign threshold from the config. The supervisor now stores the dynamic threshold along with the network name in a file, which is read and passed on by the bash script running the clients.

Dynamic thresholding – C++ integration: The result of the dynamic thresholding can now be set via the command line.

Testing the training trigger and evaluation part: The code correctly counts the number of games in the HDF5 files and is able to start training automatically. All Python scripts now work from the same list of files containing the training data. Other code still needs to be tested.

Convert Jupyter notebooks to Python scripts: Done