jonathan-laurent / AlphaZero.jl

A generic, simple and fast implementation of Deepmind's AlphaZero algorithm.
https://jonathan-laurent.github.io/AlphaZero.jl/stable/
MIT License

Duel between Full of same network arch with different vectorize_state implementations #146

Open gwario opened 1 year ago

gwario commented 1 year ago

Hi!

I am trying to run a benchmark, or any other kind of comparison, between two AlphaZero players that may have the same network architecture but use different implementations of GI.vectorize_state.

I looked at the code and could not come up with a solution.

So far I have tried subtyping the network or the gspec. Subtyping the gspec would not work (I think) because both players have to share the same gspec: the benchmark works on a single game spec. Subtyping the network does not work either, because I couldn't figure out how to make the type hierarchy work, since I can't inherit from structs.

I also tried cloning the play_game function and switching the implementation via eval, but then I ran into problems, probably because the inference server does the evaluation asynchronously from the loop in play_game.

What do you think is my best shot at running a duel benchmark in this scenario? Did I overlook something?

Thanks a lot! gwario.

jonathan-laurent commented 1 year ago

Sorry for my late reply. Have you found a solution to your problem?

You are correct that making vectorize_state a property of a GameSpec makes it pretty hard to benchmark different state vectorization schemes. This is a questionable design decision since different network architectures frequently mandate different vectorization schemes anyway.

Right now, the best workaround is probably to add a flag to game specs and states that specifies how they should be vectorized and pass the right flag in the right context.
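A minimal sketch of what such a flag could look like, written against a toy board rather than AlphaZero.jl itself. The names `ToySpec`, `scheme`, and this standalone `vectorize_state` are hypothetical stand-ins; in a real game the dispatch would live in that game's GI.vectorize_state method, and each player would still need to be handed the spec (or state) carrying its own flag.

```julia
# Hypothetical stand-in for a game spec; not an AlphaZero.jl type.
struct ToySpec
    scheme::Symbol  # assumed flag values: :onehot or :signed
end

# Toy state: a vector of cells, 0 = empty, 1 = first player, 2 = second player.
function vectorize_state(spec::ToySpec, board::Vector{Int})
    if spec.scheme == :onehot
        # One column per cell value (0, 1, 2), one row per cell.
        return Float32[c == k for c in board, k in 0:2]
    elseif spec.scheme == :signed
        # Single channel: +1 for the first player, -1 for the second, 0 for empty.
        return Float32[c == 1 ? 1 : c == 2 ? -1 : 0 for c in board]
    else
        throw(ArgumentError("unknown vectorization scheme: $(spec.scheme)"))
    end
end

board = [0, 1, 2, 0]
x_onehot = vectorize_state(ToySpec(:onehot), board)  # 4×3 Float32 matrix
x_signed = vectorize_state(ToySpec(:signed), board)  # 4-element Float32 vector
```

Keeping the flag as plain data means the two schemes can share every other method of the game definition. The cost is that a network is only valid for the scheme it was trained with, so the flag has to travel with the network.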

If you have suggestions on how to make the next design better, I would be glad to consider them.

gwario commented 1 year ago

Hi!

I went for a networked solution (which is not yet working properly), using Redis for game environment synchronization and gameplay process control.

That way I have completely separate Julia processes doing the work. I think this solution is not as fast, but that should be less of a problem since the MCTS iterations will probably be the limiting factor.

I will try to summarize my experience for you when I'm wrapping up my work.