tensorflow / minigo

An open-source implementation of the AlphaGo Zero algorithm
Apache License 2.0

Is it possible to publish the MG selfplay at Minigo-pub Google Cloud? #835

Closed SHKD13 closed 4 years ago

SHKD13 commented 5 years ago

Hi! Is it possible to publish the MiniGo v17 selfplay games for download from the Minigo-pub page on Google Cloud? Maybe the last few million selfplay games, and for MiniGo v15 and v16 too if possible. Either quality is fine: with or without debug information, it makes no difference. Thanks!

tommadams commented 5 years ago

The debug SGF files are quite large, but I'm happy to take a crack at publishing some of the later selfplay games without debug info.

tommadams commented 5 years ago

I've copied SGFs of the selfplay games from the final hour of v17 training to gs://minigo-pub/v17-19x19/sgf/clean/2019-02-26-00

I've compressed the games into multiple archives: nr.tar.gz holds all the games for which resignation was disabled; the other *.tar.gz files contain regular selfplay games.

It's only ~53,000 games but let me know if the data looks good (I've only made a cursory check myself) and I can upload some more.

SHKD13 commented 5 years ago

@tommadams Thanks a lot! I'll check the files as soon as possible))

tommadams commented 5 years ago

No problem. Keep in mind that we were playing around with various hyperparameters for the last day of v17's training.

SHKD13 commented 5 years ago

@tommadams The selfplay games look good! It would be great to have a somewhat bigger MG training dataset :)

alreadydone commented 5 years ago

Including the policy training target would be helpful.

tommadams commented 5 years ago

Cool. I'll start exporting more SGFs then.

All our training data is stored in Cloud Bigtable and we don't yet have a good way of exporting that data, sorry.

tommadams commented 5 years ago

I've put some more games up: gs://minigo-pub/v17-19x19/sgf/clean/

SHKD13 commented 5 years ago

@tommadams Very nice to see more games! Thanks a lot :)

SHKD13 commented 5 years ago

@tommadams Hey, may I ask whether this is the full MiniGo v17 training dataset? https://console.cloud.google.com/storage/browser/minigo-pub/v17-19x19/data/golden_chunks/?pli=1 Or something else?

tommadams commented 5 years ago

That is not the full training dataset. From v14 onwards, we have stored the selfplay data in Cloud Bigtable and randomly sample training examples from it during training. Since the examples are sampled randomly and sent directly to the TPU for training, we don't actually have the exact set of examples used to train the model.

The data that I extracted for you was a random sampling of examples from the complete data set that covers all generations of the model.

SHKD13 commented 5 years ago

@tommadams Thank you very much for the data! So, if I understand correctly, this link https://console.cloud.google.com/storage/browser/minigo-pub/v17-19x19/data/golden_chunks/?pli=1 leads to training samples from the MG v17 selfplay dataset which can be fed directly to the agent, unlike the SGF files themselves?

sethtroisi commented 5 years ago

@SHKD13 Yes, the golden chunks are tf.Examples which can be fed to the agent (dual_net.py) for training.

bubblesld commented 5 years ago

@sethtroisi Is the training data you mentioned in Minigo format or LZ format? I also want to know how to deal with the .zz files. Are they compressed? Thanks.

sethtroisi commented 5 years ago

@bubblesld

It's in Minigo format (tf.train.Example). You can see an example of reading it here: https://colab.research.google.com/drive/1Sx6WTphh5f8oKVhWJB7SenYsDkG5ytiN

or in oneoffs (e.g. https://github.com/tensorflow/minigo/blob/master/oneoffs/compare_examples.py).
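
For a quick start, here's a minimal sketch of reading a chunk directly (the .zz files are zlib-compressed TFRecords; the feature names follow Minigo's preprocessing.py, so double-check them against the Colab; the file name is just an example chunk):

```python
import tensorflow as tf

# Minigo's .tfrecord.zz chunks are zlib-compressed TFRecord files.
dataset = tf.data.TFRecordDataset(
    "train_0007470000_0000660016.tfrecord.zz", compression_type="ZLIB")

features = {
    "x": tf.io.FixedLenFeature([], tf.string),         # packed input planes
    "pi": tf.io.FixedLenFeature([], tf.string),        # policy target
    "outcome": tf.io.FixedLenFeature([], tf.float32),  # +1 = black won, -1 = white won
}

for record in dataset.take(1):
    example = tf.io.parse_single_example(record, features)
    planes = tf.io.decode_raw(example["x"], tf.uint8)   # 19*19*17 binary planes
    pi = tf.io.decode_raw(example["pi"], tf.float32)    # 361 points + pass
    print(planes.shape, pi.shape, float(example["outcome"]))
```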

bubblesld commented 5 years ago

@sethtroisi Thanks for the information. I would like to include Minigo self-play games in training LZ weights. Last time, when we included ELF v2 in the training data, v222 learned 羋氏飛刀 (the "flying dagger" joseki named after Mi Yuting), and it is a key reason that LZ beat Golaxy once in the previous tournament. We hope that LZ can learn something from Minigo as well.

I know little about Linux. I think the self-play games @tommadams mentioned should be good to use, but I do not know how to turn them into LZ training format. I learned that the command is dump_supervised sgffile.sgf train.txt, but I can't run it 53k times by hand. Can anyone show me how to read all the files and run dump_supervised on each?

alreadydone commented 5 years ago

You can run copy * 2.sgf (on Windows) to merge all files in the current directory into one file. Then you can run dump_supervised 2.sgf train in leelaz. lz.zip contains the 2.sgf and the training-data .gz files I got from 2.tar.gz in https://console.cloud.google.com/storage/browser/minigo-pub/v17-19x19/sgf/clean/2019-02-26-00
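
If you're on Linux, here's a minimal Python sketch of the same merge (assumes the extracted games sit in the current directory):

```python
import glob

# Concatenate every extracted game into one SGF collection so that
# dump_supervised only has to be run once from leelaz's GTP prompt.
with open("2.sgf", "w") as merged:
    for path in sorted(glob.glob("*.sgf")):
        if path == "2.sgf":
            continue  # skip the output file itself on re-runs
        with open(path) as game:
            merged.write(game.read().strip() + "\n")

# Then, inside leelaz:  dump_supervised 2.sgf train
```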

alreadydone commented 5 years ago

By the way, did Minigo v17 actually use scale = True in the residual layers? The current code says so: https://github.com/tensorflow/minigo/blob/d0f6a7dae7f80a3e573346457d64fc1afd4a0671/dual_net.py#L397 I think if scale = False was used, then it's possible to apply convert_minigo.py, net_to_model.py, and parse.py (with the slight modification https://github.com/leela-zero/leela-zero/pull/1782/files of tfprocess.py) to a v17 model to continue training it in LZ format, now that https://github.com/leela-zero/leela-zero/pull/2370 is ready.

sethtroisi commented 5 years ago

@alreadydone I believe that convert_minigo.py was fixed with https://github.com/leela-zero/leela-zero/pull/2133

bubblesld commented 5 years ago

@alreadydone Thanks. I will try it.

Since there are only 53k self-play games in the link tommadams mentioned, that is probably too few for training. We used 400k LZ + 250k ELF v2 (or ELF v1, ELF v0) games earlier. Can I convert the files sethtroisi mentioned into .sgf (and then into LZ format)?

alreadydone commented 5 years ago

@sethtroisi I believe that old PR doesn't include any support for SE. I am specifically talking about continuing to train v17 nets with LZ code. @Ttl mentioned in https://github.com/leela-zero/leela-zero/pull/1782#issuecomment-416303807 that one obstruction (probably the only one) to training ELF nets with LZ code is the gammas used in the residual layers' batchnorm, which correspond to scale = True (in tfprocess.py on the master branch, scale is False for both the residual layers and the two heads, while in the latest Minigo code it's False in the two heads but True in the SE residual layers). However, it seems that @Ttl already fixed this in his PR https://github.com/leela-zero/leela-zero/pull/2370/files#r280738306. Now scale is False in the first layer of each residual block and True in the second layer (the same PR also contains updated convert_minigo.py and net_to_model.py). Does that mean Minigo v17 nets (and possibly also ELF nets) can now be trained directly with the code in that PR without much modification?
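
For context, a minimal sketch of what that flag changes (Keras API here for brevity; the scale semantics are the same in the tf.layers API these projects use):

```python
import tensorflow as tf

# scale=False fixes batch norm's gamma at 1, so a weight converter has no
# per-channel multiplier to carry across; scale=True adds a learned gamma
# that must be preserved (or folded into adjacent conv weights) on conversion.
bn_without_gamma = tf.keras.layers.BatchNormalization(scale=False)
bn_with_gamma = tf.keras.layers.BatchNormalization(scale=True)
```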

bubblesld commented 5 years ago

Another question: to train a net with more blocks, can I initialize it with a smaller net plus some identity blocks? For example, can I use LZ-40b plus 40 identity blocks as the starting point for an 80b net? Would that be equivalent to the original 40b net? If so, I think it would be a good initialization: at least it would have the same strength, with room for improvement. If so, how do I write the code to create such a net? Thanks.

alreadydone commented 5 years ago

You could try net2net.py in LZ training code. Append the number of blocks and the number of filters to be added to the weight file name, like python net2net.py 226.gz 40 0, if I recall correctly.
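
Conceptually (this is a sketch of the idea, not LZ's actual net2net.py), the appended blocks are built so the residual branch outputs zero, making each new block an exact identity:

```python
import numpy as np

def identity_block(filters, ksize=3):
    """Parameters for one appended residual block that leaves its input
    unchanged: out = x + branch(x), and zeroing the branch's final conv
    weights and batch-norm offset forces branch(x) == 0."""
    conv1_w = np.random.normal(0.0, 0.01, (filters, filters, ksize, ksize))
    conv2_w = np.zeros((filters, filters, ksize, ksize))  # kills the branch
    bn2_beta = np.zeros(filters)                          # no offset added back
    return conv1_w, conv2_w, bn2_beta
```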

bubblesld commented 5 years ago

> You could try net2net.py in LZ training code. Append the number of blocks and the number of filters to be added to the weight file name, like python net2net.py 226.gz 40 0, if I recall correctly.

Thanks. I will try it as well.

tommadams commented 5 years ago

I recently uploaded 123GB of compressed training examples from our v17 run: gs://minigo-pub/v17-19x19/data/golden_chunks

bubblesld commented 5 years ago

@tommadams Thanks. I gathered that they are not .sgf files, but I still have trouble opening the .zz files. sethtroisi mentioned two ways to read them. How do I read them and save them as .sgf files? I found that .zz might be Zzip? I tried to decompress the .zz files, but without success.

tommadams commented 5 years ago

Those files are the actual training examples, not .sgf files.

Each file contains a list of compressed TensorFlow examples. Seth's earlier comment contained links to code that shows how to read these files: https://github.com/tensorflow/minigo/issues/835#issuecomment-496819491

The example code decompresses the examples automatically, so you don't need to decompress them yourself.

bubblesld commented 5 years ago

@tommadams Thanks. I will download the code to try it.

SHKD13 commented 5 years ago

I'd like to ask one thing. Why has no one ever used PhoenixGo v1 for generating high-quality selfplay, as JCP did with ELF last year? Neither the Leela Zero nor the MiniGo team has, not even for evaluation games. It works fairly well as the LZ engine plus converted PhoenixGo weights. It shows really good results in matching AlphaGo Zero's moves, even using the old modified LZ 0.15 engine. I think the new LZ 0.17 could make PhoenixGo even stronger. Why not?

alreadydone commented 5 years ago

There wasn't much interest in the PhoenixGo model because it came after ELF v0 but wasn't as strong (there must have been more tests than https://github.com/leela-zero/leela-zero/issues/1405#issuecomment-388568293, but maybe they weren't published on GitHub), despite being slightly bigger. I'd like to see data showing that PhoenixGo matches AGZ moves better. Which of the AGZ games at https://www.alphago-games.com are you referring to?

In order to do self-play, leela-zero must make an official release that supports both PhoenixGo and LZ weights, but because of lack of interest, no one bothered to modify the code and submit a PR to support both formats.

alreadydone commented 5 years ago

@sethtroisi Questions regarding https://colab.research.google.com/drive/1Sx6WTphh5f8oKVhWJB7SenYsDkG5ytiN, just to confirm: 17th plane = 1 means black is to move at current position, outcome = 1 means black won, right?

SHKD13 commented 5 years ago

@alreadydone Thanks for the reply! But it's really strange to hear that PhoenixGo was considered a weaker AI than ELF v0. During the high-playout (over 50k) OGS tournament in February 2019, PhoenixGo beat ELF v2, LZ #205, and MiniGo v15. You can find the SGFs on OGS. If it is strong enough to beat much newer networks, how can it be weaker than the oldest version of ELF?

Here is a sample from a small study, "Comparison of AlphaGo Zero (both the 20- and 40-residual-block versions) with its Deep Reinforcement Learning Clones, Based on Matching Samples of AlphaGo Zero's Moves":

[screenshot: table of move-matching percentages]

PhoenixGo matches the AlphaGo Zero 20B moves better than any version of ELF OpenGo in this comparison. These two things led me to ask why PhoenixGo's training potential was ignored last year. Of course, I might be completely wrong :)

tommadams commented 5 years ago

> @sethtroisi Questions regarding https://colab.research.google.com/drive/1Sx6WTphh5f8oKVhWJB7SenYsDkG5ytiN, just to confirm: 17th plane = 1 means black is to move at current position, outcome = 1 means black won, right?

Correct

tommadams commented 5 years ago

> Thanks for the reply! But it's really strange to hear that PhoenixGo was considered a weaker AI than ELF v0. During the high-playout (over 50k) OGS tournament in February 2019, PhoenixGo beat ELF v2, LZ #205, and MiniGo v15. You can find the SGFs on OGS. If it is strong enough to beat much newer networks, how can it be weaker than the oldest version of ELF?

It's probably worth mentioning that we have never entered Minigo into a tournament, so that entry wasn't "official" ;)

SHKD13 commented 5 years ago

@tommadams Yes, that was a Go enthusiast with hardware good enough to run the LZ 0.16 engine plus converted MG v15 weights in a 60k vs 60k playout online battle. PhoenixGo performed surprisingly strongly that time, especially for its age.

Anyway, I hope @bubblesld will find a way to use the MG v17 selfplay and build some interesting hybrid models while we wait for MiniGo v18 to enter the scene :)

alreadydone commented 5 years ago

@SHKD13 Nice to see the data, but did you only test 20 moves (that you chose)? Maybe one from each game of AGZ 20B vs. AG Lee, or something else?

I think you are talking about the PhoenixGo matches in this thread https://github.com/leela-zero/leela-zero/issues/2237, and not the OGS tournament https://online-go.com/tournament/44207. However, it seems that the earlier favorable results for PG were due to incorrectly setting enable background search: 1 when testing both AIs on the same machine. "Background search" in PhoenixGo is what LZ calls "pondering": when it's enabled, PG keeps searching while LZ is generating its move, so PG accumulates a large number of playouts, and when it starts generating its own move it adds another 60k playouts to the tree. PG was therefore getting much more than 60k playouts in the earlier games, which was unfair to the LZ engine (whether it was running LZ, Minigo, or ELF weights). See: https://github.com/leela-zero/leela-zero/issues/2237#issuecomment-467565132 https://github.com/leela-zero/leela-zero/issues/2237#issuecomment-467595716 https://github.com/leela-zero/leela-zero/issues/2237#issuecomment-470186238

> I found that using "native background search 1" will affect the efficiency of the other AI in the same game, so I canceled it from March.

https://github.com/leela-zero/leela-zero/issues/2237#issuecomment-470425880 says "i thought you were using pondering for lz too", but @atoutw was actually always using --noponder for LZ, even before March: https://github.com/leela-zero/leela-zero/issues/2216#issuecomment-464120226

In the later games, PG with 90k sims lost 2-3 to LZ #208 and 0-4 to #209, both with 60k sims. (According to https://github.com/leela-zero/leela-zero/issues/2237#issuecomment-470544437, the PG engine would use 123s per move and LZ < 77s, but the LZPG engine should be faster.)

SHKD13 commented 5 years ago

@alreadydone I didn't know that PG's background search was enabled. Thanks for the info! Anyway, even a 2-3 mini-match against the much younger LZ #208 is a fairly good result for an AI the same age as ELF v0. It is still competitive, no matter how much stronger PG has become in its recent iterations, which Tencent is hiding from the Go community. By the way, I have asked them for the PhoenixGo training dataset, but the answer was "Hell, no! :)", even though PG is nominally an open-source project.

A few words about the AlphaGo Zero comparison. This is my "beta test" analysis; I'll finish it soon with more comprehensive stats and illustrations. I took 20 non-forced-looking moves from the last 4 games of each of the AGZ 20B and 40B models against itself, 40 moves in total. All moves are beyond the first 30, to avoid any distortion from the randomization argument -m 30 (obviously some games are evaluation tests, but some are training games with noisy openings). All AIs except PG and MG v17 are driven by the newest LZ 0.17 engine, which makes a significant difference compared to its 0.16 predecessor. I'd like to analyse the AG Zero moves from its evaluation games against AG Master too, but the AGZ paper gives no per-move simulation counts for those games, only a two-hour time limit. I'm sure there can be NO correct comparison with visit counts far higher or lower than in the original research; that's not the way to check or reproduce it.

My hypothesis was that newer and stronger AIs like MiniGo v15 or v17 (v17 is my favorite), as well as the freshest iterations of LZ and ELF, would match more moves. But some results are already outside my expectations, and that's nice :)

alreadydone commented 5 years ago

@tommadams Thanks! LZ instead uses 0 for 'black to move'. @bubblesld I modified the sample code for reading Minigo training data into a converter to LZ format: https://drive.google.com/file/d/1L9AGDxgmIjFNIJt599nTMjmsdNJYlKdr/view?usp=sharing https://colab.research.google.com/drive/1L9AGDxgmIjFNIJt599nTMjmsdNJYlKdr I haven't found a good way to verify correctness though. On Google Colab, each position takes ~1.6 ms to convert (but 10 ms on my notebook), and a whole chunk (~2,000,000 positions) should take < 1 hour to convert. It's slow, but introducing the function to_hex reduced runtime by ~7.5x (compared to bitwise operations) and I'm not sure how to optimize it further. Change train_0007470000_0000660016.tfrecord.zz to another file name under the same directory to convert other chunks. @tommadams Which ones were generated by the latest models? Since I have trouble automatically downloading the generated chunks of LZ training data (6,000 positions each) from Google Colab, you'd better run it locally.
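
For reference, a minimal sketch of the packing step (LZ's text format stores each of the 16 input planes as 90 hex digits plus the final bit on its own, if I remember it correctly; verify against the Colab):

```python
import numpy as np

def plane_to_hex(plane):
    """Pack a flat length-361 binary plane into LZ's 91-character line:
    90 hex digits for the first 360 bits, then the last bit by itself."""
    bits = plane.astype(np.uint8)
    head = int("".join(map(str, bits[:360])), 2)  # one big-int conversion
    return format(head, "090x") + str(int(bits[360]))

print(plane_to_hex(np.zeros(361)))  # 91 zeros for an empty plane
```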

SHKD13 commented 5 years ago

Sorry, but can anyone explain in a bit more detail how to turn those MG v17 selfplay files, which have been converted from MiniGo to Leela Zero format, into SGF?

bubblesld commented 5 years ago

> @tommadams Thanks! LZ instead uses 0 for 'black to move'. @bubblesld I modified the sample code for reading Minigo training data into a converter to LZ format: https://drive.google.com/file/d/1L9AGDxgmIjFNIJt599nTMjmsdNJYlKdr/view?usp=sharing https://colab.research.google.com/drive/1L9AGDxgmIjFNIJt599nTMjmsdNJYlKdr I haven't found a good way to verify correctness though. On Google Colab, each position takes ~1.6 ms to convert (10 ms (!) on my notebook), and a whole chunk (~2,000,000 positions) should take < 1 hour to convert. It's slow, but introducing the function to_hex reduced runtime by ~7.5x and I'm not sure how to optimize it further. Change train_0007470000_0000660016.tfrecord.zz to another file name under the same directory to convert other chunks.

Thanks. I will try it.

alreadydone commented 5 years ago

> Sorry, but can anyone explain in a bit more detail how to turn those MG v17 selfplay files, which have been converted from MiniGo to Leela Zero format, into SGF?

SGFs can't be recovered; you'd better have @tommadams upload them. The training data for each position only contains the locations of black/white stones (at the current position and the seven preceding positions), the side to move, the normalized visit counts, and the winner of the game. Each position is followed by a random position, almost surely from another game. The move played from a position is usually the highest-visited one, but within the first 30 moves it's chosen randomly, so you can't piece the games together trivially; if you try, you get a tree of positions instead of a disjoint collection of paths, one per game, that you could each put into an SGF.
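
For illustration, a tiny sketch of that recovery for a single example (policy layout of 361 points + pass as discussed above; the coordinate convention is my assumption):

```python
import numpy as np

def probable_move(pi):
    """Guess the move played from one example's 362-float policy target."""
    idx = int(np.argmax(pi))          # highest visit fraction
    if idx == 361:
        return "pass"
    row, col = divmod(idx, 19)
    col_letter = chr(ord("A") + col + (col >= 8))  # GTP columns skip 'I'
    return f"{col_letter}{19 - row}"               # assumes index 0 = top-left
```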

tommadams commented 5 years ago

It's not possible to generate the SGFs from the training examples. Due to the way we randomly sample training examples from selfplay, there's no guarantee that all the moves from each game are in the training data.

Why do you want the SGFs by the way?

SHKD13 commented 5 years ago

@tommadams Yeah, I get it. That was another idea for adapting the MG v17 training examples to feed a hybrid LZ bot. Those cross-selfplay models are always interesting and strong. Maybe the MG team will try something like this during the runs of MiniGo's next versions?

amj commented 5 years ago

to bring a little clarity to this thread, i thought i'd list what kinds of game data we have, what we've already made public, & what i'd like us to eventually make public.

Each run has:

  1. "Evaluation games" -- sgfs with debug information of the matches used to determine the strengths of our models. These are all public and there are good viewers on cloudygo.com.
  2. Selfplay games as sgf
    • "clean" sgf records, with the debug information stripped (~1kb each)
    • "full" sgf records, with the debug information present in sgf comments (~500kb each)
  3. Selfplay games as training examples. These are serialized TFExample objects, one per game. These are files on GCS, each game is about 50kb-100kb. After v14, these were also written to a Cloud Bigtable so they can be efficiently read by a Cloud TPU.
  4. For runs prior to v14, we made aggregated "golden chunks", which are aggregates of the training examples that have already been shuffled and sorted according to the methodology described in the paper, i.e., "positions uniformly sampled from the most recent 500k games...". Each golden chunk is numbered according to the model it produced, e.g., model 499 trains on golden chunk 500 and the new checkpoint is model 500.
  5. Model checkpoints -- which are the models themselves. most of them are available either as the "frozen graph", i.e. a .pb file, or in the index/meta/data format of a tensorflow checkpoint, which may be the worst possible way to persist these artifacts (see the loading sketch after this list).
  6. tensorboard logs which are really inefficiently stored; each one basically writes the whole graph over again.
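
A rough sketch of loading one of the published frozen graphs (TF1-style API; the file name is illustrative, not an actual minigo-pub path):

```python
import tensorflow.compat.v1 as tf

# Read the serialized GraphDef from the .pb file and import it into a graph.
with tf.gfile.GFile("minigo-v17-model.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="minigo")

# graph.get_operations() now lists the network's ops; run inference by
# feeding the board-features tensor and fetching the policy/value outputs
# inside a tf.Session(graph=graph).
```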

On top of all of the above for each run, we also have cross-run evaluation games.

The "debug information" in the sgf records is a human-readable dump of the search information, so humans can look at the PV and see how the two engines thought they were doing. It's not directly usable as training examples, although it could be turned into limited training examples with some scripts; you could convert the tabular data into the new policy targets, but that gives the policy target only for the most-visited nodes, not a full list of all 361. That might actually be fine, but either way you'd need to do that for all the hundreds of millions of sgfs...
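
For concreteness, a rough sketch of that conversion (the input format here is a stand-in, not our actual debug-comment layout):

```python
import numpy as np

def visits_to_policy(visit_counts):
    """Turn per-move visit counts parsed from a debug SGF comment into a
    partial 362-way policy target (361 points + pass). Moves absent from
    the table stay at probability 0, which is the limitation noted above."""
    pi = np.zeros(362, dtype=np.float32)
    total = float(sum(visit_counts.values()))
    for move_index, visits in visit_counts.items():
        pi[move_index] = visits / total
    return pi

# e.g. visits_to_policy({72: 800, 91: 150, 361: 50}) puts most mass on move 72
```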

So of the list above, we've only published the models and the evaluation games. I'm happy to publish the other data but it's not clear that it's any use to anyone. It's also not like i have them in a convenient form; sync'ing tens of millions of tiny files in order to tarball them and re-upload them is not something i want to do until it looks like someone needs them :)

The selfplay games as sgfs would be pretty interesting to do a large scale analysis on, but they're fairly big and a lot of them have the sort of randomness that makes them look really funny to people (the 30 move temperature thing). V17 has 25M games; 25M * 500kb ~= 10TB+ of full selfplay games.

The training examples in bigtable for v14 & up are ~15TB in total, each run is a few TB. These could probably be exported from bigtable, or the .tfrecord.zz files could be sync'd and gzipped, but see above -- i'd want to be sure they wouldn't just disappear into someone's totally closed-source franken-bot, for instance, which is something i know leela has been dealing with.

Hope this helps. I'd love to find the best way to get the useful parts of this data into the go community in useful formats, so if anyone has any ideas on that front feel free to open some issues for us :)