lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

When will KataGo's new weights come out? #318

Open Harder-Run opened 4 years ago

Harder-Run commented 4 years ago

Question for lightvector: when will KataGo's new weights come out?

Friday9i commented 4 years ago

Producing new weights takes at least weeks of training with dozens of strong GPUs, and even months if the run starts from scratch. So it takes time and money... The good news is that lightvector and some other contributors are working on a distributed version of KataGo, similar to LZ (Leela Zero), where end users like us can contribute our GPUs to generate self-play games. Hopefully it will be ready soon and attract many contributors, so that we get new, more powerful nets regularly :-)

Harder-Run commented 4 years ago

OK.

portkata commented 4 years ago

With distributed KataGo, would it be possible to train the s167 15b net using the s509 40-block net? Or would only self-training be possible?

lightvector commented 4 years ago

You can already train it if you like. :)

The s167 15b training was discontinued long before the end of the run, because it no longer seemed to be improving much, and to save computation power and the mental energy of managing it. It would likely get a little stronger if trained on the most recent data, so you don't even need to self-play. Just download the data and train it some more if you like. https://d3dndmfyhecmj0.cloudfront.net/

Talk to @Friday9i if you want to confer with someone who has already managed to set up home training with existing downloaded data (although for a slightly different purpose).

portkata commented 4 years ago

Thanks! @Friday9i I only have one V100; will that be enough for this type of training? Do I need to run the shuffler, gatekeeper and/or exporter? Can I delete any lines in this script? https://github.com/lightvector/KataGo/blob/master/python/selfplay/synchronous_loop.sh

Friday9i commented 4 years ago

@portkata I answered a similar question on cbaduk a few hours ago and only just saw this one here: are you the same person? Here is my answer: https://www.reddit.com/r/cbaduk/comments/j5wrqt/need_help_on_how_to_train_stronger_15b_katago_net/

One V100 is enough to train a net, no problem, but you may need it for at least weeks (and possibly months) if you want to significantly improve the 15b net. And as lightvector said, the 15b was apparently close to its asymptote, so you will probably not get much improvement.

portkata commented 4 years ago

Thanks so much! Yeah, same person :)

On Thursday, October 8, 2020, Friday9i <notifications@github.com> wrote:

@portkata I answered a similar question on cbaduk a few hours ago, and just saw this question here: are you the same person? Here it is: https://www.reddit.com/r/cbaduk/comments/j5wrqt/need_help_on_how_to_train_stronger_15b_katago_net/
