chrispsn opened this issue 6 years ago
Hi, as roy7 said on Discord, we sort of use this Discord like a forum.
Before answering your questions, the general point I want to make is that your cloud training should be efficient: you don't have to chase the last $0.0001, but rather make it globally hassle-free and as cost-efficient as possible.
On Google Cloud, the Tesla V100 with preemptibility and autogtp -g 2 is the most cost-efficient setup (most games per dollar); I ran many tests to reach that conclusion. For example, on Microsoft Azure it's actually the Tesla P100 that is much more cost-efficient, as I said here: https://github.com/gcp/leela-zero/issues/1905#issuecomment-433710281
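For reference, an instance of that kind can be created with a single gcloud command. This is only a sketch: the instance name, zone, machine type, and disk size below are placeholder choices, not values from the guide.

```shell
# Sketch: create a preemptible single-V100 instance on Google Cloud.
# leelaz-worker, us-central1-a, n1-standard-4 and 50GB are placeholders.
gcloud compute instances create leelaz-worker \
    --zone=us-central1-a \
    --machine-type=n1-standard-4 \
    --accelerator=type=nvidia-tesla-v100,count=1 \
    --preemptible \
    --maintenance-policy=TERMINATE \
    --image-family=ubuntu-1804-lts \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=50GB
```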
Then, to answer you:
For 1.: the Google Cloud free trial has a GPU quota of 1, so you won't be able to run more than one GPU instance simultaneously.
For 2.: I don't know about that; you'd have to ask @alreadydone maybe. But it's not going to save much anyway, as a Tesla V100 takes 4.5 minutes to produce one game (8-9 minutes to produce 2 games with -g 2).
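For a sense of scale, those timings translate into games per hour roughly as follows (taking 8.5 minutes as the midpoint of the 8-9 minute figure):

```shell
#!/bin/sh
# Back-of-the-envelope throughput from the timings above:
# 4.5 min/game with -g 1, ~8.5 min for 2 games with -g 2.
single=$(awk 'BEGIN { printf "%.1f", 60 / 4.5 }')
dual=$(awk 'BEGIN { printf "%.1f", 60 / 8.5 * 2 }')
echo "-g 1: $single games/hour"   # 13.3 games/hour
echo "-g 2: $dual games/hour"     # 14.1 games/hour
```

So -g 2 buys roughly a 6% throughput gain per GPU, which is why shaving small amounts of setup time matters less than just keeping the instance busy.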
Don't hesitate to correct me if you have a different opinion; these are just my thoughts.
To increase the number of contributing cloud users, I think our best shot is to spread the cloud instructions more widely on social networks.
Edit: also, on page 9 of the Google doc, this is the entirely automated script we're using:
https://github.com/gcp/leela-zero#using-a-cloud-provider
#!/bin/bash
PKG_OK=$(dpkg-query -W --showformat='${Status}\n' glances | grep "install ok installed")
echo "Checking for glances: $PKG_OK"
if [ "" == "$PKG_OK" ]; then
  echo "No glances. Setting up glances and all other leela-zero packages."
  sudo apt-get update && sudo apt-get -y upgrade && sudo apt-get -y dist-upgrade && \
  sudo add-apt-repository -y ppa:graphics-drivers/ppa && sudo apt-get update && \
  sudo apt-get -y install nvidia-driver-410 linux-headers-generic nvidia-opencl-dev && \
  sudo apt-get -y install clinfo cmake git libboost-all-dev libopenblas-dev zlib1g-dev \
    build-essential qtbase5-dev qttools5-dev qttools5-dev-tools libboost-dev \
    libboost-program-options-dev opencl-headers ocl-icd-libopencl1 ocl-icd-opencl-dev \
    qt5-default qt5-qmake curl && \
  git clone https://github.com/gcp/leela-zero && cd leela-zero && \
  git submodule update --init --recursive && \
  mkdir build && cd build && cmake .. && cmake --build . && \
  cd ../autogtp && cp ../build/autogtp/autogtp . && cp ../build/leelaz . && \
  sudo apt-get -y install glances zip && sudo apt-get clean && sudo reboot
else
  # note: 'sudo -i && cd ...' would only run the cd after the root shell
  # exits, so run everything in a single root shell instead:
  sudo bash -c 'cd /leela-zero/autogtp && ./autogtp -g 2'
fi
Thanks for writing your guide and doing the efficiency tests! They're really good instructions; I've been using them for around a week.
Agree that:
I also agree it's important to have the latest software, but if we want this to be set-and-forget, would it also be good for the installation steps to update automatically? If so, one way to achieve that is for your guide to point to an image that's refreshed every 24 hours, particularly given the instances are likely to be destroyed within 24 hours anyway. (Another way could be to host the latest script in a GitHub gist, download it to the instance, and run it. I should also flag that I have no idea how hard it is to create images for Google's service; I'm only speaking from experience with Docker.)
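A minimal sketch of the gist idea, assuming the latest script is hosted somewhere stable (SCRIPT_URL below is entirely hypothetical):

```shell
#!/bin/sh
# Hypothetical: fetch the current setup script on every boot instead of
# baking it into the image, so instances always run the latest version.
SCRIPT_URL="https://gist.githubusercontent.com/USER/GIST_ID/raw/setup.sh"  # placeholder
curl -fsSL "$SCRIPT_URL" -o /tmp/setup.sh && sudo sh /tmp/setup.sh
```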
On your responses to my questions:
Let me put the question another way: assuming both use the latest files, how big is the speed disadvantage of using pre-compiled binaries or an image, as opposed to compiling from scratch?
Agree, I had in mind cases where the instance gets stopped after less than an hour (more applicable to other cloud services).
I don't think security is an issue because:
This is going to take some time for me to answer.
@wonderingabout That script didn't work for me without some modification:
"sudo add-apt-repository -y ppa:graphics-drivers/ppa"
This doesn't work in a vanilla Ubuntu 18.04 install without first running "sudo apt-get install software-properties-common".
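For anyone hitting the same thing, the fix before the PPA step would be:

```shell
# add-apt-repository lives in software-properties-common, which some
# Ubuntu 18.04 images don't ship; install it before adding the PPA.
sudo apt-get update
sudo apt-get -y install software-properties-common
sudo add-apt-repository -y ppa:graphics-drivers/ppa
```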
@nathanloop surprising, I tried it last week (at the release of Leela Zero v16 / autogtp v17) and it was working on Google Cloud.
I'll try it again (at the moment I'm busy with Azure) and let you know if it works.
- However, while Windows binaries are distributed, Linux binaries aren't: we're told to compile from source. Even the 'official' Dockerfiles contain compilation steps. Why?
It is a long tradition with *nix systems, based on practical engineering, trust, and incompatibility between different distributions and versions. I might not have the same Ubuntu version as you, I might have a different desktop on my Ubuntu, or no desktop at all, or I might be using a non-Ubuntu Linux, and I might not trust a random binary.
Usually, users will only trust the binaries sent out by the distribution itself in the package manager, and will build everything else from source.
Thanks. Is there a difference in processing (game generation) speed from compiling afresh each time instead of using a "one-size-fits-all" binary?
Theoretically, building from source gives you a build optimised for your system. In practice, this may or may not happen.
As an example of where theory and practice diverge, the standard build of the Python interpreter is much slower than the one that ships with Ubuntu, due to Ubuntu-specific optimisations.
@wonderingabout my mistake. I was on 18.04 minimal; with the normal 18.04 image it works fine.
Hi, apologies if this is not the right place to ask (came here via the Discord chat):
I'm hoping to put together something that will scale up by 10x+ the number of games generated for training this AI.
One promising angle is using the 'bottom of the barrel' of spot cloud prices. New instances can be stopped less than an hour after creation, so I need to minimise their spin-up time. Two questions:
Thanks in advance.