keras-team / keras-tuner

A Hyperparameter Tuning Library for Keras
https://keras.io/keras_tuner/
Apache License 2.0

Hyperparameter Optimization using multiple GPUs on a single Host #413

Open PatternAlpha opened 3 years ago

PatternAlpha commented 3 years ago

I am currently trying to set up hyperparameter optimization using multiple GPUs on a single host. I followed and implemented this tutorial: https://keras-team.github.io/keras-tuner/tutorials/distributed-tuning/

The optimization works as expected, but I cannot distribute it across multiple GPUs on a single host using the following Bash file:

export KERASTUNER_TUNER_ID="chief"
export KERASTUNER_ORACLE_IP="127.0.0.1"
export KERASTUNER_ORACLE_PORT="8000"
python hp_test.py  &> chief.txt & 
export chief=$!

export KERASTUNER_TUNER_ID="tuner0"
python hp_test.py  &> t0.txt & 
export t0=$!

while kill -0 $chief && kill -0 $t0 
do
    r=$'\r'
    now="$(date +'%Y-%m-%d %H:%M:%S')"
    printf "${r}${now}: Alive)"
    sleep 1
done

I have three questions:

1. Is my Bash file wrong, and is that the reason why I cannot start the optimization?
2. In issue #329, it seems as if it is not possible to distribute hyperparameter optimization across multiple GPUs on one system using KerasTuner. Is this correct?
3. If it is possible to distribute the optimization across multiple GPUs on one system, are there any more in-depth tutorials on how to set this up? As far as I can tell, you also need an oracle, but I couldn't find any documentation on how to set it up for multi-GPU distribution (which dependencies, execution, ...).

Thank you very much!

PartiallyObservable commented 3 years ago

I would also appreciate an answer to this question. I have tried a similar script to @PatternAlpha's and also cannot get parallel hyperparameter searching to work. I am working on a single machine with 64 cores and 4 GPUs.

My python code for the chief process looks something like this:

    import keras_tuner as kt
    from tensorflow.keras.callbacks import EarlyStopping

    # before this, some setup to create keras dataset generators, train_gen, test_gen
    # tuner
    max_tune_epochs = 10
    hyperband_iterations = 10
    def build_model(hp): return behavior_cloner_hp(hp, train_gen.state_dim, train_gen.action_dim)
    tuner = kt.Hyperband(build_model,
                         objective='val_loss',
                         max_epochs=max_tune_epochs,
                         hyperband_iterations=hyperband_iterations)

    # early stopping callback
    early_stopper = EarlyStopping(monitor='loss', min_delta=0.0003, patience=10, restore_best_weights=True)

    # Perform search
    tuner.search(train_gen,
                 validation_data=test_gen,
                 epochs=max_tune_epochs,
                 callbacks=[early_stopper])

And then I run my python script. What I thought should happen:

1. The script enters the `tuner.search` call and creates the 'chief' process.
2. I export a different TUNER_ID and then run the script again (actually a modified script without the save-model part; I thought the 'chief' should be responsible for saving).
3. After hyperband_iterations complete, the chief process and the workers end.

What actually happens:

1. The script returns very quickly and saves a nonsense model and hyperparameters. My GPU/CPU load never really increases much and I don't think any training is happening at all.
2. I don't get a chance to start other worker processes.

Without exporting the KERASTUNER environment variables, the script executes as expected and does single instance hyperparameter tuning. I am unsure what other parameters or settings I need to adjust to get the desired behavior.

I have potentially found a solution. My prior problem was resolved: I was explicitly setting the distribution strategy to MirroredStrategy(), but I guess we are not supposed to do that. After removing it, I can now start the chief service and start a worker, but I still could not get multiple GPUs to be used.
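For reference, a minimal sketch of what was removed, assuming the strategy was passed through the tuner constructor's distribution_strategy argument; build_model, max_tune_epochs, and hyperband_iterations are reused from the snippet above:

import tensorflow as tf
import keras_tuner as kt

# What was removed: handing every trial a MirroredStrategy. With one chief
# and several single-GPU workers on the same host, this argument is left
# out so each worker trains only on the one GPU it can see.
tuner = kt.Hyperband(build_model,
                     objective='val_loss',
                     max_epochs=max_tune_epochs,
                     hyperband_iterations=hyperband_iterations,
                     # distribution_strategy=tf.distribute.MirroredStrategy(),  # removed
                     )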

To start the chief service, I have a bash script that executes this (the 'chief' argument lets the script know it is the chief and should execute some extra steps after tuning):

export KERASTUNER_TUNER_ID="chief" 
export KERASTUNER_ORACLE_IP="127.0.0.1" 
export KERASTUNER_ORACLE_PORT="8000" 
python hp_tune.py chief

I then start multiple workers, each pinned to a specific CUDA-visible GPU ID, with one bash script per worker like this:

export KERASTUNER_TUNER_ID="tuner0" 
export KERASTUNER_ORACLE_IP="127.0.0.1" 
export KERASTUNER_ORACLE_PORT="8000" 
CUDA_VISIBLE_DEVICES=0 python hp_tune.py worker

This seems to have the desired effect (multiple workers, multiple GPUs), but I am still unsure whether this is the proper way to parallelize the tuner. It also appears that the workers are not communicating properly with the chief. The behavior I am seeing is that the chief process starts normally, the first worker trains for one Hyperband iteration and then exits, the second worker just continues tuning indefinitely, and the chief process never terminates.

Thank you for any help! I find this project very helpful despite my issue. I believe single-host multi-GPU setups are fairly common, so it would be great to have a tutorial explaining this use case :)

haifeng-jin commented 3 years ago

I am not sure if it is supported. We will check whether running multiple clients on the same machine works by masking out different GPUs for each client.
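A minimal sketch of that masking idea, assuming each client process is handed a GPU index (the GPU_INDEX variable here is hypothetical) and hides the other devices before TensorFlow initializes:

import os

# Hypothetical GPU_INDEX set per client process; hide all other GPUs
# before TensorFlow is imported so this client only ever sees one device.
os.environ["CUDA_VISIBLE_DEVICES"] = os.environ.get("GPU_INDEX", "0")

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))  # should list exactly one GPU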

CarlPoirier commented 3 years ago

I need this as well!

fecet commented 2 years ago

Any progress?

BenK3nobi commented 2 years ago

I am stumbling over the same issue. I tried to get multi-GPU tuning running on the same host. Any news / advice so far?

zurKreuzigungLinks commented 1 year ago

same here

Potato-Waffles commented 1 year ago

I've managed to get multiple GPUs on a single system to work with the Keras example by having the code be "aware" of whether other instances are running: the "chief" (first instance) creates a temp.txt file containing -1; the other instances then read it, increase the number by one, and use that number to select their GPU. It also allows more instances to be spawned, as the number is limited to the number of GPUs. I run it by having multiple terminals open.

However, the GPUs train in a sequential manner: the first GPU trains, then the second GPU, then the first GPU again, and so on. I am not sure why, and that is why I am here. :(

My best guess: the tuner instances block the port while running, hence the sequential behavior. Maybe using two virtual machines, one per GPU, would resolve this? I don't have the time to test that right now.

Here is the code, if anyone wants a crack at this:

import tensorflow as tf

try:
    # First instance (the chief): create the shared counter file and claim GPU 0.
    f = open("temp.txt", "x")
    f.write("-1")
    f.close()

    physical_devices = tf.config.list_physical_devices('GPU')
    tf.config.experimental.set_visible_devices(physical_devices[0], 'GPU')
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

    print("===Master starting on GPU_0")

except FileExistsError:
    # The counter file already exists, so another instance is running:
    # read the counter, increment it (wrapping around the number of GPUs),
    # write it back, and claim the corresponding GPU.
    physical_devices = tf.config.list_physical_devices('GPU')

    f = open("temp.txt", "r")
    i = f.read()
    i = str((int(i) + 1) % len(physical_devices))
    f.close()

    f = open("temp.txt", "w")
    f.write(i)
    f.close()

    tf.config.experimental.set_visible_devices(physical_devices[int(i)], 'GPU')
    tf.config.experimental.set_memory_growth(physical_devices[int(i)], True)

    print("===Master running, creating worker on GPU_" + i)