googledatalab / datalab

Interactive tools and developer experiences for Big Data on Google Cloud Platform.
Apache License 2.0

Datalab is not responding after machine learning model training #2161

Open OrielResearchCure opened 4 years ago

OrielResearchCure commented 4 years ago

Hello,

I am using a Datalab machine with 32 or 64 cores to train a model.

The code is a Keras model training call that looks like this:

model_history = model.fit_generator(
    train_gen,
    steps_per_epoch=30,
    epochs=EPOCHS,
    validation_data=validation_gen,
    validation_steps=20,
    callbacks=[early_stopping, tensorboard_callback],
    class_weight=class_weight,
)

Training runs fine, and early stopping is triggered:

30/30 [==============================] - 71s 2s/step - loss: 0.0784 - tp: 863.0000 - fp: 484.0000 - tn: 7045.0000 - fn: 60.0000 - accuracy: 0.9356 - precision: 0.6407 - recall: 0.9350 - auc: 0.9835 - val_loss: 0.0501 - val_tp: 500.0000 - val_fp: 20.0000 - val_tn: 4060.0000 - val_fn: 0.0000e+00 - val_accuracy: 0.9956 - val_precision: 0.9615 - val_recall: 1.0000 - val_auc: 1.0000
Epoch 00010: early stopping
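
For context on the "early stopping" line in the log, here is a minimal pure-Python sketch of patience-based early stopping. It is a simplified illustration, not Keras's implementation, and the `patience` and `min_delta` values, the monitored quantity (validation loss), and the sample losses are assumptions, since the configuration of `early_stopping` is not shown in this post.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified sketch: stop once the monitored loss has failed to improve
    by more than `min_delta` for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement: remember the new best and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses, for illustration only.
losses = [0.30, 0.20, 0.15, 0.16, 0.17, 0.18]
print(early_stopping_epoch(losses, patience=2))
```

With these made-up losses the best value is reached at epoch 3, so a patience of 2 ends training at epoch 5.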

Once training completes, the machine disconnects. The only way for me to regain access is to restart it. Connection attempts fail with a 504 Gateway Timeout error.

My questions are:

  1. What might be causing the disconnect? If possible, I would rather keep using Datalab for model training.
  2. I run the following installations at the beginning of the execution:
    
    !pip install tensorflow==2.0.0b0 -q
    !conda install -y -c anaconda numpy
    !conda install -y -c anaconda seaborn

What is the right way to include them so that these libraries are already installed when I create the machine or connect to it?
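
As a stopgap while I look for the proper way, something like the following sketch could at least skip reinstalling what is already present. It only checks whether each package can be imported and does not verify versions. The helper names `missing_packages` and `install_missing` are hypothetical, and installing everything through pip (rather than conda, as in my commands above) is an assumption made to keep the example short.

```python
import importlib.util
import subprocess
import sys


def missing_packages(import_names):
    """Return the top-level import names that cannot be found."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


def install_missing(requirements):
    """Install only the packages that are not importable yet.

    `requirements` maps an import name to a pip requirement string.
    Returns the list of requirement strings that were installed.
    """
    to_install = [requirements[name] for name in missing_packages(requirements)]
    if to_install:
        # Use the notebook kernel's own interpreter so pip targets the same environment.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "-q", *to_install]
        )
    return to_install


# Same libraries as in my commands above; the pip requirement strings are assumptions.
REQUIRED = {
    "tensorflow": "tensorflow==2.0.0b0",
    "numpy": "numpy",
    "seaborn": "seaborn",
}

# Uncomment to run the installation from the first notebook cell:
# install_missing(REQUIRED)
```

This does not make the libraries part of the machine itself, so the question of where to declare them at creation time still stands.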

Many thanks for any advice,
eilalan