HunbeomBak opened 4 years ago
Hey @HunbeomBak
Alala... we have to fix it, because the multi_gpu_model
function was deprecated on 1 April; more details here.
As you can see at that link, we have to use tf.distribute.MirroredStrategy in our case. If you want to help, please make a Pull Request. Otherwise, the hotfix is to create the model within the scope of the strategy. Please read the detailed tutorial directly on the TensorFlow page here.
Please tell me if you can fix it.
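The hotfix described above can be sketched roughly as follows. The toy model below is a placeholder, not the repository's actual pipeline; the key point is that everything that creates variables happens inside `strategy.scope()`:

```python
import tensorflow as tf

# MirroredStrategy uses all visible GPUs by default; with none available
# it falls back to the CPU, so this sketch also runs on a single machine.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Variables (layer weights, optimizer slots) must be created inside the
# scope so that they get mirrored across the replicas.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# model.fit(...) is then called as usual; each global batch is split
# across the replicas automatically.
```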
Hi @rolczynski. It seems that the `target_tensors` option is not supported when compiling the model under tf.distribute.MirroredStrategy.
Do you have any idea how the use of `target_tensors` can be avoided, with the model mapped to the right target automatically?
In my experiments, simply removing the `target_tensors` option throws a CTC-loss-related error when compiling the model. This is the snippet I mean:
```python
def compile_model(self):
    """ The compiled model means the model configured for training. """
    y = keras.layers.Input(name='y', shape=[None], dtype='int32')
    loss = self.get_loss()
    self._model.compile(self._optimizer, loss, target_tensors=[y])
    logger.info("Model is successfully compiled")
```
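One common way to avoid `target_tensors` entirely is to feed the labels in as a regular input and compute the CTC loss inside the graph with `keras.backend.ctc_batch_cost`. A minimal sketch, with illustrative shapes and layer sizes that are not the repository's real ones:

```python
import tensorflow as tf
from tensorflow import keras

# Illustrative inputs: audio features plus the label sequence and lengths.
features = keras.layers.Input(name="features", shape=(None, 80))
labels = keras.layers.Input(name="labels", shape=(None,), dtype="int32")
input_length = keras.layers.Input(name="input_length", shape=(1,), dtype="int32")
label_length = keras.layers.Input(name="label_length", shape=(1,), dtype="int32")

x = keras.layers.Dense(32, activation="relu")(features)
y_pred = keras.layers.Dense(29, activation="softmax")(x)  # e.g. alphabet size

# Compute the CTC loss inside the graph instead of via target_tensors.
loss_out = keras.layers.Lambda(
    lambda args: keras.backend.ctc_batch_cost(*args), name="ctc"
)([labels, y_pred, input_length, label_length])

model = keras.Model(
    inputs=[features, labels, input_length, label_length], outputs=loss_out
)
# The compile-time "loss" just passes the precomputed CTC value through.
model.compile(optimizer="adam", loss=lambda y_true, y_out: y_out)
```

With this layout the model's output *is* the loss, so nothing needs to be mapped to a target tensor and the model can be created inside a distribution-strategy scope.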
It's quite a big change. As we can see, things tend to get more and more complicated when we try to stick with the "functional API". We want to build models effortlessly, so I think we should do it by subclassing the tf.keras.Model class.
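The subclassing approach could look roughly like this. This is a rough sketch of the idea, not the repository's actual class; the model owns its CTC loss via a custom `train_step`, so no `target_tensors` are needed and the whole thing can be built inside a MirroredStrategy scope:

```python
import tensorflow as tf

class CTCModel(tf.keras.Model):
    """Sketch: the model computes its own CTC loss in train_step."""

    def __init__(self, alphabet_size=29):
        super().__init__()
        self.body = tf.keras.layers.Dense(32, activation="relu")
        self.head = tf.keras.layers.Dense(alphabet_size)  # per-frame logits

    def call(self, features):
        # features: [batch, time, feature_dim] -> [batch, time, alphabet]
        return self.head(self.body(features))

    def train_step(self, data):
        features, labels, label_length = data
        with tf.GradientTape() as tape:
            logits = self(features, training=True)
            logit_length = tf.fill([tf.shape(logits)[0]], tf.shape(logits)[1])
            loss = tf.reduce_mean(tf.nn.ctc_loss(
                labels=labels,
                logits=logits,
                label_length=label_length,
                logit_length=logit_length,
                logits_time_major=False,
                blank_index=-1,
            ))
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}
```

Keras distributes a custom `train_step` across replicas automatically, which is exactly what `target_tensors` was getting in the way of.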
Hello, I want to train on my dataset, and I have two GPUs.
Below is my code.
```python
pipeline = asr.pipeline.CTCPipeline(
    alphabet, features_extractor, model, optimizer, decoder,
    gpus=['gpu:0', 'gpu:1']
)
dataset = pipeline.wrap_preprocess(dataset, False, None)
dev_dataset = pipeline.wrap_preprocess(dev_dataset, False, None)

y = tf.keras.layers.Input(name='y', shape=[None], dtype='int32')
loss = pipeline.get_loss()
pipeline._model.compile(pipeline._optimizer, loss, target_tensors=[y])
pipeline._model.fit(dataset, validation_data=dev_dataset, epochs=100)
pipeline._model.save(os.path.join('/checkpoint', 'model.h5'))
```
But the model uses only one GPU:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:67:00.0 Off |                  N/A |
| 48%   78C    P2   220W / 250W | 11861MiB / 12196MiB  |     86%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:68:00.0  On |                  N/A |
| 27%   45C    P8    12W / 250W |   574MiB / 12194MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
```
It seems that an OOM occurs when the batch size is increased.
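One thing worth keeping in mind when sizing the batch: with MirroredStrategy the batch size given to the input pipeline is the *global* batch, which gets split evenly across the replicas. A small sketch (the helper name is hypothetical, just for illustration):

```python
# With MirroredStrategy, each replica (GPU) processes
# global_batch / num_replicas examples per step. So doubling the global
# batch when going from one to two GPUs keeps the per-GPU memory
# footprint, and hence the OOM risk, roughly the same.

def global_batch_size(per_replica_batch, num_replicas):
    """Hypothetical helper: global batch that keeps per-GPU load constant."""
    return per_replica_batch * num_replicas

print(global_batch_size(16, 1))  # 16 -> one GPU sees 16 examples per step
print(global_batch_size(16, 2))  # 32 -> each of two GPUs still sees 16
```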