rolczynski / Automatic-Speech-Recognition

🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)
GNU Affero General Public License v3.0

TensorFlow multi_gpu_model function is deprecated #24

Open HunbeomBak opened 4 years ago

HunbeomBak commented 4 years ago

Hello, I want to train on my own dataset, and I have two GPUs.

Below is my code:

```python
pipeline = asr.pipeline.CTCPipeline(
    alphabet, features_extractor, model, optimizer, decoder,
    gpus=['gpu:0', 'gpu:1']
)
dataset = pipeline.wrap_preprocess(dataset, False, None)
dev_dataset = pipeline.wrap_preprocess(dev_dataset, False, None)

y = tf.keras.layers.Input(name='y', shape=[None], dtype='int32')
loss = pipeline.get_loss()
pipeline._model.compile(pipeline._optimizer, loss, target_tensors=[y])
pipeline._model.fit(dataset, validation_data=dev_dataset, epochs=100)
pipeline._model.save(os.path.join('/checkpoint', 'model.h5'))
```

However, the model uses only one GPU:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:67:00.0 Off |                  N/A |
| 48%   78C    P2   220W / 250W | 11861MiB / 12196MiB  |     86%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:68:00.0  On |                  N/A |
| 27%   45C    P8    12W / 250W |   574MiB / 12194MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
```

It seems that an OOM occurs when the batch size is increased.

rolczynski commented 4 years ago

Hey @HunbeomBak

Alala... we have to fix it, because the multi_gpu_model function is deprecated and is scheduled for removal on 1 April — more details here.

As you can see in this link, we have to use MirroredStrategy in our case. If you want to help, please make a pull request. Otherwise, the hotfix is to create the model within the context of the strategy. Please read the detailed tutorial directly on the TensorFlow page here.
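The hotfix could look roughly like this (a minimal sketch — a toy model stands in for the pipeline's real DeepSpeech model): what matters is that the model and optimizer are created inside `strategy.scope()`, which is what makes Keras replicate them across the visible GPUs.

```python
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU and
# averages gradients across replicas; on a CPU-only machine it
# falls back to a single CPU replica.
strategy = tf.distribute.MirroredStrategy()
print('replicas:', strategy.num_replicas_in_sync)

with strategy.scope():
    # Both the model and the optimizer must be created inside the scope.
    # This toy model is only a placeholder for the pipeline's model.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation='relu', input_shape=(4,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')

# model.fit() then splits each batch across the replicas automatically.
```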

Please tell me if you can fix it.

djo-koconi commented 4 years ago

Hi @rolczynski, it seems that the `target_tensors` option used when compiling the model is not supported under tf's MirroredStrategy.

Do you have any idea how the `target_tensors` option can be avoided so that the model is mapped to the right target automatically?

In my experiments, just disabling the `target_tensors` option throws a CTC-loss-related error when compiling the model. This is the snippet I am referring to:

```python
def compile_model(self):
    """ The compiled model means the model configured for training. """
    y = keras.layers.Input(name='y', shape=[None], dtype='int32')
    loss = self.get_loss()
    self._model.compile(self._optimizer, loss, target_tensors=[y])
    logger.info("Model is successfully compiled")
```
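One way around `target_tensors` (a sketch, not tested against this repo) is to feed the dense, padded labels through the dataset as the ordinary `y` tensor and compute the CTC loss inside a plain Keras loss function, recovering the label lengths from the padding. The `PAD = 0` padding token and blank-as-last-class convention are assumptions here and may differ from the pipeline's alphabet:

```python
import tensorflow as tf

PAD = 0  # assumed padding value in the dense label tensor

def dense_ctc_loss(labels, logits):
    """CTC loss usable as a plain Keras loss, so no target_tensors.

    labels: int tensor [batch, max_label_len], padded with PAD.
    logits: float tensor [batch, time, num_classes] (batch-major).
    """
    labels = tf.cast(labels, tf.int32)
    # Recover the true label lengths from the padding.
    label_length = tf.reduce_sum(
        tf.cast(tf.not_equal(labels, PAD), tf.int32), axis=1)
    # Assume the full time axis is valid for every utterance.
    logit_length = tf.fill([tf.shape(logits)[0]], tf.shape(logits)[1])
    return tf.nn.ctc_loss(
        labels=labels,
        logits=logits,
        label_length=label_length,
        logit_length=logit_length,
        logits_time_major=False,
        blank_index=-1)  # blank is the last class

# With this, compile() needs no target_tensors:
#   self._model.compile(self._optimizer, dense_ctc_loss)
# and fit() consumes a dataset of (features, padded_labels) pairs.
```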

rolczynski commented 4 years ago

It's quite a big change. As we can see, things tend to get more and more complicated when we try to stick with the "functional API". We want to build models effortlessly, so I think we should do it by subclassing the tf.keras.Model class.
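A rough sketch of that direction (assumptions: dense labels padded with 0, blank as the last class, and a toy RNN standing in for the real DeepSpeech stack) would override `train_step` so the loss never needs target tensors:

```python
import tensorflow as tf

class CTCModel(tf.keras.Model):
    """Sketch of a subclassed model that owns its CTC loss."""

    def __init__(self, num_classes):
        super().__init__()
        # Toy recurrent stack, a placeholder for the real model.
        self.rnn = tf.keras.layers.SimpleRNN(16, return_sequences=True)
        self.head = tf.keras.layers.Dense(num_classes)

    def call(self, x, training=False):
        return self.head(self.rnn(x))

    def train_step(self, data):
        features, labels = data  # labels: dense ints padded with 0
        labels = tf.cast(labels, tf.int32)
        with tf.GradientTape() as tape:
            logits = self(features, training=True)
            label_length = tf.reduce_sum(
                tf.cast(tf.not_equal(labels, 0), tf.int32), axis=1)
            logit_length = tf.fill(
                [tf.shape(logits)[0]], tf.shape(logits)[1])
            loss = tf.reduce_mean(tf.nn.ctc_loss(
                labels, logits, label_length, logit_length,
                logits_time_major=False, blank_index=-1))
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {'loss': loss}
```

Compiled with just an optimizer (`model.compile(optimizer='adam')`), this trains with a plain `fit()` call, and the model can still be constructed inside a MirroredStrategy scope.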