tensorflow / models

Models and examples built with TensorFlow

Config difference GPU/TPU #9230

Open turowicz opened 4 years ago

turowicz commented 4 years ago

What is the difference between GPU and TPU configs in the Model Zoo?

https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2

I'm worried that all my training runs are going wrong because I'm running them locally on my GPU, but the published configs can only be used with a TPU.

mgon5170 commented 4 years ago

I haven't gotten my code to run fully, so I'm not 100% certain, but I think device selection is handled outside of the config file. I haven't changed anything in the config file apart from the directory paths, and this was the first output I got:

(SSD) USER$ python object_detection/model_main_tf2.py --pipeline_config_path=${PIPELINE_CONFIG_PATH} --model_dir=${MODEL_DIR} --alsologtostder
2020-09-10 09:38:48.794458: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-09-10 09:38:48.806430: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fb7c6883500 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-10 09:38:48.806468: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:There are non-GPU devices in tf.distribute.Strategy, not using nccl allreduce.
W0910 09:38:48.807556 4723258816 cross_device_ops.py:1175] There are non-GPU devices in tf.distribute.Strategy, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
I0910 09:38:48.807872 4723258816 mirrored_strategy.py:500] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)

As far as I can tell, the code determines what it should run on. Granted, I'm not running on a GPU since I'm on a Mac. Hope this helps!
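To check which device a run actually picked up, the "Using MirroredStrategy with devices" line in the log is the place to look. Here is a minimal sketch that pulls the device types out of such a line, assuming the log format shown in the output above. `device_types` and `DEVICE_PATTERN` are hypothetical names for illustration and are not part of the Object Detection API.

```python
import re

# Matches device suffixes such as "/device:CPU:0" or "/device:GPU:1".
# Assumption: device strings follow the format seen in the log above.
DEVICE_PATTERN = re.compile(r"/device:([A-Z_]+):(\d+)")


def device_types(log_line):
    """Return the set of device types (for example {'CPU'}) named in a log line."""
    return {kind for kind, _index in DEVICE_PATTERN.findall(log_line)}


# The strategy line copied from the output above.
log_line = (
    "Using MirroredStrategy with devices "
    "('/job:localhost/replica:0/task:0/device:CPU:0',)"
)

types_found = device_types(log_line)
print(sorted(types_found))
```

If the result contains only CPU, as it does for the line above, the run is not using a GPU, whatever the config file name says.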

turowicz commented 4 years ago

I get that, but my question was related to individual config values that may have been chosen for TPU use. Otherwise why put TPU in the file name?

mgon5170 commented 4 years ago

I assume it's what they used to generate the model/checkpoints.

turowicz commented 4 years ago

Let me rephrase the question:

"Is it a good idea to take a TPU config and continue training with it on a GPU, or do we need to change something in that file for better results?"
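One way to see which individual values would differ between a published TPU config and a locally edited copy is to compare the two files field by field. The sketch below is a rough comparison aid, not a protobuf parser: `parse_flat_fields` and `diff_fields` are hypothetical helpers, and the two fragments use placeholder values that are not copied from the Model Zoo. Whether any such field actually needs changing for GPU training is not settled in this thread.

```python
def parse_flat_fields(config_text):
    """Collect `key: value` pairs from pipeline-config-style text.

    Block names are ignored and a repeated key keeps its last value,
    so nested structure is lost. Good enough for a quick comparison only.
    """
    fields = {}
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line or line.endswith("{"):
            continue  # skip block openers, closers, and blank lines
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


def diff_fields(config_a, config_b):
    """Return {key: (value_in_a, value_in_b)} for every field that differs."""
    a, b = parse_flat_fields(config_a), parse_flat_fields(config_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Hypothetical fragments: the values are placeholders for illustration.
tpu_fragment = """
train_config {
  batch_size: 64
  use_bfloat16: true
}
"""
local_fragment = """
train_config {
  batch_size: 8
  use_bfloat16: false
}
"""

differences = diff_fields(tpu_fragment, local_fragment)
for key, (tpu_value, local_value) in differences.items():
    print(f"{key}: {tpu_value} -> {local_value}")
```

In practice the two strings would be read from the config files on disk. Fields that appear in only one file show up with `None` on the other side.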