tensorflow / models

Models and examples built with TensorFlow

How to adapt TPU configs for training on GPU #9338

Open carlos-vl opened 4 years ago

carlos-vl commented 4 years ago

I am having problems training some of the models from the TensorFlow 2 Object Detection Model Zoo locally on two GPUs (2x Titan RTX, 24 GB of memory each). My goal is to apply transfer learning on my own custom dataset, starting from the checkpoints of the pre-trained models in the zoo. My datasets are large enough (a few thousand images) that I want to train the whole network with the script model_main_tf2.py, rather than follow the few-shot fine-tuning example.
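For reference, this is roughly how I launch the run (paths are placeholders; as far as I can tell, model_main_tf2.py builds a MirroredStrategy over whatever GPUs are visible when not training on TPU, so I select both cards with CUDA_VISIBLE_DEVICES):

```bash
# Launch local multi-GPU training with the TF2 Object Detection API.
# Paths are placeholders for my own config and output directory.
CUDA_VISIBLE_DEVICES=0,1 python model_main_tf2.py \
  --pipeline_config_path=/path/to/pipeline.config \
  --model_dir=/path/to/model_dir \
  --alsologtostderr
```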

I could train properly using one of the GPU config files (faster_rcnn_resnet50_v1_800x1333_coco17_gpu-8). However, most of the config files correspond to models trained on TPUs, and I have not managed to train successfully with them. I am aware that I have to edit the config files to reduce the batch size, since my GPUs do not have enough memory, and that if I reduce the batch size I should also reduce the learning rate. Even with those changes, the models still do not train correctly. Do I need to make other changes to the TPU config files, perhaps something regarding sync_replicas or batch normalization?
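To make the question concrete, here is roughly what I change in the train_config of a TPU config (field names are taken from the zoo config files, e.g. ssd_resnet50_v1_fpn_640x640_coco17_tpu-8; the specific numbers are just what I am trying, scaled linearly with the batch size, not values I know to be correct):

```
train_config: {
  # Point at the checkpoint downloaded from the model zoo (placeholder path)
  # and switch the checkpoint type to "detection" so the detection heads are
  # restored as well.
  fine_tune_checkpoint: "/path/to/pretrained/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"

  # Reduce the global batch size to fit in GPU memory (original TPU value: 64).
  batch_size: 8

  # bfloat16 is a TPU-oriented setting, so I disable it for GPU training.
  use_bfloat16: false

  # I leave sync_replicas / replicas_to_aggregate untouched, since it is
  # unclear to me whether they matter outside TPU training.

  optimizer {
    momentum_optimizer: {
      momentum_optimizer_value: 0.9
      learning_rate: {
        cosine_decay_learning_rate {
          # Linear scaling rule: 0.04 * (8 / 64) = 0.005
          learning_rate_base: 0.005
          total_steps: 25000
          # 0.013333 * (8 / 64) ≈ 0.0017
          warmup_learning_rate: 0.0017
          warmup_steps: 2000
        }
      }
    }
    use_moving_average: false
  }
}
```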

It seems I am not the first one asking these questions; see these older issues: https://github.com/tensorflow/models/issues/9230 https://github.com/tensorflow/models/issues/8917#issuecomment-661268162

I think it would be great if you could clarify this a bit in the documentation.

NouamaneTazi commented 2 years ago

I have the same problem. I'm curious whether you found a solution, @carlos-vl.