google / automl

Google Brain AutoML
Apache License 2.0
6.25k stars, 1.45k forks

tf_random_seed doesn't return the same results when using the same seed. #1134

Closed: imerish closed this 2 years ago

imerish commented 2 years ago

Hello, thank you for the implementation. I have a problem: I get different results when using the same random seed, the same config file and the same training command. My config:

```yaml
num_classes: 391
max_instances_per_image: 300
aspect_ratios: [1.0, 2.0, 0.5]
input_rand_hflip: false
map_freq: 1
grad_checkpoint: true

nms_configs:
  max_output_size: 300

label_map: {classes here}
```

My training command:

```shell
python /mnt/automl/efficientdet/tf2/train.py \
  --train_file_pattern=/home/madovbny/WSUM-5553/tfrecord/train/.tfrecord \
  --val_file_pattern=/home/madovbny/WSUM-5553/tfrecord/test/.tfrecord \
  --model_name=efficientdet-d0 \
  --model_dir=/home/madovbny/WSUM-5553/model_no_aug_random_seed_123456_2 \
  --pretrained_ckpt=/home/madovbny/WSUM-5553/checkpoint/efficientdet-d0 \
  --batch_size=1 \
  --eval_samples=88 \
  --num_examples_per_epoch=1260 \
  --num_epochs=200 \
  --debug=True \
  --tf_random_seed=123456 \
  --hparams=/home/madovbny/WSUM-5553/config/config_no_aug.yaml
```

imerish commented 2 years ago

I downloaded the repo after the fix and it still doesn't work. I train on a Quadro RTX 5000 GPU and the results are still different. Maybe I should also set PYTHONHASHSEED, the NumPy seed, etc.? https://stackoverflow.com/questions/36288235/how-to-get-stable-results-with-tensorflow-setting-random-seed
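A minimal sketch of that idea, assuming TensorFlow 2.x: one helper that seeds Python's `random`, NumPy and TensorFlow from a single value. The name `seed_everything` is hypothetical and not part of the automl repo, and seeding every RNG is not by itself guaranteed to make GPU runs identical.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed Python, NumPy and TensorFlow RNGs (hypothetical helper, not in automl)."""
    # Setting PYTHONHASHSEED here only affects child processes: the current
    # interpreter fixes its hash seed at startup. Export it in the shell
    # (e.g. `PYTHONHASHSEED=0 python train.py ...`) to affect this run.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy RNG
    except ImportError:
        pass  # NumPy not installed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # TensorFlow global (graph-level) seed
    except ImportError:
        pass  # TensorFlow not installed


seed_everything(123456)
```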

imerish commented 2 years ago

Excuse me, is this commit (https://github.com/google/automl/actions/runs/1859223396) related to this issue?

fsx950223 commented 2 years ago

Yes.

imerish commented 2 years ago

Thank you!

imerish commented 2 years ago

Good day, I tried training my similar models again with the updated repo, and I am still getting different results. [two screenshots attached]

imerish commented 2 years ago

Hello, I tried again after your commit, and I am sorry to say it still doesn't work.

fsx950223 commented 2 years ago

I have tested it on Colab: it works on CPU but not on GPU.
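Since CPU runs are reported to be reproducible, one common workaround is to hide the GPU from TensorFlow by setting `CUDA_VISIBLE_DEVICES` to an empty string before TensorFlow is imported. The sketch below assumes TensorFlow 2.x; `tf.config.experimental.enable_op_determinism()` only exists in TensorFlow 2.8 and later, so it is guarded. The helper name `configure_reproducibility` is hypothetical, not part of the automl repo.

```python
import os


def configure_reproducibility(seed: int, force_cpu: bool = True) -> bool:
    """Optionally hide GPUs, seed TF and request deterministic ops (hypothetical helper)."""
    if force_cpu:
        # Only takes effect if TensorFlow has not been imported yet in this process.
        os.environ["CUDA_VISIBLE_DEVICES"] = ""
    try:
        import tensorflow as tf
    except ImportError:
        return False  # TensorFlow not installed; nothing else to configure
    tf.random.set_seed(seed)
    # TF >= 2.8: ops without a deterministic implementation raise
    # tf.errors.UnimplementedError instead of silently varying between runs.
    if hasattr(tf.config.experimental, "enable_op_determinism"):
        tf.config.experimental.enable_op_determinism()
    return True


configure_reproducibility(123456)
```

Running EfficientDet training on CPU this way will likely be much slower than on the GPU, so it is mainly useful for debugging reproducibility itself.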

imerish commented 2 years ago

Thank you for your answer. Are any fixes planned for GPU training?

nss-ysasaki commented 2 years ago

@imerish I doubt that it is possible, due to inherent limitations of GPU computation. Details are explained here: https://keras.io/getting_started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development
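One commonly cited reason, illustrated here in plain Python rather than actual GPU code: parallel GPU reductions (for example, ones built on atomic adds) may accumulate partial sums in a different order on each run, and floating-point addition is not associative. The same numbers summed in a different order can therefore give a slightly different result, and over many training steps such differences can grow. The helper `sum_in_order` is hypothetical, written only for this demonstration.

```python
def sum_in_order(values, order):
    """Accumulate values sequentially, visiting indices in the given order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [0.1, 0.2, 0.3]
forward = sum_in_order(values, [0, 1, 2])   # (0.1 + 0.2) + 0.3
backward = sum_in_order(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1
print(forward, backward, forward == backward)
# prints: 0.6000000000000001 0.6 False
```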