tensorflow / benchmarks

A benchmark framework for Tensorflow
Apache License 2.0

Alternative/current state of tf_cnn_benchmark #524

Open kessel opened 2 years ago

kessel commented 2 years ago

Hello community and devs,

a quick question from my side: I see that tf_cnn_benchmark is no longer actively maintained, and I understand that this makes sense to reduce the amount of code that has to stay compatible with future TF versions. But I would like to understand whether this poses a serious problem for using the benchmark in the near future. Is the code known to be incompatible with, or to fall short of the expected performance on, for instance TF 2.8?

In other words: is tf_cnn_benchmark still perfectly usable, with only the promise of continued development and maintenance missing? Or is it already outdated?

The documentation also points towards the new TF2 models for benchmarking. Are you aware of an implementation of an actual benchmark based on those models that could serve as an alternative?

Would be happy to get a reply. Cheers Stefan

reedwm commented 2 years ago

The benchmark is now unmaintained and untested. I do not recommend using it anymore. I think it still is functionally correct and I doubt it will perform worse than it previously did (but it's very possible I'm wrong). However, I recommend using the official models, as you pointed out.
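(For reference, the classic invocation from the tf_cnn_benchmarks README looked roughly like the following; the script lives under scripts/tf_cnn_benchmarks in this repo, and since it is no longer tested, I can't promise the flags still behave as expected on recent TF releases.)

# Unmaintained; shown only for reference, flag values are illustrative.
# Run from scripts/tf_cnn_benchmarks in the benchmarks repo.
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet50 --variable_update=parameter_server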

> The documentation also points towards the new TF2 models for benchmarking. Are you aware of an implementation of an actual benchmark based on those models that could serve as an alternative?

Running the official models prints out performance numbers, so they can be used as a benchmark. For example, you can run the official ResNet-50 model from source by following the instructions here with Method 2, navigating to <official models repo>/official/vision, and then running:

python train.py --logtostderr \
  --model_dir=/tmp/model_dir \
  --experiment=resnet_imagenet \
  --mode=train \
  --params_override=runtime.num_gpus=1,task.train_data.global_batch_size=64,task.train_data.input_path=<path-to-imagenet-tfrecords>/train*,task.validation_data.input_path=<path-to-imagenet-tfrecords>/valid* \
  --config_file configs/experiments/image_classification/imagenet_resnet50_gpu.yaml

In the command above, you need to replace both instances of <path-to-imagenet-tfrecords> with the path to ImageNet in TFRecord format. Note that, unlike tf_cnn_benchmarks, the official models do not support synthetic data.
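If the long --params_override string gets unwieldy, the dot-separated keys in it correspond to nested fields in the Model Garden's YAML configs, so the same overrides can be sketched as YAML roughly like below (a rough equivalent, not something I've verified end to end; <path-to-imagenet-tfrecords> is still a placeholder for your ImageNet TFRecord directory):

# Same settings as the --params_override string above, written as nested YAML.
runtime:
  num_gpus: 1
task:
  train_data:
    global_batch_size: 64
    input_path: <path-to-imagenet-tfrecords>/train*
  validation_data:
    input_path: <path-to-imagenet-tfrecords>/valid*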

tfboyd commented 2 years ago

I miss you @reedwm

reedwm commented 2 years ago

I miss you too @tfboyd! (this comment is unrelated to this issue BTW @kessel)