tensorflow / benchmarks

A benchmark framework for TensorFlow
Apache License 2.0

Use time.perf_counter() instead of time.time() to compute durations #514

Closed PatriceVignola closed 3 years ago

PatriceVignola commented 3 years ago

To compute durations, time.perf_counter() is more reliable than time.time(). When time.time() is used on Windows with some of the faster benchmark models run on the GPU (e.g. trivial), its coarse resolution can yield an elapsed time of zero, which results in divisions by zero:

E:\tfmodels\benchmarks\scripts\tf_cnn_benchmarks\benchmark_cnn.py:956: RuntimeWarning: invalid value encountered in subtract
  speed_jitter = 1.4826 * np.median(np.abs(speeds - np.median(speeds)))
1       images/sec: inf +/- nan (jitter = nan)  21.622
INFO:tensorflow:Error reported to Coordinator: <class 'ZeroDivisionError'>, float division by zero
I0707 15:12:44.122273   236 coordinator.py:224] Error reported to Coordinator: <class 'ZeroDivisionError'>, float division by zero
Traceback (most recent call last):
  File "E:\tfmodels\benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py", line 75, in <module>
    app.run(main)  # Raises error on invalid flags, unlike tf.app.run()
  File "C:\Users\pavignol\Miniconda3\envs\tfgpu\lib\site-packages\absl\app.py", line 312, in run
    _run_main(main, args)
  File "C:\Users\pavignol\Miniconda3\envs\tfgpu\lib\site-packages\absl\app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "E:\tfmodels\benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py", line 70, in main
    bench.run()
  File "E:\tfmodels\benchmarks\scripts\tf_cnn_benchmarks\benchmark_cnn.py", line 1883, in run
    return self._benchmark_train()
  File "E:\tfmodels\benchmarks\scripts\tf_cnn_benchmarks\benchmark_cnn.py", line 2088, in _benchmark_train
    return self._benchmark_graph(result_to_benchmark, eval_build_results)
  File "E:\tfmodels\benchmarks\scripts\tf_cnn_benchmarks\benchmark_cnn.py", line 2295, in _benchmark_graph
    is_chief, summary_writer, profiler)
  File "E:\tfmodels\benchmarks\scripts\tf_cnn_benchmarks\benchmark_cnn.py", line 2482, in benchmark_with_session
    elapsed_time)
ZeroDivisionError: float division by zero
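
For illustration, here is a minimal sketch of the timing pattern in question. It is not the code from benchmark_cnn.py: `images_per_sec` and `count_zero_deltas` are hypothetical helper names, and the zero-delta guard is an extra precaution assumed here, not something this PR is known to add. The sketch times one step with time.perf_counter() and counts how often back-to-back reads of a clock return the same value, which is the condition that produces a zero elapsed time.

```python
import time


def images_per_sec(batch_size, run_step, timer=time.perf_counter):
    """Time one call to run_step and return (throughput, elapsed seconds)."""
    start = timer()
    run_step()
    elapsed = timer() - start
    # Hypothetical guard: a coarse clock can report a zero delta for a very
    # fast step, so return infinity instead of raising ZeroDivisionError.
    speed = batch_size / elapsed if elapsed > 0 else float("inf")
    return speed, elapsed


def count_zero_deltas(timer, samples=100_000):
    """Count back-to-back timer reads that return exactly the same value."""
    zeros = 0
    for _ in range(samples):
        first = timer()
        second = timer()
        if second == first:
            zeros += 1
    return zeros


if __name__ == "__main__":
    speed, elapsed = images_per_sec(64, lambda: sum(range(1000)))
    print(f"{speed:.1f} images/sec ({elapsed * 1e6:.1f} us)")
    # The counts depend on the platform and its clock resolution.
    print("time.time zero deltas:", count_zero_deltas(time.time))
    print("perf_counter zero deltas:", count_zero_deltas(time.perf_counter))
```

The resolution each clock reports on a given machine can be inspected with time.get_clock_info("time") and time.get_clock_info("perf_counter").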

reedwm commented 3 years ago

Please note, tf_cnn_benchmarks is unmaintained and uses TF 1 APIs instead of TF 2 APIs. It is highly recommended to use the Official Models instead, which have cleaner implementations, support far more features and models, and use TF 2 APIs. I will merge this PR as a courtesy and because it is useful for users still using tf_cnn_benchmarks, but I encourage you to use the official models.

PatriceVignola commented 3 years ago

> Please note, tf_cnn_benchmarks is unmaintained and uses TF 1 APIs instead of TF 2 APIs. It is highly recommended to use the Official Models instead, which have cleaner implementations, support far more features and models, and use TF 2 APIs. I will merge this PR as a courtesy and because it is useful for users still using tf_cnn_benchmarks, but I encourage you to use the official models.

Thank you! We're using these models to test a TF 1.15 fork, which is why we can't use the official models.