mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks
https://mlcommons.org/en/groups/inference
Apache License 2.0

BERT on TensorFlow fails #1276

Open coppock opened 1 year ago

coppock commented 1 year ago

On r2.1, the Docker container run fails as shown:

(mlperf) $ python3 run.py --backend=tf --scenario=Offline
.
.
.
Running LoadGen test...
2022-11-02 14:19:55.493183: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2022-11-02 14:30:24.704249: E tensorflow/stream_executor/cuda/cuda_blas.cc:440] failed to run cuBLAS routine: CUBLAS_STATUS_NOT_SUPPORTED
2022-11-02 14:30:24.704299: E tensorflow/stream_executor/cuda/cuda_blas.cc:2453] Internal: failed BLAS call, see log for details                               
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)                                                            
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)                       
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas xGEMMBatched launch failed : a.shape=[16,384,64], b.shape=[16,384,64], m=384, n=384, k=64, batch_size=16 
         [[{{node bert/encoder/layer_0/attention/self/MatMul}}]]
  (1) Internal: Blas xGEMMBatched launch failed : a.shape=[16,384,64], b.shape=[16,384,64], m=384, n=384, k=64, batch_size=16                
         [[{{node bert/encoder/layer_0/attention/self/MatMul}}]]
         [[logits/_11]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 120, in <module>
    main()
  File "run.py", line 102, in main
    lg.StartTestWithLogSettings(sut.sut, sut.qsl.qsl, settings, log_settings)
  File "/workspace/tf_SUT.py", line 64, in issue_queries
    result = self.sess.run(["logits:0"], feed_dict=feeds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message) 
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas xGEMMBatched launch failed : a.shape=[16,384,64], b.shape=[16,384,64], m=384, n=384, k=64, batch_size=16
         [[node bert/encoder/layer_0/attention/self/MatMul (defined at /workspace/tf_SUT.py:45) ]]
  (1) Internal: Blas xGEMMBatched launch failed : a.shape=[16,384,64], b.shape=[16,384,64], m=384, n=384, k=64, batch_size=16
         [[node bert/encoder/layer_0/attention/self/MatMul (defined at /workspace/tf_SUT.py:45) ]]
         [[logits/_11]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'bert/encoder/layer_0/attention/self/MatMul':
  File "run.py", line 120, in <module>
    main()
  File "run.py", line 68, in main
    sut = get_tf_sut(args)
  File "/workspace/tf_SUT.py", line 79, in get_tf_sut
    return BERT_TF_SUT(args)
  File "/workspace/tf_SUT.py", line 45, in __init__
    tf.import_graph_def(graph_def, name='')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/importer.py", line 443, in import_graph_def
    _ProcessNewOps(graph)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/importer.py", line 236, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3751, in _add_new_tf_operations 
    for c_op in c_api_util.new_tf_operations(self)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3751, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3641, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

Segmentation fault (core dumped)

Looking into this, I suspected an out-of-memory condition on my GPU, but I'm using an NVIDIA A30 with 24 GB of memory, which should be plenty. In case it's helpful, I'm running on Ubuntu 20.04 with NVIDIA driver version 520.61.05 and CUDA version 11.8.
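For what it's worth, `Blas xGEMMBatched launch failed` is often not a true capacity problem but cuBLAS failing to get workspace memory after TF1's allocator has grabbed the whole GPU up front. A minimal sketch of the usual workaround, assuming the TF1-style session that `tf_SUT.py` uses (the function name and settings here are illustrative, not the repo's actual code):

```python
# Hedged workaround sketch: let TensorFlow grow GPU memory on demand and
# leave headroom for cuBLAS workspace, instead of pre-allocating everything.
try:
    import tensorflow.compat.v1 as tf  # TF2 installs still expose the v1 API
except ImportError:
    tf = None  # TensorFlow not installed; snippet is illustrative only


def make_session(graph=None):
    """Create a TF1 session that allocates GPU memory lazily."""
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # grow allocations as needed
    # Cap TF's share so cuBLAS/cuDNN keep some workspace (0.9 is a guess).
    config.gpu_options.per_process_gpu_memory_fraction = 0.9
    return tf.Session(graph=graph, config=config)
```

Whether this helps depends on whether the failure really is allocator pressure; it costs nothing to try before digging into driver/CUDA version mismatches.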

nv-ananjappa commented 1 year ago

@arjunsuresh Any idea?

arjunsuresh commented 1 year ago

I'm not using the Docker container. Running the reference implementation with the BERT TF model on Ubuntu 22.04, Python 3.10, and TensorFlow 2.10, I'm getting the error below.

Use tf.gfile.GFile.
Traceback (most recent call last):
  File "/home/ubuntu/CM/repos/local/cache/e3b48cdce19e4f4a/inference/language/bert/run.py", line 120, in <module>
    main()
  File "/home/ubuntu/CM/repos/local/cache/e3b48cdce19e4f4a/inference/language/bert/run.py", line 68, in main
    sut = get_tf_sut(args)
  File "/home/ubuntu/CM/repos/local/cache/e3b48cdce19e4f4a/inference/language/bert/tf_SUT.py", line 79, in get_tf_sut
    return BERT_TF_SUT(args)
  File "/home/ubuntu/CM/repos/local/cache/e3b48cdce19e4f4a/inference/language/bert/tf_SUT.py", line 43, in __init__
    graph_def.ParseFromString(f.read())
google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'
Finished destroying SUT.

coppock commented 1 year ago

Ubuntu 20.04 with Python 3.8, TensorFlow 2.10, and google.protobuf 3.19 works fine for running BERT. Like @arjunsuresh, I'm running without Docker. I'll draft a Dockerfile.
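A minimal sketch of what such a Dockerfile might pin, based only on the combination reported working in this comment (the base image, package names, and layout are assumptions, not the repo's actual Dockerfile):

```dockerfile
# Hypothetical sketch: pin the versions reported working above
# (Ubuntu 20.04, Python 3.8, TensorFlow 2.10, protobuf 3.19).
FROM ubuntu:20.04

RUN apt-get update && apt-get install -y python3.8 python3-pip

# protobuf must stay on the 3.19.x series for the frozen GraphDef to parse.
RUN python3.8 -m pip install "tensorflow==2.10.*" "protobuf==3.19.*"
```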

arjunsuresh commented 1 year ago

Thank you @coppock. protobuf is the culprit: using protobuf version 3.19, I'm able to run on Ubuntu 22.04, Python 3.10, and TensorFlow 2.10. Maybe we should regenerate this file using a newer protobuf.
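Until the model file is regenerated, a guard like the following could fail fast before `graph_def.ParseFromString` instead of dying with an opaque `DecodeError`. This is a sketch; the 3.x major-version cutoff is an assumption drawn from the versions reported in this thread, not a documented compatibility boundary:

```python
# Check the installed protobuf runtime before attempting to parse the
# frozen GraphDef. The max-major cutoff of 3 is an assumption based on
# the reports above (3.19 works; newer runtimes were observed to fail).
from importlib.metadata import PackageNotFoundError, version


def protobuf_major(default=0):
    """Return the installed protobuf major version, or `default` if absent."""
    try:
        return int(version("protobuf").split(".")[0])
    except PackageNotFoundError:
        return default


def protobuf_ok(max_major=3):
    """True if protobuf is installed and no newer than `max_major`.x."""
    major = protobuf_major()
    return 0 < major <= max_major
```

Calling `protobuf_ok()` at the top of `tf_SUT.py.__init__` would turn the parse failure into an actionable version message.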

rnaidu02 commented 1 year ago

@pgmpablo157321 to look at this issue