openai / blocksparse

Efficient GPU kernels for block-sparse matrix multiplication and convolution
https://blog.openai.com/block-sparse-gpu-kernels/
MIT License

test/blocksparse_conv_test.py failed and examples/simple.py sometimes raised an invalid memory access error #53

Open xuyifangreeneyes opened 4 years ago

xuyifangreeneyes commented 4 years ago

System information

  • OS Platform and Distribution: Linux Ubuntu 18.04
  • TensorFlow version: 1.13.1 (with GPU support)
  • Python version: 3.7.7
  • CUDA/cuDNN version: 10.0 / 7
  • GPU: Tesla T4

Encountered problem

I tried both pip install blocksparse and building from source. After installation, I can import blocksparse in Python and most tests pass. However, when I run test/blocksparse_conv_test.py, the following error occurs.

(tf13) ubuntu@xxx:~/blocksparse$ python test/blocksparse_conv_test.py
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /home/ubuntu/anaconda3/lib/python3.7/contextlib.py:82: TensorFlowTestCase.test_session (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `self.session()` or `self.cached_session()` instead.
2020-07-19 15:22:55.214905: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-07-19 15:22:55.236910: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2020-07-19 15:22:55.237482: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55b7bd771c50 executing computations on platform Host. Devices:
2020-07-19 15:22:55.237509: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2020-07-19 15:22:55.362344: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-19 15:22:55.363172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:1e.0
totalMemory: 14.75GiB freeMemory: 14.65GiB
2020-07-19 15:22:55.363193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-07-19 15:22:55.393925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-19 15:22:55.393972: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2020-07-19 15:22:55.393981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2020-07-19 15:22:55.394077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14241 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
2020-07-19 15:22:55.395613: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55b7bbfe59e0 executing computations on platform CUDA. Devices:
2020-07-19 15:22:55.395639: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5

test1
2020-07-19 15:22:55.429514: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at blocksparse_conv_op.cc:320 : Internal: device kernel image is invalid
ERROR:tensorflow:device kernel image is invalid
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]

Caused by op 'test1/F4B4/BlocksparseConv', defined at:
  File "test/blocksparse_conv_test.py", line 213, in <module>
    tf.test.main()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/test.py", line 64, in main
    return _googletest.main(argv)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/googletest.py", line 100, in main
    benchmark.benchmarks_main(true_main=main_wrapper)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/benchmark.py", line 371, in benchmarks_main
    true_main()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/googletest.py", line 99, in main_wrapper
    return app.run(main=g_main, argv=args)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/googletest.py", line 70, in g_main
    return unittest_main(argv=argv)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/main.py", line 101, in __init__
    self.runTests()
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/main.py", line 271, in runTests
    self.result = testRunner.run(self.test)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/runner.py", line 176, in run
    test(result)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 122, in run
    test(result)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 122, in run
    test(result)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/case.py", line 676, in __call__
    return self.run(*args, **kwds)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/case.py", line 628, in run
    testMethod()
  File "test/blocksparse_conv_test.py", line 126, in testBlocksparseConv
    op   = bs_conv_op(devF, devI)
  File "/home/ubuntu/blocksparse/blocksparse/conv.py", line 511, in __call__
    dimF=F.get_shape().as_list(), fshare=self.fshared, bshare=self.bshared, debug=self.debug
  File "<string>", line 471, in blocksparse_conv
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): device kernel image is invalid
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]

Es
======================================================================
ERROR: testBlocksparseConv (__main__.BlocksparseConvTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: device kernel image is invalid
         [[{{node test1/F4B4/BlocksparseConv}}]]
         [[{{node test1/F4B4/BlocksparseConv}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test/blocksparse_conv_test.py", line 127, in testBlocksparseConv
    devO = sess.run( op )
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/test_util.py", line 1368, in run
    return super(ErrorLoggingSession, self).run(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: device kernel image is invalid
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]

Caused by op 'test1/F4B4/BlocksparseConv', defined at:
  File "test/blocksparse_conv_test.py", line 213, in <module>
    tf.test.main()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/test.py", line 64, in main
    return _googletest.main(argv)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/googletest.py", line 100, in main
    benchmark.benchmarks_main(true_main=main_wrapper)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/benchmark.py", line 371, in benchmarks_main
    true_main()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/googletest.py", line 99, in main_wrapper
    return app.run(main=g_main, argv=args)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/googletest.py", line 70, in g_main
    return unittest_main(argv=argv)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/main.py", line 101, in __init__
    self.runTests()
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/main.py", line 271, in runTests
    self.result = testRunner.run(self.test)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/runner.py", line 176, in run
    test(result)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 122, in run
    test(result)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 122, in run
    test(result)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/case.py", line 676, in __call__
    return self.run(*args, **kwds)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/case.py", line 628, in run
    testMethod()
  File "test/blocksparse_conv_test.py", line 126, in testBlocksparseConv
    op   = bs_conv_op(devF, devI)
  File "/home/ubuntu/blocksparse/blocksparse/conv.py", line 511, in __call__
    dimF=F.get_shape().as_list(), fshare=self.fshared, bshare=self.bshared, debug=self.debug
  File "<string>", line 471, in blocksparse_conv
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): device kernel image is invalid
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]

----------------------------------------------------------------------
Ran 2 tests in 0.231s

FAILED (errors=1, skipped=1)
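
For triage, the two facts that matter most in the log above are the GPU's compute capability and the first OP_REQUIRES failure. A minimal sketch for pulling them out of a saved log, assuming the TensorFlow 1.13 line format shown in this issue. The function name `summarize_tf_log` and both regexes are illustrative and not part of blocksparse or TensorFlow; the sample string abbreviates the real log lines.

```python
import re

# Patterns follow the TensorFlow 1.13 log lines quoted in this issue;
# other TensorFlow versions may format these lines differently.
DEVICE_RE = re.compile(
    r"name: (?P<name>[^,\n]+), pci bus id: [^,\n]+, "
    r"compute capability: (?P<major>\d+)\.(?P<minor>\d+)"
)
FAILURE_RE = re.compile(
    r"OP_REQUIRES failed at (?P<file>[\w./]+):(?P<line>\d+) : (?P<message>.+)"
)


def summarize_tf_log(text):
    """Return the first GPU and the first OP_REQUIRES failure found in `text`."""
    summary = {"gpu": None, "compute_capability": None, "failure": None}
    device = DEVICE_RE.search(text)
    if device:
        summary["gpu"] = device.group("name")
        summary["compute_capability"] = (
            int(device.group("major")),
            int(device.group("minor")),
        )
    failure = FAILURE_RE.search(text)
    if failure:
        summary["failure"] = {
            "file": failure.group("file"),
            "line": int(failure.group("line")),
            "message": failure.group("message").strip(),
        }
    return summary


# Abbreviated copies of two lines from the log above (timestamps and prefixes dropped).
sample = (
    "Created TensorFlow device (device: 0, name: Tesla T4, "
    "pci bus id: 0000:00:1e.0, compute capability: 7.5)\n"
    "OP_REQUIRES failed at blocksparse_conv_op.cc:320 : "
    "Internal: device kernel image is invalid\n"
)
print(summarize_tf_log(sample))
```

Reporting the compute capability next to the failing source location makes it easier to compare against the GPU architectures the kernels were built for.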

In addition, an illegal memory access sometimes occurs when running examples/simple.py. Here is the output of a run without the error.

(tf13) ubuntu@xxx:~/blocksparse$ python examples/simple.py
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2020-07-19 15:23:58.994318: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-07-19 15:23:59.016917: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2020-07-19 15:23:59.017474: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x56341a1f6330 executing computations on platform Host. Devices:
2020-07-19 15:23:59.017505: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2020-07-19 15:23:59.122639: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-19 15:23:59.123458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:1e.0
totalMemory: 14.75GiB freeMemory: 14.65GiB
2020-07-19 15:23:59.123478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-07-19 15:23:59.152687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-19 15:23:59.152724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2020-07-19 15:23:59.152735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2020-07-19 15:23:59.152835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14241 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
2020-07-19 15:23:59.154217: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5634190aa650 executing computations on platform CUDA. Devices:
2020-07-19 15:23:59.154239: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
[array([[-0.00464108, -0.00446517, -0.00446705, ..., -0.00433037,
        -0.00435545, -0.00431154],
       [ 0.00696341,  0.00687434,  0.00675924, ...,  0.00679887,
         0.00693929,  0.00719775],
       [ 0.01524079,  0.01537668,  0.01533529, ...,  0.01533816,
         0.01512151,  0.01528387],
       ...,
       [-0.00238256, -0.00245797, -0.0022754 , ..., -0.00224203,
        -0.00239737, -0.00237827],
       [-0.00508011, -0.00536294, -0.00516913, ..., -0.00537378,
        -0.00533525, -0.00540836],
       [ 0.01230985,  0.01257054,  0.01233936, ...,  0.01226609,
         0.012429  ,  0.01214379]], dtype=float32)]

And here is the output when the error appears.

(tf13) ubuntu@xxx:~/blocksparse$ python examples/simple.py
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2020-07-19 15:24:31.054902: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-07-19 15:24:31.076918: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2020-07-19 15:24:31.077469: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x56258d1e4480 executing computations on platform Host. Devices:
2020-07-19 15:24:31.077494: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2020-07-19 15:24:31.176438: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-19 15:24:31.177252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:1e.0
totalMemory: 14.75GiB freeMemory: 14.65GiB
2020-07-19 15:24:31.177274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-07-19 15:24:31.208119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-19 15:24:31.208164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2020-07-19 15:24:31.208176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2020-07-19 15:24:31.208278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14241 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
2020-07-19 15:24:31.209716: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x56258c098570 executing computations on platform CUDA. Devices:
2020-07-19 15:24:31.209739: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-07-19 15:24:31.685492: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2020-07-19 15:24:31.685539: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:274] Unexpected Event status: 1
Aborted (core dumped)
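
Because the crash is intermittent, it can help to state how often it happens. A minimal sketch, using only the standard library, that reruns a command and tallies its exit codes. The helper name `count_exit_codes` and the default of 10 runs are assumptions for illustration, not something the blocksparse repository provides.

```python
import subprocess
import sys
from collections import Counter


def count_exit_codes(cmd, runs=10, timeout=600):
    """Run `cmd` `runs` times and tally the process exit codes."""
    codes = Counter()
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # discard the script's output
            stderr=subprocess.DEVNULL,  # discard the TensorFlow log noise
            timeout=timeout,
        )
        # On POSIX a negative code -N means the process was killed by signal N,
        # so an abort like the one above (SIGABRT) is recorded as -6.
        codes[result.returncode] += 1
    return codes
```

Calling `count_exit_codes([sys.executable, "examples/simple.py"])` from the root of the checkout would give a tally such as how many runs exited with 0 and how many were aborted, which is more useful in a report than "sometimes".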

I suspect these problems are caused by my TensorFlow and CUDA versions. Could anyone help? Thanks a lot!

ujay-zheng commented 2 years ago

System information

  • OS Platform and Distribution: Linux Ubuntu 18.04
  • TensorFlow version: 1.13.1 (with GPU support)
  • Python version: 3.7.7
  • CUDA/cuDNN version: 10.0 / 7
  • GPU: Tesla T4
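
Most of the fields above can be gathered by a script instead of by hand. A minimal sketch, assuming `nvcc` is on the PATH when the CUDA toolkit is installed; the function name `collect_env` is hypothetical, and cuDNN and the GPU model are not covered here. Note that `nvcc --version` reports the installed toolkit, which is not necessarily the CUDA version TensorFlow was built against.

```python
import platform
import shutil
import subprocess


def collect_env():
    """Gather OS, Python, TensorFlow and nvcc details; missing pieces are None."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "tensorflow": None,
        "nvcc": None,
    }
    try:
        # Optional dependency: any import failure is treated as "not available".
        import tensorflow as tf
        info["tensorflow"] = tf.__version__
    except Exception:
        pass
    nvcc = shutil.which("nvcc")
    if nvcc:
        # `nvcc --version` prints the CUDA toolkit release.
        out = subprocess.run(
            [nvcc, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        info["nvcc"] = out.stdout.strip() or None
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print("{}: {}".format(key, value))
```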

Encountered problem

I tried both pip install blocksparse and building from source. After installation, I can import blocksparse in Python and most tests pass. However, when I run test/blocksparse_conv_test.py, the following error occurs.

(tf13) ubuntu@xxx:~/blocksparse$ python test/blocksparse_conv_test.py
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /home/ubuntu/anaconda3/lib/python3.7/contextlib.py:82: TensorFlowTestCase.test_session (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `self.session()` or `self.cached_session()` instead.
2020-07-19 15:22:55.214905: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-07-19 15:22:55.236910: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2020-07-19 15:22:55.237482: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55b7bd771c50 executing computations on platform Host. Devices:
2020-07-19 15:22:55.237509: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2020-07-19 15:22:55.362344: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-19 15:22:55.363172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:1e.0
totalMemory: 14.75GiB freeMemory: 14.65GiB
2020-07-19 15:22:55.363193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-07-19 15:22:55.393925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-19 15:22:55.393972: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2020-07-19 15:22:55.393981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2020-07-19 15:22:55.394077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14241 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
2020-07-19 15:22:55.395613: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55b7bbfe59e0 executing computations on platform CUDA. Devices:
2020-07-19 15:22:55.395639: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5

test1
2020-07-19 15:22:55.429514: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at blocksparse_conv_op.cc:320 : Internal: device kernel image is invalid
ERROR:tensorflow:device kernel image is invalid
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]

Caused by op 'test1/F4B4/BlocksparseConv', defined at:
  File "test/blocksparse_conv_test.py", line 213, in <module>
    tf.test.main()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/test.py", line 64, in main
    return _googletest.main(argv)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/googletest.py", line 100, in main
    benchmark.benchmarks_main(true_main=main_wrapper)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/benchmark.py", line 371, in benchmarks_main
    true_main()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/googletest.py", line 99, in main_wrapper
    return app.run(main=g_main, argv=args)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/googletest.py", line 70, in g_main
    return unittest_main(argv=argv)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/main.py", line 101, in __init__
    self.runTests()
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/main.py", line 271, in runTests
    self.result = testRunner.run(self.test)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/runner.py", line 176, in run
    test(result)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 122, in run
    test(result)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 122, in run
    test(result)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/case.py", line 676, in __call__
    return self.run(*args, **kwds)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/case.py", line 628, in run
    testMethod()
  File "test/blocksparse_conv_test.py", line 126, in testBlocksparseConv
    op   = bs_conv_op(devF, devI)
  File "/home/ubuntu/blocksparse/blocksparse/conv.py", line 511, in __call__
    dimF=F.get_shape().as_list(), fshare=self.fshared, bshare=self.bshared, debug=self.debug
  File "<string>", line 471, in blocksparse_conv
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): device kernel image is invalid
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]

Es
======================================================================
ERROR: testBlocksparseConv (__main__.BlocksparseConvTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: device kernel image is invalid
         [[{{node test1/F4B4/BlocksparseConv}}]]
         [[{{node test1/F4B4/BlocksparseConv}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test/blocksparse_conv_test.py", line 127, in testBlocksparseConv
    devO = sess.run( op )
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/test_util.py", line 1368, in run
    return super(ErrorLoggingSession, self).run(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: device kernel image is invalid
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]

Caused by op 'test1/F4B4/BlocksparseConv', defined at:
  File "test/blocksparse_conv_test.py", line 213, in <module>
    tf.test.main()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/test.py", line 64, in main
    return _googletest.main(argv)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/googletest.py", line 100, in main
    benchmark.benchmarks_main(true_main=main_wrapper)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/benchmark.py", line 371, in benchmarks_main
    true_main()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/googletest.py", line 99, in main_wrapper
    return app.run(main=g_main, argv=args)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/platform/googletest.py", line 70, in g_main
    return unittest_main(argv=argv)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/main.py", line 101, in __init__
    self.runTests()
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/main.py", line 271, in runTests
    self.result = testRunner.run(self.test)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/runner.py", line 176, in run
    test(result)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 122, in run
    test(result)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/suite.py", line 122, in run
    test(result)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/case.py", line 676, in __call__
    return self.run(*args, **kwds)
  File "/home/ubuntu/anaconda3/lib/python3.7/unittest/case.py", line 628, in run
    testMethod()
  File "test/blocksparse_conv_test.py", line 126, in testBlocksparseConv
    op   = bs_conv_op(devF, devI)
  File "/home/ubuntu/blocksparse/blocksparse/conv.py", line 511, in __call__
    dimF=F.get_shape().as_list(), fshare=self.fshared, bshare=self.bshared, debug=self.debug
  File "<string>", line 471, in blocksparse_conv
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): device kernel image is invalid
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]
         [[node test1/F4B4/BlocksparseConv (defined at <string>:471) ]]

----------------------------------------------------------------------
Ran 2 tests in 0.231s

FAILED (errors=1, skipped=1)

In addition, an invalid memory access sometimes occurs when running examples/simple.py. Here is the output from a run without the error.

(tf13) ubuntu@xxx:~/blocksparse$ python examples/simple.py
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2020-07-19 15:23:58.994318: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-07-19 15:23:59.016917: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2020-07-19 15:23:59.017474: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x56341a1f6330 executing computations on platform Host. Devices:
2020-07-19 15:23:59.017505: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2020-07-19 15:23:59.122639: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-19 15:23:59.123458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:1e.0
totalMemory: 14.75GiB freeMemory: 14.65GiB
2020-07-19 15:23:59.123478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-07-19 15:23:59.152687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-19 15:23:59.152724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2020-07-19 15:23:59.152735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2020-07-19 15:23:59.152835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14241 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
2020-07-19 15:23:59.154217: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5634190aa650 executing computations on platform CUDA. Devices:
2020-07-19 15:23:59.154239: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
[array([[-0.00464108, -0.00446517, -0.00446705, ..., -0.00433037,
        -0.00435545, -0.00431154],
       [ 0.00696341,  0.00687434,  0.00675924, ...,  0.00679887,
         0.00693929,  0.00719775],
       [ 0.01524079,  0.01537668,  0.01533529, ...,  0.01533816,
         0.01512151,  0.01528387],
       ...,
       [-0.00238256, -0.00245797, -0.0022754 , ..., -0.00224203,
        -0.00239737, -0.00237827],
       [-0.00508011, -0.00536294, -0.00516913, ..., -0.00537378,
        -0.00533525, -0.00540836],
       [ 0.01230985,  0.01257054,  0.01233936, ...,  0.01226609,
         0.012429  ,  0.01214379]], dtype=float32)]

And here is the output when the error appears.

(tf13) ubuntu@xxx:~/blocksparse$ python examples/simple.py
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2020-07-19 15:24:31.054902: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-07-19 15:24:31.076918: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2020-07-19 15:24:31.077469: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x56258d1e4480 executing computations on platform Host. Devices:
2020-07-19 15:24:31.077494: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2020-07-19 15:24:31.176438: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-19 15:24:31.177252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:1e.0
totalMemory: 14.75GiB freeMemory: 14.65GiB
2020-07-19 15:24:31.177274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-07-19 15:24:31.208119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-19 15:24:31.208164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2020-07-19 15:24:31.208176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2020-07-19 15:24:31.208278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14241 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
2020-07-19 15:24:31.209716: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x56258c098570 executing computations on platform CUDA. Devices:
2020-07-19 15:24:31.209739: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-07-19 15:24:31.685492: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2020-07-19 15:24:31.685539: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:274] Unexpected Event status: 1
Aborted (core dumped)

I guess these problems are caused by my TensorFlow and CUDA versions. Could anyone help me? Thanks a lot!
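
One way to narrow down the guess above is to compare the GPU's compute capability, which TensorFlow prints in the log, against the architectures the blocksparse kernels were compiled for. The sketch below only parses the device line shown in the logs. The `BUILT_FOR` set is a hypothetical placeholder, not the library's real build targets, and whether an architecture mismatch actually explains "device kernel image is invalid" here is an assumption that nothing in this thread confirms.

```python
import re

# Matches the device line that TensorFlow 1.x logs at startup, e.g.
# "name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59"
DEVICE_LINE = re.compile(
    r"name: (?P<name>.+?) major: (?P<major>\d+) minor: (?P<minor>\d+)"
)


def compute_capability(log_text):
    """Return (name, major, minor) from the first device line, or None."""
    match = DEVICE_LINE.search(log_text)
    if match is None:
        return None
    return match.group("name"), int(match.group("major")), int(match.group("minor"))


# Hypothetical placeholder: replace with the architectures your build targets.
BUILT_FOR = {(6, 0), (7, 0)}

log = "name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59"
name, major, minor = compute_capability(log)
if (major, minor) not in BUILT_FOR:
    print(f"{name} is sm_{major}{minor}, which is not in the assumed build targets")
```

If the device's architecture is missing from the real build targets, rebuilding from source for that architecture would be the next thing to try.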

@xuyifangreeneyes I successfully ran the Docker container by following this issue. However, when I ran simple.py after installing blocksparse, I had the same problem. I only changed hidden_size in simple.py to 4096*2, and it crashed. Did you find a solution?