Open asaiacai opened 5 months ago
Can you share some of the errors that you are seeing?
@asaiacai - could you share the error output?
This was my output:
$ pytest -s tests/unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py
============================= test session starts ==============================
platform linux -- Python 3.11.7, pytest-8.0.0, pluggy-1.4.0 -- /usr/local/bin/python3
cachedir: .pytest_cache
rootdir: /home/paperspace/DeepSpeed/tests
configfile: pytest.ini
plugins: anyio-4.0.0
collecting ... [2024-02-14 06:30:47,448] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
collected 4 items
tests/unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py::test_DS4Sci_EvoformerAttention[tensor_shape0-dtype0] Using /home/paperspace/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Creating extension directory /home/paperspace/.cache/torch_extensions/py311_cu121/evoformer_attn...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/paperspace/.cache/torch_extensions/py311_cu121/evoformer_attn/build.ninja...
Building extension module evoformer_attn...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/4] c++ -MMD -MF attention.o.d -DTORCH_EXTENSION_NAME=evoformer_attn -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/paperspace/cutlass/include -I/home/paperspace/cutlass/tools/util/include -isystem /home/paperspace/.local/lib/python3.11/site-packages/torch/include -isystem /home/paperspace/.local/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/paperspace/.local/lib/python3.11/site-packages/torch/include/TH -isystem /home/paperspace/.local/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DBF16_AVAILABLE -c /home/paperspace/.local/lib/python3.11/site-packages/deepspeed/ops/csrc/deepspeed4science/evoformer_attn/attention.cpp -o attention.o
[2/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=evoformer_attn -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/paperspace/cutlass/include -I/home/paperspace/cutlass/tools/util/include -isystem /home/paperspace/.local/lib/python3.11/site-packages/torch/include -isystem /home/paperspace/.local/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/paperspace/.local/lib/python3.11/site-packages/torch/include/TH -isystem /home/paperspace/.local/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -DGPU_ARCH=90 -DBF16_AVAILABLE -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -c /home/paperspace/.local/lib/python3.11/site-packages/deepspeed/ops/csrc/deepspeed4science/evoformer_attn/attention_cu.cu -o attention_cu.cuda.o
[3/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=evoformer_attn -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/paperspace/cutlass/include -I/home/paperspace/cutlass/tools/util/include -isystem /home/paperspace/.local/lib/python3.11/site-packages/torch/include -isystem /home/paperspace/.local/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/paperspace/.local/lib/python3.11/site-packages/torch/include/TH -isystem /home/paperspace/.local/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_90,code=sm_90 -gencode=arch=compute_90,code=compute_90 -DGPU_ARCH=90 -DBF16_AVAILABLE -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -c /home/paperspace/.local/lib/python3.11/site-packages/deepspeed/ops/csrc/deepspeed4science/evoformer_attn/attention_back.cu -o attention_back.cuda.o
[4/4] c++ attention.o attention_back.cuda.o attention_cu.cuda.o -shared -lcurand -L/home/paperspace/.local/lib/python3.11/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o evoformer_attn.so
Loading extension module evoformer_attn...
Time to load evoformer_attn op: 308.550683259964 seconds
PASSED
tests/unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py::test_DS4Sci_EvoformerAttention[tensor_shape0-dtype1] PASSED
tests/unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py::test_DS4Sci_EvoformerAttention[tensor_shape1-dtype0] PASSED
tests/unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py::test_DS4Sci_EvoformerAttention[tensor_shape1-dtype1] FAILED
=================================== FAILURES ===================================
_____________ test_DS4Sci_EvoformerAttention[tensor_shape1-dtype1] _____________
dtype = torch.bfloat16, tensor_shape = (1, 512, 256, 8, 8)
    @pytest.mark.parametrize("dtype", [torch.float16, torch.bfloat16])
    @pytest.mark.parametrize("tensor_shape", [(1, 256, 256, 4, 32), (1, 512, 256, 8, 8)])
    def test_DS4Sci_EvoformerAttention(dtype, tensor_shape):
        skip_on_arch(8 if dtype == torch.bfloat16 else 7)
        batch, n, seq_len, heads, dim = tensor_shape
        Q = torch.randn(batch,
                        n,
                        seq_len,
                        heads,
                        dim,
                        dtype=dtype,
                        device=get_accelerator().device_name(),
                        requires_grad=True)
        K = torch.randn(batch,
                        n,
                        seq_len,
                        heads,
                        dim,
                        dtype=dtype,
                        device=get_accelerator().device_name(),
                        requires_grad=True)
        V = torch.randn(batch,
                        n,
                        seq_len,
                        heads,
                        dim,
                        dtype=dtype,
                        device=get_accelerator().device_name(),
                        requires_grad=True)
        mask = torch.randint(0, 2, (batch, n, 1, 1, seq_len), dtype=dtype, device=get_accelerator().device_name())
        mask_bias = 1e9 * (mask - 1)
        bias = torch.randn(batch,
                           1,
                           heads,
                           seq_len,
                           seq_len,
                           dtype=dtype,
                           device=get_accelerator().device_name(),
                           requires_grad=True)
        dummy_out = torch.rand_like(Q, dtype=dtype, device=get_accelerator().device_name())
        ref_out = attention_reference(Q, K, V, [mask_bias, bias], 1 / (dim**0.5))
        ref_out.backward(dummy_out)
        ref_dv, V.grad = V.grad.clone(), None
        ref_dk, K.grad = K.grad.clone(), None
        ref_dq, Q.grad = Q.grad.clone(), None
        ref_db, bias.grad = bias.grad.clone(), None
        out = DS4Sci_EvoformerAttention(Q, K, V, [mask_bias, bias])
        out.backward(dummy_out)
        dv, v_grad = V.grad.clone(), None
        dk, k_grad = K.grad.clone(), None
        dq, q_grad = Q.grad.clone(), None
        db, bias.grad = bias.grad.clone(), None
        eps = 1e-2 if dtype == torch.float16 else 5e-2
        assert torch.max(torch.abs(ref_out - out)).item() < eps, f"out eps: {torch.max(torch.abs(ref_out - out))}"
        assert torch.max(torch.abs(ref_dv - dv)) < eps, f"dv eps: {torch.max(torch.abs(ref_dv - dv))}"
>       assert torch.max(torch.abs(ref_dk - dk)) < eps, f"dk eps: {torch.max(torch.abs(ref_dk - dk))}"
E AssertionError: dk eps: 0.0625
E assert tensor(0.0625, device='cuda:0', dtype=torch.bfloat16) < 0.05
E + where tensor(0.0625, device='cuda:0', dtype=torch.bfloat16) = <built-in method max of type object at 0x7f450847aaa0>(tensor([[[[[0.0000e+00, 0.0000e+00, 1.0681e-04, ..., 0.0000e+00,\n 2.4414e-04, 1.0376e-03],\n [9.7656e-04, 0.0000e+00, 4.8828e-04, ..., 0.0000e+00,\n 0.0000e+00, 6.1798e-04],\n [4.8828e-04, 4.8828e-04, 3.0518e-04, ..., 8.5449e-04,\n 9.7656e-04, 9.7656e-04],\n ...,\n [0.0000e+00, 4.8828e-04, 6.1035e-04, ..., 0.0000e+00,\n 0.0000e+00, 7.9346e-04],\n [4.8828e-04, 9.7656e-04, 9.7656e-04, ..., 0.0000e+00,\n 0.0000e+00, 4.8828e-04],\n [0.0000e+00, 4.8828e-04, 0.0000e+00, ..., 9.7656e-04,\n 0.0000e+00, 1.4648e-03]],\n\n [[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n ...,\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e... 
[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n ...,\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00]],\n\n [[3.6621e-04, 0.0000e+00, 2.4414e-04, ..., 2.4414e-04,\n 2.4414e-04, 0.0000e+00],\n [4.8828e-04, 7.3242e-04, 1.2207e-04, ..., 1.8311e-04,\n 0.0000e+00, 4.8828e-04],\n [0.0000e+00, 7.3242e-04, 4.8828e-04, ..., 4.8828e-04,\n 9.7656e-04, 2.2888e-04],\n ...,\n [0.0000e+00, 0.0000e+00, 1.9531e-03, ..., 0.0000e+00,\n 9.7656e-04, 0.0000e+00],\n [2.4414e-04, 1.2207e-04, 4.8828e-04, ..., 4.8828e-04,\n 0.0000e+00, 0.0000e+00],\n [1.9531e-03, 9.7656e-04, 0.0000e+00, ..., 4.8828e-04,\n 9.7656e-04, 0.0000e+00]]]]], device='cuda:0', dtype=torch.bfloat16))
E + where <built-in method max of type object at 0x7f450847aaa0> = torch.max
E + and tensor([[[[[0.0000e+00, 0.0000e+00, 1.0681e-04, ..., 0.0000e+00,\n 2.4414e-04, 1.0376e-03],\n [9.7656e-04, 0.0000e+00, 4.8828e-04, ..., 0.0000e+00,\n 0.0000e+00, 6.1798e-04],\n [4.8828e-04, 4.8828e-04, 3.0518e-04, ..., 8.5449e-04,\n 9.7656e-04, 9.7656e-04],\n ...,\n [0.0000e+00, 4.8828e-04, 6.1035e-04, ..., 0.0000e+00,\n 0.0000e+00, 7.9346e-04],\n [4.8828e-04, 9.7656e-04, 9.7656e-04, ..., 0.0000e+00,\n 0.0000e+00, 4.8828e-04],\n [0.0000e+00, 4.8828e-04, 0.0000e+00, ..., 9.7656e-04,\n 0.0000e+00, 1.4648e-03]],\n\n [[0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n ...,\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e... [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n ...,\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00]],\n\n [[3.6621e-04, 0.0000e+00, 2.4414e-04, ..., 2.4414e-04,\n 2.4414e-04, 0.0000e+00],\n [4.8828e-04, 7.3242e-04, 1.2207e-04, ..., 1.8311e-04,\n 0.0000e+00, 4.8828e-04],\n [0.0000e+00, 7.3242e-04, 4.8828e-04, ..., 4.8828e-04,\n 9.7656e-04, 2.2888e-04],\n ...,\n [0.0000e+00, 0.0000e+00, 1.9531e-03, ..., 0.0000e+00,\n 9.7656e-04, 0.0000e+00],\n [2.4414e-04, 1.2207e-04, 4.8828e-04, ..., 4.8828e-04,\n 0.0000e+00, 0.0000e+00],\n [1.9531e-03, 9.7656e-04, 0.0000e+00, ..., 4.8828e-04,\n 9.7656e-04, 0.0000e+00]]]]], device='cuda:0', dtype=torch.bfloat16) = <built-in method abs of type object at 
0x7f450847aaa0>((tensor([[[[[-2.7344e-01, 1.5723e-01, -2.1667e-03, ..., -1.2305e-01,\n -5.5176e-02, 1.4954e-02],\n [ 1.5332e-01, 1.3086e-01, 6.7383e-02, ..., -1.0449e-01,\n -6.6895e-02, 7.6294e-04],\n [ 9.6191e-02, -4.5898e-02, -5.3406e-03, ..., 2.6733e-02,\n 2.3828e-01, 1.6211e-01],\n ...,\n [ 1.8359e-01, 9.4727e-02, -2.0508e-02, ..., 2.5635e-02,\n 1.5723e-01, -5.8899e-03],\n [-1.2207e-01, 1.5527e-01, -5.1025e-02, ..., 3.2617e-01,\n 3.4766e-01, -1.2305e-01],\n [ 3.0078e-01, 3.6865e-02, -3.4570e-01, ..., 2.2827e-02,\n -3.7500e-01, 1.9409e-02]],\n\n [[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n ...,\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n ...0.0000e+00, 0.0000e+00],\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n ...,\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00]],\n\n [[-1.2756e-02, -8.0078e-02, -3.9551e-02, ..., -1.8677e-02,\n -3.2715e-02, -9.5703e-02],\n [-8.9355e-02, 1.9775e-02, -2.9663e-02, ..., 1.0010e-02,\n 1.3281e-01, 4.3213e-02],\n [ 2.0020e-01, 4.8340e-02, -5.3223e-02, ..., 8.3984e-02,\n 7.6172e-02, -3.1738e-03],\n ...,\n [ 5.3516e-01, 3.2812e-01, 3.7109e-01, ..., 8.9844e-01,\n 1.7871e-01, 6.0547e-01],\n [ 1.7334e-02, 2.6489e-02, 9.8145e-02, ..., 3.4668e-02,\n -3.2812e-01, 3.0273e-01],\n [-3.8281e-01, -2.1582e-01, -2.1094e-01, ..., -7.8613e-02,\n 2.1484e-01, 1.1621e-01]]]]], device='cuda:0',\n dtype=torch.bfloat16) - tensor([[[[[-2.7344e-01, 1.5723e-01, -2.0599e-03, ..., -1.2305e-01,\n -5.5420e-02, 1.3916e-02],\n 
[ 1.5234e-01, 1.3086e-01, 6.6895e-02, ..., -1.0449e-01,\n -6.6895e-02, 1.3809e-03],\n [ 9.5703e-02, -4.5410e-02, -5.6458e-03, ..., 2.5879e-02,\n 2.3926e-01, 1.6113e-01],\n ...,\n [ 1.8359e-01, 9.4238e-02, -2.1118e-02, ..., 2.5635e-02,\n 1.5723e-01, -6.6833e-03],\n [-1.2256e-01, 1.5430e-01, -5.0049e-02, ..., 3.2617e-01,\n 3.4766e-01, -1.2256e-01],\n [ 3.0078e-01, 3.6377e-02, -3.4570e-01, ..., 2.3804e-02,\n -3.7500e-01, 1.7944e-02]],\n\n [[ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n ...,\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n ...0.0000e+00, 0.0000e+00],\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n ...,\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00],\n [ 0.0000e+00, 0.0000e+00, 0.0000e+00, ..., 0.0000e+00,\n 0.0000e+00, 0.0000e+00]],\n\n [[-1.3123e-02, -8.0078e-02, -3.9307e-02, ..., -1.8433e-02,\n -3.2471e-02, -9.5703e-02],\n [-8.9844e-02, 1.9043e-02, -2.9785e-02, ..., 9.8267e-03,\n 1.3281e-01, 4.3701e-02],\n [ 2.0020e-01, 4.9072e-02, -5.3711e-02, ..., 8.4473e-02,\n 7.7148e-02, -2.9449e-03],\n ...,\n [ 5.3516e-01, 3.2812e-01, 3.7305e-01, ..., 8.9844e-01,\n 1.7773e-01, 6.0547e-01],\n [ 1.7578e-02, 2.6611e-02, 9.7656e-02, ..., 3.5156e-02,\n -3.2812e-01, 3.0273e-01],\n [-3.8086e-01, -2.1680e-01, -2.1094e-01, ..., -7.9102e-02,\n 2.1387e-01, 1.1621e-01]]]]], device='cuda:0',\n dtype=torch.bfloat16)))
E + where <built-in method abs of type object at 0x7f450847aaa0> = torch.abs
tests/unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py:101: AssertionError
=============================== warnings summary ===============================
../../../usr/lib/python3/dist-packages/pkg_resources/_vendor/pyparsing.py:87
/usr/lib/python3/dist-packages/pkg_resources/_vendor/pyparsing.py:87: DeprecationWarning: module 'sre_constants' is deprecated
import sre_constants
../../../usr/lib/python3/dist-packages/pytz/__init__.py:31
/usr/lib/python3/dist-packages/pytz/__init__.py:31: DeprecationWarning: invalid escape sequence '\s'
match = re.match("^#\s*version\s*([0-9a-z]*)\s*$", line)
unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py::test_DS4Sci_EvoformerAttention[tensor_shape0-dtype0]
/home/paperspace/DeepSpeed/tests/conftest.py:47: UserWarning: Running test without verifying torch version, please provide an expected torch version with --torch_ver
warnings.warn(
unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py::test_DS4Sci_EvoformerAttention[tensor_shape0-dtype0]
/home/paperspace/DeepSpeed/tests/conftest.py:54: UserWarning: Running test without verifying cuda version, please provide an expected cuda version with --cuda_ver
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================== slowest durations ===============================
309.09s call unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py::test_DS4Sci_EvoformerAttention[tensor_shape0-dtype0]
(11 durations < 1s hidden. Use -vv to show these durations.)
=========================== short test summary info ============================
FAILED tests/unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py::test_DS4Sci_EvoformerAttention[tensor_shape1-dtype1] - AssertionError: dk eps: 0.0625
============= 1 failed, 3 passed, 4 warnings in 312.62s (0:05:12) ==============
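A note on the number in the failure: the reported `dk eps: 0.0625` is exactly 2^-4, which is the spacing between adjacent bfloat16 values for magnitudes in [8, 16), and the bfloat16 tolerance in the test is 5e-2. The sketch below is plain Python with no DeepSpeed or PyTorch dependency, and `bf16_ulp` is a hypothetical helper written for this illustration. It computes that spacing for a few magnitudes and compares it with the tolerance. It only shows that a single rounding step on a large enough gradient value could exceed the threshold; it does not establish that this is what happens here.

```python
import math

BF16_TOLERANCE = 5e-2  # the eps the test uses for torch.bfloat16


def bf16_ulp(x: float) -> float:
    """Spacing between adjacent bfloat16 values near a finite, nonzero x.

    bfloat16 keeps 8 significant bits (1 implicit + 7 stored), so for
    2**e <= |x| < 2**(e + 1) the spacing is 2**(e - 7). Subnormals are
    ignored in this sketch.
    """
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("expected a finite, nonzero value")
    _, exp = math.frexp(abs(x))  # abs(x) == m * 2**exp with 0.5 <= m < 1
    return 2.0 ** (exp - 1 - 7)


for magnitude in (1.0, 2.0, 4.0, 8.0, 16.0):
    ulp = bf16_ulp(magnitude)
    verdict = "exceeds" if ulp > BF16_TOLERANCE else "is within"
    print(f"|x| >= {magnitude:>4}: one bf16 step = {ulp}, which {verdict} the tolerance")
```

If the mismatching `dk` entries turn out to be much smaller than 8 in magnitude, rounding alone would not explain a 0.0625 difference and a kernel problem becomes more likely.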
I am observing the same issue on an A100 (the second half of the channel components is incorrect) when cross-checking against PyTorch's SDPA.
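One way to make the "second half of the channels" observation concrete is to report the maximum absolute error separately for each half of the last axis. The sketch below is plain Python over nested lists so that it runs anywhere. `max_abs_err_by_half` and the sample values are hypothetical and only illustrate the comparison; with real tensors the same idea would be a slice over the last dimension.

```python
from typing import Sequence, Tuple


def max_abs_err_by_half(ref: Sequence[Sequence[float]],
                        out: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (max abs error in the first half, in the second half) of the last axis."""
    first = second = 0.0
    for ref_row, out_row in zip(ref, out):
        if len(ref_row) != len(out_row):
            raise ValueError("row length mismatch")
        half = len(ref_row) // 2
        for i, (r, o) in enumerate(zip(ref_row, out_row)):
            err = abs(r - o)
            if i < half:
                first = max(first, err)
            else:
                second = max(second, err)
    return first, second


# Made-up values: only the last two channels of each row differ.
ref = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
out = [[1.0, 2.0, 3.5, 4.0], [0.5, 0.5, 0.5, 0.25]]
first_half_err, second_half_err = max_abs_err_by_half(ref, out)
print(f"first half: {first_half_err}, second half: {second_half_err}")
```

A large gap between the two numbers would point at an indexing or tiling problem along the channel dimension rather than at ordinary rounding noise.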
correction:
Describe the bug
The EvoFormer attention kernel test case fails nondeterministically on H100s.
To Reproduce
Run:
pytest -s tests/unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py
Expected behavior
This passed for me on A100s.
ds_report output
System info (please complete the following information):
Environment
Additional context
The test cases pass on 4x A100-80GB.