k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

RuntimeError: invalid shape dimension #486

Closed by ahban 7 months ago

ahban commented 2 years ago

After installing k2, Lhotse, and the icefall-related packages, running the yesno recipe gives me the following error. I know this is an old problem, as mentioned in #297, and it should have been fixed. However, the problem still exists.

2022-07-21 11:04:46,595 INFO [train.py:483] Training started
2022-07-21 11:04:46,595 INFO [train.py:484] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lr': 0.01, 'feature_dim': 23, 'weight_decay': 1e-06, 'start_epoch': 0, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 10, 'reset_interval': 20, 'valid_interval': 10, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'world_size': 1, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 15, 'seed': 42, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30.0, 'bucketing_sampler': False, 'num_buckets': 10, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'env_info': {'k2-version': '1.17', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9e88318adfd7c80290a96f8a888d279d45dc1564', 'k2-git-date': 'Mon Jul 18 16:26:06 2022', 'lhotse-version': '1.5.0.dev+git.7cce647.clean', 'torch-version': '1.8.1', 'torch-cuda-available': True, 'torch-cuda-version': '11.1', 'python-version': '3.8', 'icefall-git-branch': 'master', 'icefall-git-sha1': '3d2986b-clean', 'icefall-git-date': 'Wed Jul 20 21:32:53 2022', 'icefall-path': '/home/data/ddp-aban/devel/icefall', 'k2-path': '/home/data/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/home/data/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/lhotse/__init__.py', 'hostname': 'cri-asr-2u10g-32-23', 'IP address': '10.22.32.23'}}
2022-07-21 11:04:46,597 INFO [lexicon.py:176] Loading pre-compiled data/lang_phone/Linv.pt
2022-07-21 11:04:46,598 INFO [train.py:497] device: cuda:0
2022-07-21 11:04:49,529 INFO [asr_datamodule.py:146] About to get train cuts
2022-07-21 11:04:49,529 INFO [asr_datamodule.py:244] About to get train cuts
2022-07-21 11:04:49,530 INFO [asr_datamodule.py:149] About to create train dataset
2022-07-21 11:04:49,530 INFO [asr_datamodule.py:199] Using SingleCutSampler.
2022-07-21 11:04:49,530 INFO [asr_datamodule.py:205] About to create train dataloader
2022-07-21 11:04:49,531 INFO [asr_datamodule.py:218] About to get test cuts
2022-07-21 11:04:49,531 INFO [asr_datamodule.py:252] About to get test cuts
2022-07-21 11:04:50,724 INFO [train.py:422] Epoch 0, batch 0, loss[loss=1.069, over 2392.00 frames.], tot_loss[loss=1.069, over 2392.00 frames.], batch size: 4
Traceback (most recent call last):
  File "./tdnn/train.py", line 577, in <module>
    main()
  File "./tdnn/train.py", line 573, in main
    run(rank=0, world_size=1, args=args)
  File "./tdnn/train.py", line 538, in run
    train_one_epoch(
  File "./tdnn/train.py", line 406, in train_one_epoch
    loss, loss_info = compute_loss(
  File "./tdnn/train.py", line 302, in compute_loss
    decoding_graph = graph_compiler.compile(texts)
  File "/home/ddp-aban/devel/icefall/icefall/graph_compiler.py", line 78, in compile
    fsa_with_self_loops = k2.remove_epsilon_and_add_self_loops(
  File "/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/fsa_algo.py", line 647, in remove_epsilon_and_add_self_loops
    out_fsa = k2.utils.fsa_from_unary_function_ragged(
  File "/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/utils.py", line 512, in fsa_from_unary_function_ragged
    value[torch.where(
RuntimeError: invalid shape dimension -73

The k2 version:

$ python -m k2.version 
Collecting environment information...

k2 version: 1.17
Build type: Release
Git SHA1: 9e88318adfd7c80290a96f8a888d279d45dc1564
Git date: Mon Jul 18 16:26:06 2022
Cuda used to build k2: 11.1
cuDNN used to build k2: 8.4.1
Python version used to build k2: 3.8
OS used to build k2: 
CMake version: 3.23.2
GCC version: 8.3.1
CMAKE_CUDA_FLAGS:  -Wno-deprecated-gpu-targets   -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_86,code=sm_86 -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options -Wall  --compiler-options -Wno-strict-overflow  --compiler-options -Wno-unknown-pragmas 
CMAKE_CXX_FLAGS:  -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-unused-variable  -Wno-strict-overflow 
PyTorch version used to build k2: 1.8.1
PyTorch is using Cuda: 11.1
NVTX enabled: True
With CUDA: True
Disable debug: True
Sync kernels : False
Disable checks: False
Max cpu memory allocate: 214748364800 bytes (or 200.0 GB)
k2 abort: False
__file__: /home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/version/version.py
_k2.__file__: /home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/_k2.cpython-38-x86_64-linux-gnu.so

PS: I installed k2 by compiling it from source.

The Lhotse version:

$ python -c "import lhotse; print(lhotse.__version__)"
1.5.0.dev+git.7cce647.clean
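
For bug reports like this one, it can help to print the Python, torch, CUDA, k2, and Lhotse versions in one go. The following is a minimal sketch, not part of icefall or k2; the helper names `collect_versions` and `torch_cuda_info` are hypothetical. It assumes only that torch exposes `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`; whether k2 defines `__version__` is not assumed, so a missing attribute is reported as "unknown" (`python -m k2.version`, shown above, gives the full k2 build details).

```python
import importlib
import platform


def collect_versions(module_names):
    """Return {name: version string} for Python and each requested module.

    Modules that cannot be found are reported as "not installed"; modules
    that import but have no __version__ attribute are reported as "unknown".
    """
    versions = {"python": platform.python_version()}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ModuleNotFoundError:
            versions[name] = "not installed"
            continue
        except Exception as exc:  # e.g. a compiled extension that fails to load
            versions[name] = f"import failed: {exc!r}"
            continue
        versions[name] = str(getattr(module, "__version__", "unknown"))
    return versions


def torch_cuda_info():
    """Return the CUDA version torch was built with and whether a GPU is usable.

    Returns None if torch is not installed.
    """
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch.version.cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in collect_versions(["torch", "k2", "lhotse"]).items():
        print(f"{key}: {value}")
    print(f"torch CUDA info: {torch_cuda_info()}")
```

Pasting this output next to `python -m k2.version` makes torch/CUDA mismatches easier to spot.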

My OS is CentOS 7.

csukuangfj commented 2 years ago

> PS: I install k2 by compiling the source code.

Are you able to run the tests in https://github.com/k2-fsa/k2/tree/master/k2/python/tests ?

For instance, you can run:

cd k2/python/tests
python3 ./remove_epsilon_self_loops_test.py
python3 ./remove_epsilon_test.py
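
If several test scripts need checking, a small wrapper can run each one and report which fail. This is a hypothetical helper (`run_test_scripts` is not part of k2), sketched under the assumption that the scripts use `unittest`, as the "Ran 6 tests ... FAILED" summary further down suggests, and therefore exit with a non-zero code when any test fails or errors. The `k2/python/tests` path is an assumption; point it at your own k2 checkout.

```python
import subprocess
import sys
from pathlib import Path


def run_test_scripts(test_dir, scripts):
    """Run each script with the current interpreter; return {script: exit code}.

    A non-zero exit code (or a negative one, if the process was killed by a
    signal such as an abort) marks the script as failing.
    """
    test_dir = Path(test_dir).resolve()  # absolute, so cwd= does not affect the path
    results = {}
    for script in scripts:
        proc = subprocess.run(
            [sys.executable, str(test_dir / script)],
            cwd=str(test_dir),
            capture_output=True,  # keep the noisy CUDA output off the console
            text=True,
        )
        results[script] = proc.returncode
    return results


if __name__ == "__main__":
    # Assumed location of the k2 Python tests; adjust to your checkout.
    codes = run_test_scripts(
        "k2/python/tests",
        ["remove_epsilon_self_loops_test.py", "remove_epsilon_test.py"],
    )
    for name, code in codes.items():
        status = "OK" if code == 0 else f"FAILED (exit code {code})"
        print(f"{name}: {status}")
```

Re-run a failing script directly to see its full output, as in the reply below.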

ahban commented 2 years ago

The second script fails. The output is below.

.F/opt/conda/conda-bld/pytorch_1616554793803/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1616554793803/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1616554793803/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1616554793803/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [1,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1616554793803/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [2,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1616554793803/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [64,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1616554793803/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [65,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1616554793803/work/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [66,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
[F] /home/ddp-aban/soft/k2/k2/csrc/array.h:385:T k2::Array1<T>::operator[](int32_t) const [with T = int; int32_t = int] Check failed: ret == cudaSuccess (710 vs. 0)  Error: device-side assert triggered. 

[ Stack-Trace: ]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/lib64/libk2_log.so(k2::internal::GetStackTrace()+0x34) [0x7f2aa1646c34]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/lib64/libk2context.so(k2::Array1<int>::operator[](int) const+0x842) [0x7f2aa1fa70e2]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/lib64/libk2context.so(k2::Renumbering::ComputeOld2New()+0x1c1) [0x7f2aa1fa13f1]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/lib64/libk2context.so(k2::Renumbering::ComputeNew2Old()+0x998) [0x7f2aa1fa2f88]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/lib64/libk2context.so(k2::SubsetRaggedShape(k2::RaggedShape&, k2::Renumbering&, int, k2::Array1<int>*)+0x330) [0x7f2aa2169f00]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/_k2.cpython-38-x86_64-linux-gnu.so(+0x106971) [0x7f2aa3858971]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/_k2.cpython-38-x86_64-linux-gnu.so(+0x106f56) [0x7f2aa3858f56]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/_k2.cpython-38-x86_64-linux-gnu.so(+0x14d248) [0x7f2aa389f248]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/_k2.cpython-38-x86_64-linux-gnu.so(+0x139728) [0x7f2aa388b728]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/_k2.cpython-38-x86_64-linux-gnu.so(+0x32a34) [0x7f2aa3784a34]
python(+0x13c00e) [0x55ff8cc8500e]
python(_PyObject_MakeTpCall+0x3bf) [0x55ff8cc7a13f]
python(+0x166ca0) [0x55ff8ccafca0]
python(_PyEval_EvalFrameDefault+0x4f83) [0x55ff8cd24923]
python(_PyEval_EvalCodeWithName+0x260) [0x55ff8cd15600]
python(_PyFunction_Vectorcall+0x594) [0x55ff8cd16bc4]
python(_PyEval_EvalFrameDefault+0x1510) [0x55ff8cd20eb0]
python(_PyEval_EvalCodeWithName+0x260) [0x55ff8cd15600]
python(_PyFunction_Vectorcall+0x534) [0x55ff8cd16b64]
python(_PyEval_EvalFrameDefault+0x4f83) [0x55ff8cd24923]
python(_PyFunction_Vectorcall+0x1b7) [0x55ff8cd167e7]
python(+0x166b2e) [0x55ff8ccafb2e]
python(_PyEval_EvalFrameDefault+0x71b) [0x55ff8cd200bb]
python(_PyFunction_Vectorcall+0x1b7) [0x55ff8cd167e7]
python(_PyEval_EvalFrameDefault+0x4c0) [0x55ff8cd1fe60]
python(_PyEval_EvalCodeWithName+0x260) [0x55ff8cd15600]
python(_PyFunction_Vectorcall+0x534) [0x55ff8cd16b64]
python(+0x166bf8) [0x55ff8ccafbf8]
python(PyObject_Call+0x7d) [0x55ff8cc8020d]
python(_PyEval_EvalFrameDefault+0x1f07) [0x55ff8cd218a7]
python(_PyEval_EvalCodeWithName+0x260) [0x55ff8cd15600]
python(_PyFunction_Vectorcall+0x594) [0x55ff8cd16bc4]
python(_PyObject_FastCallDict+0x5f) [0x55ff8cca762f]
python(+0x194d2b) [0x55ff8ccddd2b]
python(_PyObject_MakeTpCall+0x3bf) [0x55ff8cc7a13f]
python(_PyEval_EvalFrameDefault+0x4eff) [0x55ff8cd2489f]
python(_PyEval_EvalCodeWithName+0x260) [0x55ff8cd15600]
python(_PyFunction_Vectorcall+0x534) [0x55ff8cd16b64]
python(+0x166bf8) [0x55ff8ccafbf8]
python(PyObject_Call+0x7d) [0x55ff8cc8020d]
python(_PyEval_EvalFrameDefault+0x1f07) [0x55ff8cd218a7]
python(_PyEval_EvalCodeWithName+0x260) [0x55ff8cd15600]
python(_PyFunction_Vectorcall+0x594) [0x55ff8cd16bc4]
python(_PyObject_FastCallDict+0x5f) [0x55ff8cca762f]
python(+0x194d2b) [0x55ff8ccddd2b]
python(_PyObject_MakeTpCall+0x3bf) [0x55ff8cc7a13f]
python(_PyEval_EvalFrameDefault+0x4eff) [0x55ff8cd2489f]
python(_PyEval_EvalCodeWithName+0x260) [0x55ff8cd15600]
python(_PyFunction_Vectorcall+0x534) [0x55ff8cd16b64]
python(+0x166bf8) [0x55ff8ccafbf8]

E[F] /home/ddp-aban/soft/k2/k2/csrc/pinned_context.cu:313:virtual void k2::PinnedContext::CopyDataTo(size_t, const void*, k2::ContextPtr, void*) Check failed: ret == cudaSuccess (710 vs. 0)  Error: device-side assert triggered. 

[ Stack-Trace: ]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/lib64/libk2_log.so(k2::internal::GetStackTrace()+0x34) [0x7f2aa1646c34]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/lib64/libk2context.so(k2::PinnedContext::CopyDataTo(unsigned long, void const*, std::shared_ptr<k2::Context>, void*)+0xe5c) [0x7f2aa213b10c]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/lib64/libk2context.so(k2::PytorchCpuContext::CopyDataTo(unsigned long, void const*, std::shared_ptr<k2::Context>, void*)+0x14d) [0x7f2aa22c908d]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/lib64/libk2context.so(k2::Array1<int>::CopyFrom(k2::Array1<int> const&)+0x8c) [0x7f2aa1fb2b1c]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/lib64/libk2context.so(k2::RaggedShape::To(std::shared_ptr<k2::Context>, bool) const+0x5bd) [0x7f2aa2141c5d]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/_k2.cpython-38-x86_64-linux-gnu.so(+0xd6f0e) [0x7f2aa3828f0e]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/_k2.cpython-38-x86_64-linux-gnu.so(+0xd74cb) [0x7f2aa38294cb]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/_k2.cpython-38-x86_64-linux-gnu.so(+0xce1fb) [0x7f2aa38201fb]
/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/_k2.cpython-38-x86_64-linux-gnu.so(+0x32a34) [0x7f2aa3784a34]
python(+0x13c00e) [0x55ff8cc8500e]
python(_PyObject_MakeTpCall+0x3bf) [0x55ff8cc7a13f]
python(+0x166ca0) [0x55ff8ccafca0]
python(_PyEval_EvalFrameDefault+0x4f83) [0x55ff8cd24923]
python(_PyFunction_Vectorcall+0x1b7) [0x55ff8cd167e7]
python(+0x166b2e) [0x55ff8ccafb2e]
python(_PyEval_EvalFrameDefault+0x4f83) [0x55ff8cd24923]
python(_PyFunction_Vectorcall+0x1b7) [0x55ff8cd167e7]
python(+0x166b2e) [0x55ff8ccafb2e]
python(_PyEval_EvalFrameDefault+0x71b) [0x55ff8cd200bb]
python(_PyFunction_Vectorcall+0x1b7) [0x55ff8cd167e7]
python(_PyEval_EvalFrameDefault+0x4c0) [0x55ff8cd1fe60]
python(_PyEval_EvalCodeWithName+0x260) [0x55ff8cd15600]
python(_PyFunction_Vectorcall+0x534) [0x55ff8cd16b64]
python(+0x166bf8) [0x55ff8ccafbf8]
python(PyObject_Call+0x7d) [0x55ff8cc8020d]
python(_PyEval_EvalFrameDefault+0x1f07) [0x55ff8cd218a7]
python(_PyEval_EvalCodeWithName+0x260) [0x55ff8cd15600]
python(_PyFunction_Vectorcall+0x594) [0x55ff8cd16bc4]
python(_PyObject_FastCallDict+0x5f) [0x55ff8cca762f]
python(+0x194d2b) [0x55ff8ccddd2b]
python(_PyObject_MakeTpCall+0x3bf) [0x55ff8cc7a13f]
python(_PyEval_EvalFrameDefault+0x4eff) [0x55ff8cd2489f]
python(_PyEval_EvalCodeWithName+0x260) [0x55ff8cd15600]
python(_PyFunction_Vectorcall+0x534) [0x55ff8cd16b64]
python(+0x166bf8) [0x55ff8ccafbf8]
python(PyObject_Call+0x7d) [0x55ff8cc8020d]
python(_PyEval_EvalFrameDefault+0x1f07) [0x55ff8cd218a7]
python(_PyEval_EvalCodeWithName+0x260) [0x55ff8cd15600]
python(_PyFunction_Vectorcall+0x594) [0x55ff8cd16bc4]
python(_PyObject_FastCallDict+0x5f) [0x55ff8cca762f]
python(+0x194d2b) [0x55ff8ccddd2b]
python(_PyObject_MakeTpCall+0x3bf) [0x55ff8cc7a13f]
python(_PyEval_EvalFrameDefault+0x4eff) [0x55ff8cd2489f]
python(_PyEval_EvalCodeWithName+0x260) [0x55ff8cd15600]
python(_PyFunction_Vectorcall+0x534) [0x55ff8cd16b64]
python(+0x166bf8) [0x55ff8ccafbf8]
python(PyObject_Call+0x7d) [0x55ff8cc8020d]
python(_PyEval_EvalFrameDefault+0x1f07) [0x55ff8cd218a7]
python(_PyEval_EvalCodeWithName+0x260) [0x55ff8cd15600]
python(_PyFunction_Vectorcall+0x594) [0x55ff8cd16bc4]

E..
======================================================================
ERROR: test_autograd_remove_epsilon_and_add_self_loops (__main__.TestRemoveEpsilonDevice)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./remove_epsilon_test.py", line 277, in test_autograd_remove_epsilon_and_add_self_loops
    dest = k2.remove_epsilon_and_add_self_loops(src)
  File "/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/fsa_algo.py", line 647, in remove_epsilon_and_add_self_loops
    out_fsa = k2.utils.fsa_from_unary_function_ragged(
  File "/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/utils.py", line 521, in fsa_from_unary_function_ragged
    setattr(dest, name, new_value.remove_values_eq(filler))
RuntimeError: 
    Some bad things happened. Please read the above error messages and stack
    trace. If you are using Python, the following command may be helpful:

      gdb --args python /path/to/your/code.py

    (You can use `gdb` to debug the code. Please consider compiling
    a debug version of k2.).

    If you are unable to fix it, please open an issue at:

      https://github.com/k2-fsa/k2/issues/new

======================================================================
ERROR: test1 (__main__.TestRemoveEpsilonDeviceFillers)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./remove_epsilon_test.py", line 342, in test1
    fsa = k2.Fsa.from_str(s, aux_label_names=['foo']).to(device)
  File "/home/ddp-aban/soft/anaconda3/envs/k2/lib/python3.8/site-packages/k2-1.17.dev20220720+cuda11.1.torch1.8.1-py3.8-linux-x86_64.egg/k2/fsa.py", line 1097, in to
    ans = Fsa(self.arcs.to(device), properties=self.properties)
RuntimeError: 
    Some bad things happened. Please read the above error messages and stack
    trace. If you are using Python, the following command may be helpful:

      gdb --args python /path/to/your/code.py

    (You can use `gdb` to debug the code. Please consider compiling
    a debug version of k2.).

    If you are unable to fix it, please open an issue at:

      https://github.com/k2-fsa/k2/issues/new

======================================================================
FAIL: test_autograd (__main__.TestRemoveEpsilonDevice)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./remove_epsilon_test.py", line 210, in test_autograd
    assert dest.int_attr == expected_int_attr
AssertionError

----------------------------------------------------------------------
Ran 6 tests in 3.129s

FAILED (failures=1, errors=2)
csukuangfj commented 2 years ago

Someone reported the same error some time ago with CUDA 11.1 + torch 1.8.0.

The error went away, without any code changes, simply by switching to CUDA 10.2 + torch 1.10.0.
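
A quick way to check whether an environment matches the combinations reported in this thread (torch 1.8.0 or 1.8.1 built with CUDA 11.1) is sketched below. This is only a heuristic based on the two reports here, not an official k2 or PyTorch compatibility list; the helper names are hypothetical, and the check assumes torch version strings such as `1.8.1` or `1.8.1+cu111`.

```python
def parse_major_minor(version):
    """'1.8.1+cu111' -> (1, 8); returns None if the string cannot be parsed."""
    parts = version.split("+")[0].split(".")
    try:
        return int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return None


def matches_reported_bad_combo(torch_version, cuda_version):
    """True if torch 1.8.x is paired with CUDA 11.1.

    Encodes only the combinations mentioned in this thread; other
    combinations may or may not be affected.
    """
    return parse_major_minor(torch_version) == (1, 8) and cuda_version == "11.1"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        # torch.version.cuda is None for CPU-only builds, which never matches.
        if matches_reported_bad_combo(torch.__version__, torch.version.cuda):
            print("This torch/CUDA combination matches the one reported in this issue.")
        else:
            print("This torch/CUDA combination was not reported in this issue.")
```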

ahban commented 2 years ago

Cool. I'll try installing torch 1.12.

ahban commented 2 years ago

@csukuangfj torch 1.12 works well on CentOS 7. Many thanks!