idiap / pkwrap

A pytorch wrapper for LF-MMI training and parallel training in Kaldi

Tested versions to compile pkwrap #14

Closed: groadabike closed this issue 3 years ago

groadabike commented 3 years ago

First of all, thank you for your great work.

I am trying to run the mini_librispeech recipe to understand how the package works, but I get a CUBLAS error when I try to run the tdnnf training.

I believe I have a version problem. Could you please tell me which versions of Python, PyTorch, CUDA, and g++ you have successfully compiled the package with?

I can see in the setup.py file that you accept PyTorch up to 1.7. Does that include 1.7.1? Which CUDA version?

Thank you so much for your help
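
(Aside: the exact constraint in pkwrap's setup.py is not reproduced here, but a version gate of roughly the following shape, comparing only major and minor numbers, would treat 1.7.1 the same as 1.7. This is only an illustrative sketch; the SUPPORTED set is an assumption, not the package's actual check.)

# Hypothetical sketch, not the actual pkwrap setup.py: a gate that compares
# only (major, minor), under which 1.7.1 is accepted whenever 1.7 is.
import re
import torch

SUPPORTED = {(1, 5), (1, 6), (1, 7)}  # assumed range, for illustration only

def major_minor(version_string):
    # '1.7.1+cu102' -> (1, 7)
    major, minor = re.match(r"(\d+)\.(\d+)", version_string).groups()
    return int(major), int(minor)

if major_minor(torch.__version__) not in SUPPORTED:
    raise RuntimeError("Unsupported PyTorch version: %s" % torch.__version__)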

mrsrikanth commented 3 years ago

Hello,

Yes, pkwrap works with PyTorch 1.7 (and 1.8 as well, but the setup doesn't allow it yet; I will fix that). I have tested it with CUDA 9.2, 10.2, and 11.0. Could you please share the error that you get?
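
For reference, a quick way to report the relevant versions from the environment where the error occurs, using only standard PyTorch attributes:

# Environment report using only standard PyTorch attributes.
import torch

print("torch          :", torch.__version__)        # e.g. '1.7.1'
print("built for CUDA :", torch.version.cuda)       # toolkit PyTorch was built against
print("cuda available :", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device         :", torch.cuda.get_device_name(0))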

groadabike commented 3 years ago

Hi, thank you for your answer. I think my problem is that I have CUDA 11.1; I may need to install version 10.2.

The error is logged in the file train.0.1.log:

 # local/chain/tuning/tdnnf.py --dir exp/chain/tdnnf --mode training --lr 0.001 --frame-shift 0 --egs ark:exp/chain/tdnnf/egs/cegs.1.ark --l2-regularize-factor 1.0 --minibatch-size 16 --new-model exp/chain/tdnnf/0.1.pt exp/chain/tdnnf/0.pt 
# Started at Mon Mar 29 21:04:35 BST 2021
#
WARNING ([5.5.899~1-3d0e4313]:SelectGpuId():cu-device.cc:243) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG ([5.5.899~1-3d0e4313]:SelectGpuIdAuto():cu-device.cc:438) Selecting from 1 GPUs
LOG ([5.5.899~1-3d0e4313]:SelectGpuIdAuto():cu-device.cc:453) cudaSetDevice(0): GeForce RTX 2060    free:5346M, used:588M, total:5934M, free/total:0.900856
LOG ([5.5.899~1-3d0e4313]:SelectGpuIdAuto():cu-device.cc:501) Device: 0, mem_ratio: 0.900856
LOG ([5.5.899~1-3d0e4313]:SelectGpuId():cu-device.cc:382) Trying to select device: 0
LOG ([5.5.899~1-3d0e4313]:SelectGpuIdAuto():cu-device.cc:511) Success selecting device 0 free mem ratio: 0.900856
LOG ([5.5.899~1-3d0e4313]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: GeForce RTX 2060 free:4106M, used:1828M, total:5934M, free/total:0.691953 version 7.5
LOG ([5.5.899~1-3d0e4313]:PrintSpecificStats():nnet-example-utils.cc:1159) Merged specific eg types as follows [format: <eg-size1>={<mb-size1>-><num-minibatches1>,<mbsize2>-><num-minibatches2>.../d=<num-discarded>},<egs-size2>={...},... (note,egs-size == number of input frames including context).
LOG ([5.5.899~1-3d0e4313]:PrintSpecificStats():nnet-example-utils.cc:1189) 167={16->607,d=4}
LOG ([5.5.899~1-3d0e4313]:PrintAggregateStats():nnet-example-utils.cc:1155) Processed 9716 egs of avg. size 167 into 607 minibatches, discarding 0.04117% of egs.  Avg minibatch size was 16, #distinct types of egs/minibatches was 1/1
ERROR ([5.5.899~1-3d0e4313]:AddMatVec():cu-vector.cc:521) cublasStatus_t 15 : "CUBLAS_STATUS_NOT_SUPPORTED" returned from 'cublas_gemv(GetCublasHandle(), (trans==kTrans? CUBLAS_OP_N:CUBLAS_OP_T), M.NumCols(), M.NumRows(), alpha, M.Data(), M.Stride(), v.Data(), 1, beta, data_, 1)'

[ Stack-Trace: ]
/media/gerardo/Extended/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x726) [0x7feb5ad78120]
/media/gerardo/Extended/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x7feb5ad79b21]
/media/gerardo/Extended/kaldi/src/lib/libkaldi-cudamatrix.so(kaldi::CuVectorBase<float>::AddMatVec(float, kaldi::CuMatrixBase<float> const&, kaldi::MatrixTransposeType, kaldi::CuVectorBase<float> const&, float)+0x312) [0x7feb56a964de]
/media/gerardo/Extended/kaldi/src/lib/libkaldi-chain.so(kaldi::chain::DenominatorComputation::Beta(int)+0xcf) [0x7feb56358701]
/media/gerardo/Extended/kaldi/src/lib/libkaldi-chain.so(kaldi::chain::DenominatorComputation::Backward(float, kaldi::CuMatrixBase<float>*)+0x3a) [0x7feb5635a788]
/media/gerardo/Extended/kaldi/src/lib/libkaldi-chain.so(kaldi::chain::ComputeChainObjfAndDeriv(kaldi::chain::ChainTrainingOptions const&, kaldi::chain::DenominatorGraph const&, kaldi::chain::Supervision const&, kaldi::CuMatrixBase<float> const&, float*, float*, float*, kaldi::CuMatrixBase<float>*, kaldi::CuMatrix<float>*)+0xfa) [0x7feb5635ba9a]
/media/gerardo/Extended/pkwrap/_pkwrap.cpython-38-x86_64-linux-gnu.so(ComputeChainObjfAndDeriv(kaldi::chain::ChainTrainingOptions const&, kaldi::chain::DenominatorGraph const&, kaldi::chain::Supervision const&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&)+0x340) [0x7feb581cddc0]
/media/gerardo/Extended/pkwrap/_pkwrap.cpython-38-x86_64-linux-gnu.so(+0x411ec) [0x7feb581a81ec]
/media/gerardo/Extended/pkwrap/_pkwrap.cpython-38-x86_64-linux-gnu.so(+0x47452) [0x7feb581ae452]
python3(PyCFunction_Call+0x56) [0x559e0f21af76]
python3(_PyObject_MakeTpCall+0x22f) [0x559e0f1d885f]
python3(_PyEval_EvalFrameDefault+0x4596) [0x559e0f25ff56]
python3(_PyFunction_Vectorcall+0x10b) [0x559e0f22686b]
python3(PyVectorcall_Call+0x71) [0x559e0f1d8041]
/media/gerardo/Extended/venv/lib/python3.8/site-packages/torch/lib/libtorch_python.so(THPFunction_apply(_object*, _object*)+0x93d) [0x7febc439c2dd]
python3(PyCFunction_Call+0xdb) [0x559e0f21affb]
python3(_PyObject_MakeTpCall+0x22f) [0x559e0f1d885f]
python3(_PyEval_EvalFrameDefault+0x475) [0x559e0f25be35]
python3(_PyEval_EvalCodeWithName+0x2d2) [0x559e0f225a92]
python3(_PyFunction_Vectorcall+0x1e3) [0x559e0f226943]
python3(+0x10011a) [0x559e0f19b11a]
python3(+0x18bc0b) [0x559e0f226c0b]
python3(+0x10077f) [0x559e0f19b77f]
python3(+0x18bc0b) [0x559e0f226c0b]
python3(+0x10077f) [0x559e0f19b77f]
python3(_PyEval_EvalCodeWithName+0x659) [0x559e0f225e19]
python3(_PyFunction_Vectorcall+0x1e3) [0x559e0f226943]
python3(_PyObject_FastCallDict+0x24b) [0x559e0f2274cb]
python3(_PyObject_Call_Prepend+0x63) [0x559e0f227733]
python3(+0x18c8ca) [0x559e0f2278ca]
python3(_PyObject_MakeTpCall+0x1a4) [0x559e0f1d87d4]
python3(_PyEval_EvalFrameDefault+0x11d0) [0x559e0f25cb90]
python3(_PyEval_EvalCodeWithName+0x2d2) [0x559e0f225a92]
python3(PyEval_EvalCodeEx+0x44) [0x559e0f226754]
python3(PyEval_EvalCode+0x1c) [0x559e0f2b4edc]
python3(+0x219f84) [0x559e0f2b4f84]
python3(+0x24c1f4) [0x559e0f2e71f4]
python3(PyRun_FileExFlags+0xa1) [0x559e0f1af6e1]
python3(PyRun_SimpleFileExFlags+0x3b4) [0x559e0f1afac6]
python3(+0x11598b) [0x559e0f1b098b]
python3(Py_BytesMain+0x39) [0x559e0f2e9d19]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7febc5f510b3]
python3(+0x1dee93) [0x559e0f279e93]

Traceback (most recent call last):
  File "local/chain/tuning/tdnnf.py", line 55, in <module>
    ChainModel(Net, cmd_line=True)
  File "/media/gerardo/Extended/pkwrap/pkwrap/chain.py", line 432, in __init__
    self.call_by_mode()
  File "/media/gerardo/Extended/pkwrap/pkwrap/chain.py", line 457, in call_by_mode
    self.train()
  File "/media/gerardo/Extended/pkwrap/pkwrap/chain.py", line 492, in train
    new_model = train_lfmmi_one_iter(
  File "/media/gerardo/Extended/pkwrap/pkwrap/chain.py", line 317, in train_lfmmi_one_iter
    deriv = criterion(training_opts, den_graph, sup, output, xent_output)
  File "/media/gerardo/Extended/pkwrap/pkwrap/chain.py", line 66, in forward
    kaldi.chain.ComputeChainObjfAndDeriv(
RuntimeError: kaldi::KaldiFatalError
# Accounting: time=6 threads=1
# Ended (code 1) at Mon Mar 29 21:04:41 BST 2021, elapsed time 6 seconds
mrsrikanth commented 3 years ago

Looks like the error originates in Kaldi:

ERROR ([5.5.899~1-3d0e4313]:AddMatVec():cu-vector.cc:521) cublasStatus_t 15 : "CUBLAS_STATUS_NOT_SUPPORTED" returned from 'cublas_gemv(GetCublasHandle(), (trans==kTrans? CUBLAS_OP_N:CUBLAS_OP_T), M.NumCols(), M.NumRows(), alpha, M.Data(), M.Stride(), v.Data(), 1, beta, data_, 1)'

Is this a recent version of Kaldi?

groadabike commented 3 years ago

Yes, my version is 3d0e4313 (commit from Mar 23, 2021).
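
For completeness, a small hypothetical helper to collect the Kaldi commit and the PyTorch/CUDA build in one report; the kaldi_commit name is illustrative, and the path is the one shown in the stack trace above.

# Hypothetical helper: report the Kaldi checkout and PyTorch build together.
import subprocess
import torch

def kaldi_commit(kaldi_root):
    # Short SHA of the checked-out Kaldi tree, e.g. '3d0e4313'.
    return subprocess.check_output(
        ["git", "-C", kaldi_root, "rev-parse", "--short", "HEAD"], text=True
    ).strip()

print("kaldi:", kaldi_commit("/media/gerardo/Extended/kaldi"))
print("torch:", torch.__version__, "| cuda:", torch.version.cuda)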

mrsrikanth commented 3 years ago

Ok. I'm going to check compatibility with CUDA 11.1 and the latest Kaldi, but I can't do it today.

groadabike commented 3 years ago

Thank you @mrsrikanth. I can downgrade my CUDA version and recompile. I will let you know the results.

groadabike commented 3 years ago

Hi, I couldn't compile pkwrap using CUDA 11.1 and PyTorch 1.8.1, so I downgraded and managed to compile and run the mini_librispeech recipe. Versions:

Kaldi 3d0e4313
gcc 8
CUDA 10.2
Python 3.8.5
PyTorch 1.7.1 (conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch)

mrsrikanth commented 3 years ago

And it still doesn't compile?

groadabike commented 3 years ago

Sorry, I meant that with these versions it does compile and run:

Kaldi 3d0e4313
gcc 8
CUDA 10.2
Python 3.8.5
PyTorch 1.7.1 (conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch)
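
A hypothetical sanity check against this working combination (the version strings below come from the list above):

# Hypothetical check that the current environment matches the combination above.
import sys
import torch

assert sys.version_info[:2] == (3, 8), "expected Python 3.8.x"
assert torch.__version__.startswith("1.7.1"), "expected PyTorch 1.7.1"
assert torch.version.cuda == "10.2", "expected a CUDA 10.2 build of PyTorch"
print("Environment matches the combination reported to work.")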

mrsrikanth commented 3 years ago

@groadabike I tried installing pkwrap with PyTorch 1.8.1 and CUDA 11.1. So far, there hasn't been any issue with training.

>>> torch.__version__
'1.8.1'
>>> torch.version.cuda
'11.1'

EDIT: And the Kaldi version from Apr 12, 2021 (SHA: e7455085430411a2749e81751cc93a4932302390).