omprakashsonie opened 4 years ago
Add the compute capability of your GPU in src/gpucompute/Makefile.
I added capability 7.0, which solved this error on a V100:
CUDA_VER_GT_9_0 := $(shell [ $(CUDA_VERSION) -ge 90 ] && echo true)
ifeq ($(CUDA_VER_GT_9_0), true)
CUDA_ARCH += -gencode arch=compute_70,code=sm_70
endif
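The same guard pattern can be extended to other architectures. Below is a minimal sketch, assuming (as the `-ge 90` test above implies) that CUDA_VERSION holds the toolkit version with the dot removed, e.g. 90 for CUDA 9.0 and 110 for CUDA 11.0. The variable names CUDA_VER_GE_9_0 and CUDA_VER_GE_11_0 are hypothetical, and the block is meant to replace the snippet above rather than sit next to it. Native sm_80 code (A100) can only be generated by CUDA 11.0 or later, so that entry is guarded separately.

```make
# Sketch only: assumes CUDA_VERSION is encoded as in the snippet above
# (90 for CUDA 9.0, 110 for CUDA 11.0) and that CUDA_ARCH holds the
# -gencode flags passed to nvcc in src/gpucompute/Makefile.
CUDA_VER_GE_9_0  := $(shell [ $(CUDA_VERSION) -ge 90 ] && echo true)
CUDA_VER_GE_11_0 := $(shell [ $(CUDA_VERSION) -ge 110 ] && echo true)

# Volta (e.g. V100, compute capability 7.0) needs CUDA 9.0 or later.
ifeq ($(CUDA_VER_GE_9_0), true)
  CUDA_ARCH += -gencode arch=compute_70,code=sm_70
endif

# Ampere (e.g. A100, compute capability 8.0) needs CUDA 11.0 or later.
ifeq ($(CUDA_VER_GE_11_0), true)
  CUDA_ARCH += -gencode arch=compute_80,code=sm_80
endif
```

If CUDA_VERSION is older than a guard's threshold, the test prints nothing, the variable stays empty, and that architecture is simply skipped.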
So your GPU is a V100 with CUDA 9.0? Do you need to install ATLAS? I get this error with an A100 and CUDA 9.0: cudaError_t 13 : "invalid device symbol" returned from 'cublasGetError()'
My environment: NVIDIA-SMI 470.57.02, Driver Version 470.57.02, CUDA Version 11.0, NVIDIA TITAN Xp.
It works when I add this line to src/gpucompute/Makefile:
CUDA_ARCH=-gencode arch=compute_61,code=sm_61
I have verified that compute_30/sm_30 and compute_80/sm_80 do not work (the TITAN Xp has compute capability 6.1).
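For a machine with more than one kind of GPU, several -gencode pairs can be combined into one fat binary. Below is a hedged sketch covering the TITAN Xp (6.1) plus a V100 (7.0); the V100 entry and the final PTX entry are illustrative assumptions, not something verified in this thread. Each code=sm_XY embeds machine code that runs only on GPUs with the same major version X and an equal or higher minor version, while code=compute_XY embeds PTX that the driver can JIT-compile for GPUs of capability X.Y or newer. This is why a build containing only sm_80 code has no kernel image a 6.1 card can execute.

```make
# Sketch only: assumes CUDA_ARCH holds the nvcc -gencode flags, as above.
# sm_61: native code for the TITAN Xp (compute capability 6.1).
# sm_70: native code for a V100 (compute capability 7.0).
# compute_70: PTX that the driver can JIT-compile for GPUs of 7.0 or newer.
CUDA_ARCH = -gencode arch=compute_61,code=sm_61 \
            -gencode arch=compute_70,code=sm_70 \
            -gencode arch=compute_70,code=compute_70
```

Because this uses `=`, it replaces any earlier CUDA_ARCH value, just like the single-line version above; any later `CUDA_ARCH +=` lines still append to it.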
Hi, training stops after this message:
[NOTE] TOKEN_ACCURACY refers to token accuracy, i.e., (1.0 - token_error_rate).
EPOCH 1 RUNNING ... TAG: 10_1.0 lrate 4e-05,
Brief error: ERROR (train-ctc-parallel:AddVecToRows():cuda-matrix.cc:541) cudaError_t 48 : "no kernel image is available for execution on the device" returned from 'cudaGetLastError()'
In Python, the GPU is detected.
Detailed error, from ./exp/nml_seq_fw_seq_tw/train_lstm/log/tr.iter1.log:
train-ctc-parallel --report-step=1000 --num-sequence=10 --frame-limit=25000 --learn-rate=0.00004 --momentum=0.9 --verbose=1 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:./exp/nml_seq_fw_seq_tw/train_tr95/utt2spk scp:./exp/nml_seq_fw_seq_tw/train_tr95/cmvn.scp scp:./exp/nml_seq_fw_seq_tw/train_lstm/train_10_1.0.scp ark:- | splice-feats --left-context=1 --right-context=1 ark:- ark:- | add-deltas ark:- ark:- | subsample-feats --n=3 --offset=1 ark:- ark:- |' 'ark:gunzip -c ./exp/nml_seq_fw_seq_tw/train_lstm/labels.tr.gz|' ./exp/nml_seq_fw_seq_tw/train_lstm/nnet/nnet.iter0 ./exp/nml_seq_fw_seq_tw/train_lstm/nnet/nnet.iter1
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:262) Selecting from 1 GPUs
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(0): Tesla V100-PCIE-16GB free:15724M, used:428M, total:16152M, free/total:0.973502
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:310) Selected device: 0 (automatically)
LOG (train-ctc-parallel:FinalizeActiveGpu():cuda-device.cc:194) The active GPU is [0]: Tesla V100-PCIE-16GB free:15676M, used:476M, total:16152M, free/total:0.97053 version 7.0
LOG (train-ctc-parallel:PrintMemoryUsage():cuda-device.cc:334) Memory used: 0 bytes.
LOG (train-ctc-parallel:DisableCaching():cuda-device.cc:731) Disabling caching of GPU memory.
LOG (train-ctc-parallel:SetUpdateAlgorithm():net.cc:483) Selecting SGD with momentum as optimization algorithm.
LOG (train-ctc-parallel:SetTrainMode():net.cc:408) Setting TrainMode for layer 0
LOG (train-ctc-parallel:SetTrainMode():net.cc:408) Setting TrainMode for layer 1
LOG (train-ctc-parallel:SetTrainMode():net.cc:408) Setting TrainMode for layer 2
LOG (train-ctc-parallel:SetTrainMode():net.cc:408) Setting TrainMode for layer 3
add-deltas ark:- ark:-
splice-feats --left-context=1 --right-context=1 ark:- ark:-
apply-cmvn --norm-vars=true --utt2spk=ark:./exp/nml_seq_fw_seq_tw/train_tr95/utt2spk scp:./exp/nml_seq_fw_seq_tw/train_tr95/cmvn.scp scp:./exp/nml_seq_fw_seq_tw/train_lstm/train_10_1.0.scp ark:-
subsample-feats --n=3 --offset=1 ark:- ark:-
LOG (train-ctc-parallel:main():train-ctc-parallel.cc:133) TRAINING STARTED
ERROR (train-ctc-parallel:AddVecToRows():cuda-matrix.cc:541) cudaError_t 48 : "no kernel image is available for execution on the device" returned from 'cudaGetLastError()'
WARNING (train-ctc-parallel:Close():kaldi-io.cc:446) Pipe gunzip -c ./exp/nml_seq_fw_seq_tw/train_lstm/labels.tr.gz| had nonzero return status 13
WARNING (train-ctc-parallel:Close():kaldi-io.cc:446) Pipe apply-cmvn --norm-vars=true --utt2spk=ark:./exp/nml_seq_fw_seq_tw/train_tr95/utt2spk scp:./exp/nml_seq_fw_seq_tw/train_tr95/cmvn.scp scp:./exp/nml_seq_fw_seq_tw/train_lstm/train_10_1.0.scp ark:- | splice-feats --left-context=1 --right-context=1 ark:- ark:- | add-deltas ark:- ark:- | subsample-feats --n=3 --offset=1 ark:- ark:- | had nonzero return status 36096
ERROR (train-ctc-parallel:AddVecToRows():cuda-matrix.cc:541) cudaError_t 48 : "no kernel image is available for execution on the device" returned from 'cudaGetLastError()'
[stack trace: ]
eesen::KaldiGetStackTrace[abi:cxx11]()
eesen::KaldiErrorMessage::~KaldiErrorMessage()
eesen::CuMatrixBase::AddVecToRows(float, eesen::CuVectorBase const&, float)
eesen::BiLstmParallel::PropagateFncRecurrentDropoutPassForward(eesen::CuMatrixBase const&, int, int)
eesen::BiLstmParallel::PropagateFnc(eesen::CuMatrixBase const&, eesen::CuMatrixBase*)
eesen::Layer::Propagate(eesen::CuMatrixBase const&, eesen::CuMatrix*)
eesen::Net::Propagate(eesen::CuMatrixBase const&, eesen::CuMatrix*)
train-ctc-parallel(main+0x1494) [0x434c48]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fb368c82830]
train-ctc-parallel(_start+0x29) [0x432119]