Open philippbb opened 3 years ago
Hello,
"CUBLAS matrix multiplication failed" is a runtime error when doing matrix multiplication using cublasSgemm https://github.com/nii-yamagishilab/project-CURRENNT-public/blob/7ca0103e13d7e868a451690679e16fa6a59d1146/CURRENNT_codes/currennt_lib/src/helpers/cublas.cu#L83
It may be a CUDA version issue (which version do you use? CUDA 7.0, 8.0, 9.0, and 10.0 work on my side). It may also be a data issue: a wrong data format leads to wrong dimension sizes.
Please try to use CPU by setting flag_CPU_gen = 1 here https://github.com/nii-yamagishilab/project-CURRENNT-scripts/blob/3de6d32e5e556a71fac1b4010d00b7c000fa5912/waveform-modeling/project-WaveNet-pretrained/config.py#L113
This will avoid using GPU and cublas for matrix multiplication. If the code works, it indicates an issue with CUDA.
It worked on the CPU, thank you very much.
I checked CUDA. The CUDA directory (under home usr cuda) is linked to 9.0.
I also rebuilt CURRENNT to check the build output, which I attached below. The CUDA version should be correct, but the CUDA library mentioned in the output could be the wrong version:
/usr/lib/x86_64-linux-gnu/libcuda.so
I am not sure what happens to this library when I switch between CUDA versions.
-- CUDA_VERSION: 9.0
-- CUDA_INCLUDE_DIRS: /usr/local/cuda-9.0/include
-- CUDA_CUDA_LIBRARY: /usr/lib/x86_64-linux-gnu/libcuda.so
-- CUDA_CUDART_LIBRARY: /usr/local/cuda-9.0/lib64/libcudart.so
-- CUDA_cublas_LIBRARY: /usr/local/cuda-9.0/lib64/libcublas.so
-- CUDA_CUFFT_LIBRARIES: /usr/local/cuda-9.0/lib64/libcufft.so
-- CUDA_curand_LIBRARY: /usr/local/cuda-9.0/lib64/libcurand.so
-- Boost_INCLUDE_DIRS: /home/philipp/AITeam/boost_1_59_0
-- Boost_LIBRARIES: /home/philipp/AITeam/boost_1_59_0/stage/lib/libboost_program_options.so;/home/philipp/AITeam/boost_1_59_0/stage/lib/libboost_system.so;/home/philipp/AITeam/boost_1_59_0/stage/lib/libboost_filesystem.so;/home/philipp/AITeam/boost_1_59_0/stage/lib/libboost_random.so;/home/philipp/AITeam/boost_1_59_0/stage/lib/libboost_thread.so;-lpthread
-- NetCDF Lib: /home/philipp/AITeam/netcdf/lib
-- Configuring done
-- Generating done
-- Build files have been written to: /home/philipp/AITeam/project-CURRENNT-public/CURRENNT_codes/build
Edit: I am going to check later whether the lib inside linux-gnu is causing the problem. Thank you.
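On the question of what happens to /usr/lib/x86_64-linux-gnu/libcuda.so when switching CUDA versions: libcuda.so is the driver library that ships with the NVIDIA driver, not with a particular toolkit, so it normally stays the same while libcudart.so and libcublas.so change with the toolkit. One way to compare the two sides is to ask each library for its version. The following is only a minimal sketch, not part of CURRENNT: the helper names are hypothetical, the libcudart path is taken from the build output above, and it assumes both libraries can be loaded on the machine. The driver version should be at least as new as the runtime version.

```python
import ctypes


def decode_cuda_version(v):
    """Split CUDA's integer encoding (1000*major + 10*minor) into (major, minor)."""
    return v // 1000, (v % 1000) // 10


def query_version(lib_path, symbol):
    """Call a version query of the form int f(int*); return the raw integer or None."""
    try:
        lib = ctypes.CDLL(lib_path)
        func = getattr(lib, symbol)
    except (OSError, AttributeError):
        return None  # library or symbol not available on this machine
    func.argtypes = [ctypes.POINTER(ctypes.c_int)]
    func.restype = ctypes.c_int
    version = ctypes.c_int(0)
    status = func(ctypes.byref(version))
    return version.value if status == 0 else None


def report(driver_lib="libcuda.so",
           runtime_lib="/usr/local/cuda-9.0/lib64/libcudart.so"):
    """Print the CUDA version supported by the driver and the runtime version."""
    # Paths are assumptions based on the build output in this thread.
    driver = query_version(driver_lib, "cuDriverGetVersion")
    runtime = query_version(runtime_lib, "cudaRuntimeGetVersion")
    for label, raw in (("driver", driver), ("runtime", runtime)):
        if raw is None:
            print(label, "version: could not be queried")
        else:
            print(label, "version: %d.%d" % decode_cuda_version(raw))
    return driver, runtime
```

Calling report() prints both versions. If the driver reports an older CUDA version than the runtime, that mismatch would be a candidate explanation for GPU calls failing at run time even though the build succeeded.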
I know little about linking and compiling, but you can use this tool to check which libraries are actually linked to the executable: https://man7.org/linux/man-pages/man1/ldd.1.html
ldd currennt
You may see something like
libcublas.so.xx => ...
libcufft.so.xx => ...
This may tell more.
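To make the ldd check above easier to read, the output can be filtered down to the CUDA libraries. This is a rough sketch with hypothetical helper names. It assumes the usual "name => path (address)" layout of ldd output, and the list of library prefixes is taken from the build output in this thread.

```python
import subprocess

# Library name prefixes of interest; "libcuda" also matches libcudart.
CUDA_PREFIXES = ("libcuda", "libcublas", "libcufft", "libcurand")


def parse_ldd(text):
    """Map each library name in ldd output to its resolved path (None if unresolved)."""
    libs = {}
    for line in text.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # lines such as the dynamic loader have no "=>"
        name, _, rest = line.partition("=>")
        rest = rest.strip()
        if rest.startswith("not found") or rest.startswith("("):
            path = None  # unresolved, or only a load address is given
        else:
            path = rest.split(" (")[0].strip()
        libs[name.strip()] = path
    return libs


def cuda_libs(libs):
    """Keep only the entries whose name starts with a CUDA library prefix."""
    return {n: p for n, p in libs.items() if n.startswith(CUDA_PREFIXES)}


def ldd(binary):
    """Run ldd on a binary and return its text output."""
    result = subprocess.run(["ldd", binary], capture_output=True,
                            text=True, check=True)
    return result.stdout


# Example usage (path to the executable is up to you):
# for name, path in sorted(cuda_libs(parse_ldd(ldd("./currennt"))).items()):
#     print(name, "->", path or "NOT FOUND")
```

Entries that resolve to a directory other than the expected toolkit (here /usr/local/cuda-9.0/lib64), or that show up as not found, are the ones worth a closer look.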
Finally, CUDA 9.0 should work. I used CUDA 9.0 a long time ago.
FYI
I recently switched to PyTorch and re-implemented the code, including WaveNet. You can check it here: https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts. There is a demo project that runs WaveNet on CMU ARCTIC.
There is a Jupyter notebook on WaveNet too: https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts/blob/master/tutorials/s3_demonstration_wavenet.ipynb
Hi
My goal was to train CURRENNT on another dataset, but before that I wanted to try out the pretrained scripts.
I think I have set up everything according to the README files, but when I run, for example,
01_gen.sh in project-WaveNet-pretrained
I get the error message below. I used Python 2.7 with cython, numpy, and scipy installed. The only thing I can think of at the moment is a wrong CUDA version, but the CURRENNT build went through without errors, I think. I will try to get some insight from debug builds...
...
(249) postprocessingL1 feedforward_tanh [size: 256, bias: 1.0, weights: 65792]
(250) postprocessingL2 feedforward_tanh [size: 512, bias: 1.0, weights: 131584]
(251) output feedforward_identity [size: 1024, bias: 1.0, weights: 525312]
(252) postoutput mdn [size: 1]
Total weights: 2440640
Outputs from layer -1, HTK format (float32, big-endian), de-normalized
Computing outputs for data fraction 1 ... arctic_a0113 SSAMPOpt: 4, SSAMPPara: 0
FAILED in running CURENNT: CUBLAS matrix multiplication failed
Failed to run:/home/philipp/AITeam/project-CURRENNT-public/CURRENNT_codes/build/currennt --train false --ff_output_format htk --parallel_sequences 1 --input_noise_sigma 0 --random_seed 12345231 --shuffle_fractions false --shuffle_sequences false --revert_std true --ScheduleSampOpt 4 --ScheduleSampPara 0 --mdnSoftmaxGenMethod 2 --network /home/philipp/AITeam/project-CURRENNT-scripts/waveform-modeling/project-WaveNet-pretrained/MODELS/wavenet001///trained_network.jsn --ff_output_file /home/philipp/AITeam/project-CURRENNT-scripts/waveform-modeling/project-WaveNet-pretrained/MODELS/wavenet001//output --ff_input_file /home/philipp/AITeam/project-CURRENNT-scripts/waveform-modeling/project-WaveNet-pretrained/TESTDATATEMP/ncData/DATA_TEST/data.nc1 --ExtInputDirs /home/philipp/AITeam/project-CURRENNT-scripts/waveform-modeling/project-WaveNet-pretrained/../TESTDATA-for-pretrained/mfbsp --ExtInputExts .mfbsp --ExtInputDims 80 --resolutions 80 --waveNetMemSave 1
Please check the printed error message
Process terminated with 2