k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

Error in Yes No recipe probably due to installation #99

Closed aarora8 closed 3 years ago

aarora8 commented 3 years ago

Hi, I installed k2 from source and lhotse via pip. To check whether my k2 and lhotse installations are OK, I am trying to run the yesno recipe. I did not change anything in the scripts, but while running the yesno recipe I get an error (RuntimeError: invalid device function). I get the same error in the librispeech recipe and in a recipe I wrote myself. It seems to be an installation problem, probably related to the nvcc version; I would appreciate any help with this error. My log with environment information is as follows:

# Running on r7n04
# Started at Sun Oct 31 20:00:38 EDT 2021
# /home/hltcoe/aarora/miniconda3/envs/k2_scratch2/bin/python3 ./tdnn/train.py
2021-10-31 20:00:40,299 INFO [train.py:481] Training started
2021-10-31 20:00:40,299 INFO [train.py:482] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lr': 0.01, 'feature_dim': 23, 'weight_decay': 1e-06, 'start_epoch': 0, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 10, 'reset_interval': 20, 'valid_interval': 10, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'world_size': 1, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 15, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30.0, 'bucketing_sampler': False, 'num_buckets': 10, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'env_info': {'k2-version': '1.9', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '7178d67e594bc7fa89c2b331ad7bd1c62a6a9eb4', 'k2-git-date': 'Tue Oct 26 10:12:54 2021', 'lhotse-version': '0.11.0.dev+git.7f56dd1.clean', 'torch-cuda-available': True, 'torch-cuda-version': '10.1', 'python-version': '3.8', 'icefall-git-branch': 'coe_asr2', 'icefall-git-sha1': 'e06baf3-clean', 'icefall-git-date': 'Sun Oct 31 19:53:21 2021', 'icefall-path': '/exp/aarora/icefall_work_env/icefall', 'k2-path': '/exp/aarora/icefall_work_env/k2_me/k2/python/k2/__init__.py', 'lhotse-path': '/exp/aarora/icefall_work_env/lhotse/lhotse/__init__.py'}}
2021-10-31 20:00:40,326 INFO [lexicon.py:176] Loading pre-compiled data/lang_phone/Linv.pt
2021-10-31 20:00:43,731 INFO [asr_datamodule.py:145] About to get train cuts
2021-10-31 20:00:43,732 INFO [asr_datamodule.py:242] About to get train cuts
2021-10-31 20:00:43,758 INFO [asr_datamodule.py:148] About to create train dataset
2021-10-31 20:00:43,758 INFO [asr_datamodule.py:199] Using SingleCutSampler.
2021-10-31 20:00:43,760 INFO [asr_datamodule.py:205] About to create train dataloader
2021-10-31 20:00:43,760 INFO [asr_datamodule.py:218] About to get test cuts
2021-10-31 20:00:43,761 INFO [asr_datamodule.py:248] About to get test cuts
Traceback (most recent call last):
  File "./tdnn/train.py", line 573, in <module>
    main()
  File "./tdnn/train.py", line 569, in main
    run(rank=0, world_size=1, args=args)
  File "./tdnn/train.py", line 534, in run
    train_one_epoch(
  File "./tdnn/train.py", line 404, in train_one_epoch
    loss, loss_info = compute_loss(
  File "./tdnn/train.py", line 300, in compute_loss
    decoding_graph = graph_compiler.compile(texts)
  File "/exp/aarora/icefall_work_env/icefall/icefall/graph_compiler.py", line 74, in compile
    transcript_fsa = self.convert_transcript_to_fsa(texts)
  File "/exp/aarora/icefall_work_env/icefall/icefall/graph_compiler.py", line 116, in convert_transcript_to_fsa
    word_fsa = k2.linear_fsa(word_ids_list, self.device)
  File "/exp/aarora/icefall_work_env/k2_me/k2/python/k2/fsa_algo.py", line 66, in linear_fsa
    ragged_arc = _k2.linear_fsa(labels, device)
RuntimeError: invalid device function
# Accounting: time=7 threads=1
# Finished at Sun Oct 31 20:00:45 EDT 2021 with status 1
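
A minimal sketch that exercises the same k2.linear_fsa() call shown in the traceback (my own reduction for debugging, not part of the recipe); it should hit the same RuntimeError whenever the installed k2 CUDA kernels do not match the GPU:

  import torch
  import k2

  device = torch.device("cuda", 0)
  # Building a trivial linear FSA on the GPU goes through the same
  # _k2.linear_fsa() call shown in the traceback above.
  fsa = k2.linear_fsa([[1, 2, 3]], device)
  print(fsa.device)  # prints cuda:0 when the kernels work on this GPU
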
csukuangfj commented 3 years ago

Could you show us the output of nvidia-smi?

aarora8 commented 3 years ago

Thanks, it seems the issue is related to the GPU and nvcc versions. I ran the recipe on a different GPU node and it completed successfully. I got the following output:

# Running on r2n02
# Started at Sun Oct 31 22:32:08 EDT 2021
# /home/hltcoe/aarora/miniconda3/envs/k2_scratch2/bin/python3 ./tdnn/train.py
2021-10-31 22:32:10,245 INFO [train.py:481] Training started
2021-10-31 22:32:10,245 INFO [train.py:482] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lr': 0.01, 'feature_dim': 23, 'weight_decay': 1e-06, 'start_epoch': 0, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 10, 'reset_interval': 20, 'valid_interval': 10, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'world_size': 1, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 15, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30.0, 'bucketing_sampler': False, 'num_buckets': 10, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'env_info': {'k2-version': '1.9', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '7178d67e594bc7fa89c2b331ad7bd1c62a6a9eb4', 'k2-git-date': 'Tue Oct 26 10:12:54 2021', 'lhotse-version': '0.11.0.dev+git.7f56dd1.clean', 'torch-cuda-available': True, 'torch-cuda-version': '10.1', 'python-version': '3.8', 'icefall-git-branch': 'coe_asr2', 'icefall-git-sha1': 'e06baf3-dirty', 'icefall-git-date': 'Sun Oct 31 19:53:21 2021', 'icefall-path': '/exp/aarora/icefall_work_env/icefall', 'k2-path': '/exp/aarora/icefall_work_env/k2_me/k2/python/k2/__init__.py', 'lhotse-path': '/exp/aarora/icefall_work_env/lhotse/lhotse/__init__.py'}}
2021-10-31 22:32:10,281 INFO [lexicon.py:176] Loading pre-compiled data/lang_phone/Linv.pt
2021-10-31 22:32:13,366 INFO [asr_datamodule.py:145] About to get train cuts
2021-10-31 22:32:13,367 INFO [asr_datamodule.py:242] About to get train cuts
2021-10-31 22:32:13,385 INFO [asr_datamodule.py:148] About to create train dataset
2021-10-31 22:32:13,385 INFO [asr_datamodule.py:199] Using SingleCutSampler.
2021-10-31 22:32:13,388 INFO [asr_datamodule.py:205] About to create train dataloader
2021-10-31 22:32:13,388 INFO [asr_datamodule.py:218] About to get test cuts
2021-10-31 22:32:13,388 INFO [asr_datamodule.py:248] About to get test cuts
2021-10-31 22:32:14,011 INFO [train.py:420] Epoch 0, batch 0, loss[loss=1.061, over 2805 frames.], tot_loss[loss=1.061, over 2805 frames.], batch size: 5
2021-10-31 22:32:14,524 INFO [train.py:420] Epoch 0, batch 10, loss[loss=0.4313, over 2695 frames.], tot_loss[loss=0.6688, over 22140.152947017563 frames.], batch size: 5
2021-10-31 22:32:15,141 INFO [train.py:444] Epoch 0, validation loss=0.862, over 17976 frames.
2021-10-31 22:32:51,819 INFO [train.py:444] Epoch 14, validation loss=0.01105, over 17976 frames.
2021-10-31 22:32:52,079 INFO [checkpoint.py:62] Saving checkpoint to tdnn/exp/epoch-14.pt
2021-10-31 22:32:52,086 INFO [train.py:553] Done!
# Accounting: time=44 threads=1
# Finished at Sun Oct 31 22:32:52 EDT 2021 with status 0
The nvidia-smi output on r2n02 (the node where training succeeded) is as follows:

Sun Oct 31 22:49:10 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN RTX    On   | 00000000:3B:00.0 Off |                  N/A |
| 41%   26C    P8    15W / 200W |      1MiB / 24220MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA TITAN RTX    On   | 00000000:5E:00.0 Off |                  N/A |
| 40%   25C    P8    10W / 200W |      1MiB / 24220MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA TITAN RTX    On   | 00000000:B1:00.0 Off |                  N/A |
| 41%   25C    P8    15W / 200W |      1MiB / 24220MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA TITAN RTX    On   | 00000000:D9:00.0 Off |                  N/A |
| 40%   26C    P8    14W / 200W |      1MiB / 24220MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
The nvidia-smi output on r7n04 (the node where training failed) is as follows:
Sun Oct 31 22:51:12 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA Tesla V1...  On   | 00000000:1A:00.0 Off |                    0 |
| N/A   27C    P0    24W / 200W |      0MiB / 32510MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA Tesla V1...  On   | 00000000:1B:00.0 Off |                    0 |
| N/A   52C    P0   180W / 200W |  31729MiB / 32510MiB |    100%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA Tesla V1...  On   | 00000000:1C:00.0 Off |                    0 |
| N/A   27C    P0    24W / 200W |      0MiB / 32510MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA Tesla V1...  On   | 00000000:3D:00.0 Off |                    0 |
| N/A   28C    P0    24W / 200W |      0MiB / 32510MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA Tesla V1...  On   | 00000000:3E:00.0 Off |                    0 |
| N/A   28C    P0    24W / 200W |      0MiB / 32510MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA Tesla V1...  On   | 00000000:8B:00.0 Off |                    0 |
| N/A   27C    P0    24W / 200W |      0MiB / 32510MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA Tesla V1...  On   | 00000000:8C:00.0 Off |                    0 |
| N/A   25C    P0    24W / 200W |      0MiB / 32510MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA Tesla V1...  On   | 00000000:B4:00.0 Off |                    0 |
| N/A   25C    P0    25W / 200W |      0MiB / 32510MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
csukuangfj commented 3 years ago

I suspect that the issue is caused by https://github.com/k2-fsa/k2/blob/master/CMakeLists.txt#L211

  # see https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
  # https://www.myzhar.com/blog/tutorials/tutorial-nvidia-gpu-cuda-compute-capability/
    set(K2_COMPUTE_ARCH_CANDIDATES 35 50 60 61 70 75)

You can use the above two links to look up your GPU architecture. If it is not listed in K2_COMPUTE_ARCH_CANDIDATES, you can add it and recompile k2.
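
If it helps, here is a quick way to print the compute capability of the GPUs you are actually running on (a small sketch; it only assumes PyTorch, which icefall already requires):

  import torch

  for i in range(torch.cuda.device_count()):
      major, minor = torch.cuda.get_device_capability(i)
      name = torch.cuda.get_device_name(i)
      # For example, a Tesla V100 reports 7.0 and a TITAN RTX reports 7.5.
      print(f"GPU {i}: {name}, compute capability {major}.{minor}")

The number without the dot (e.g. 70 or 75) is what should appear in K2_COMPUTE_ARCH_CANDIDATES.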

aarora8 commented 3 years ago

ok, thank you so much, got it.

aarora8 commented 3 years ago

The compute capability (70) of the GPU in r7n04 (NVIDIA Tesla V100) is listed in K2_COMPUTE_ARCH_CANDIDATES. Do you think compiling on the V100 would help?

csukuangfj commented 3 years ago

> The compute capability (70) of the GPU in r7n04 (NVIDIA Tesla V100) is listed in K2_COMPUTE_ARCH_CANDIDATES. Do you think compiling on the V100 would help?

Did you compile k2 from source on the machine with NVIDIA TITAN RTX GPUs and run it on another machine with V100 GPUs?

aarora8 commented 3 years ago

Yeah, I compiled k2 on a machine with NVIDIA TITAN RTX GPUs and ran it on V100 GPUs. I will now compile it on a V100 GPU.

csukuangfj commented 3 years ago

If yes, I would suggest two ways:

(1) Compile k2 separately on each machine

(2) Modify https://github.com/k2-fsa/k2/blob/master/CMakeLists.txt#L232

  message(STATUS "K2_COMPUTE_ARCHS: ${K2_COMPUTE_ARCHS}")

Add the following line just before the above line:

  set(K2_COMPUTE_ARCHS 70 75)

and then compile k2 on either machine. The two machines can share a single version with this approach.
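
After recompiling, one way to double-check which architectures the build actually targets is

  python3 -m k2.version

My understanding is that its output includes the CMake CUDA flags used at build time, so you can confirm that -gencode entries for both 70 (V100) and 75 (TITAN RTX) are present before sharing the build across machines; the exact fields may differ between k2 versions.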

csukuangfj commented 3 years ago

A third alternative is to compile k2 on the machine with V100 GPUs and run it on the other machine without modifying the k2 source code, but runtime speed may be affected, since the resulting binary is built for the V100's architecture (compute capability 70) rather than the TITAN RTX's (75).

aarora8 commented 3 years ago

ok, thank you, got it.

aarora8 commented 3 years ago

Thanks, my scripts are now running without the invalid device function error.