wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0

Error during build of Horizon BPU (cross-compilation) runtime #2639

Open kanpapa opened 2 weeks ago

kanpapa commented 2 weeks ago

Describe the bug
An error occurs while building the Horizon BPU (cross-compilation) runtime.

To Reproduce
The build follows the procedure in https://github.com/wenet-e2e/wenet/blob/main/runtime/horizonbpu/README.md.

Steps to reproduce the behavior:

  1. In Step 1 (installing the Horizon packages), I ran the following command. No errors occurred before this point.
    pip install wheels/* -i https://mirrors.aliyun.com/pypi/simple
  2. The following error message was displayed.
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    onnx 1.17.0 requires protobuf>=3.20.2, but you have protobuf 3.19.4 which is incompatible.
    tensorboard 2.14.0 requires protobuf>=3.19.6, but you have protobuf 3.19.4 which is incompatible.
    tensorboardx 2.6.2.2 requires protobuf>=3.20, but you have protobuf 3.19.4 which is incompatible.
    tiktoken 0.7.0 requires requests>=2.26.0, but you have requests 2.22.0 which is incompatible.
  3. To resolve the version conflict, I ran the following command:
    pip install protobuf==3.20.2
  4. This produced a new conflict:
    horizon-tc-ui 1.11.2 requires protobuf<=3.19.4,>=3.8.0, but you have protobuf 3.20.2 which is incompatible.
  5. Despite the unresolved error in Step 1, I went on to build decoder_main in Step 2 with the following commands:
    cmake -B build -DBPU=ON -DONNX=OFF -DTORCH=OFF -DWEBSOCKET=OFF -DGRPC=OFF -DCMAKE_TOOLCHAIN_FILE=toolchains/aarch64-linux-gnu.toolchain.cmake
    cmake --build build
  6. The build stopped with the following error:
    [ 63%] Building CXX object post_processor/CMakeFiles/post_processor.dir/post_processor.cc.o
    In file included from /home/ocha/wenet/runtime/horizonbpu/post_processor/post_processor.cc:16:
    /home/ocha/wenet/runtime/horizonbpu/post_processor/post_processor.h:22:10: fatal error: processor/wetext_processor.h: No such file or directory
    22 | #include "processor/wetext_processor.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    compilation terminated.
    gmake[2]: *** [post_processor/CMakeFiles/post_processor.dir/build.make:76: post_processor/CMakeFiles/post_processor.dir/post_processor.cc.o] Error 1
    gmake[1]: *** [CMakeFiles/Makefile2:1906: post_processor/CMakeFiles/post_processor.dir/all] Error 2
    gmake: *** [Makefile:156: all] Error 2
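The pip messages above can be checked mechanically. The sketch below intersects the protobuf requirements quoted in the logs, using plain numeric version tuples (a simplification of real PEP 440 parsing; satisfies and unsatisfied are hypothetical helpers). No single protobuf release satisfies both onnx 1.17.0 (>=3.20.2) and horizon-tc-ui 1.11.2 (<=3.19.4), so changing protobuf alone cannot resolve the conflict.

```python
import operator

# Comparison operators that appear in pip's conflict messages.
# Two-character operators come first so '>=' is matched before '>'.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}


def parse(version):
    """Turn '3.20' or '3.19.4' into a 3-tuple of ints (numeric releases only)."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '<=3.19.4,>=3.8.0'."""
    for clause in spec.split(","):
        clause = clause.strip()
        for op, compare in OPS.items():
            if clause.startswith(op):
                if not compare(parse(version), parse(clause[len(op):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# protobuf requirements copied from the pip output in steps 2 and 4.
REQUIREMENTS = {
    "onnx 1.17.0": ">=3.20.2",
    "tensorboard 2.14.0": ">=3.19.6",
    "tensorboardx 2.6.2.2": ">=3.20",
    "horizon-tc-ui 1.11.2": "<=3.19.4,>=3.8.0",
}


def unsatisfied(candidate, requirements):
    """List the packages whose protobuf requirement rejects `candidate`."""
    return [name for name, spec in requirements.items()
            if not satisfies(candidate, spec)]


for candidate in ["3.19.4", "3.19.6", "3.20.2"]:
    print(candidate, "breaks:", unsatisfied(candidate, REQUIREMENTS))
```

Running it reproduces both situations in the report: 3.19.4 breaks onnx and the two TensorBoard packages, while 3.20.2 breaks horizon-tc-ui.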

I would like your advice on how to deal with this problem.

Expected behavior
The build should complete without problems.

Screenshots
None.

Additional context
None.

cdliang11 commented 1 week ago

You may need a lower version of onnx:

protobuf                 3.19.4
onnx                     1.12.0

The TensorBoard conflict can be ignored.
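To confirm an environment matches these pins before moving on, a minimal sketch using the standard library's importlib.metadata could look like this. The helper pin_mismatches is hypothetical; the lookup parameter exists only so the check can be exercised without the packages installed.

```python
from importlib import metadata

# Versions suggested in this thread for the Horizon BPU environment.
PINS = {"protobuf": "3.19.4", "onnx": "1.12.0"}


def pin_mismatches(pins, lookup=metadata.version):
    """Return {package: installed version or None} for packages off their pin."""
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    problems = pin_mismatches(PINS)
    if problems:
        for package, installed in problems.items():
            print(f"{package}: want {PINS[package]}, found {installed}")
    else:
        print("protobuf and onnx match the suggested pins")
```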

kanpapa commented 1 week ago

Thanks, pinning those versions resolved the protobuf and onnx conflict.

The cmake build error looks similar to #2032, but the circumstances are different.

cdliang11 commented 1 week ago

Add include(wetextprocessing) to horizonbpu/CMakeLists.txt.

kanpapa commented 1 week ago

I made the following change, and Step 2 completed successfully. Thanks!

(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ git diff CMakeLists.txt
diff --git a/runtime/horizonbpu/CMakeLists.txt b/runtime/horizonbpu/CMakeLists.txt
index 9e179006..3d3ff629 100644
--- a/runtime/horizonbpu/CMakeLists.txt
+++ b/runtime/horizonbpu/CMakeLists.txt
@@ -37,6 +37,8 @@ include_directories(
   ${CMAKE_CURRENT_SOURCE_DIR}/kaldi
 )

+include(wetextprocessing)
+
 # Build all libraries
 add_subdirectory(utils)
 add_subdirectory(frontend)
(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ 
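For scripted or repeated builds, the same one-line change can be applied idempotently. This is a sketch that assumes the "# Build all libraries" comment still precedes the add_subdirectory() calls, as in the diff above; add_wetext_include and patch_file are hypothetical names. The include is placed before the subdirectories, presumably because post_processor needs the wetextprocessing headers when it is configured.

```python
from pathlib import Path

INCLUDE_LINE = "include(wetextprocessing)"
ANCHOR = "# Build all libraries"


def add_wetext_include(cmake_text):
    """Insert include(wetextprocessing) right before the anchor comment, once."""
    if INCLUDE_LINE in cmake_text:
        return cmake_text  # already patched; keep the edit idempotent
    lines = cmake_text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.strip() == ANCHOR:
            # Same layout as the diff: the include line, then a blank line.
            lines[index:index] = [INCLUDE_LINE + "\n", "\n"]
            return "".join(lines)
    raise ValueError(f"anchor {ANCHOR!r} not found")


def patch_file(path):
    """Apply the edit to runtime/horizonbpu/CMakeLists.txt (or any given path)."""
    path = Path(path)
    path.write_text(add_wetext_include(path.read_text()))
```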
kanpapa commented 1 week ago

The following error occurred in Step 3.

(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ tar -xzf model_subsample8_parameter110M.tar.gz
(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ python3 $WENET_DIR/tools/onnx2horizonbin.py \
  --config ./model_subsample8_parameter110M/train.yaml \
  --checkpoint ./model_subsample8_parameter110M/final.pt \
  --output_dir ./model_subsample8_parameter110M/sample50_chunk8_leftchunk16 \
  --chunk_size 8 \
  --num_decoding_left_chunks 16 \
  --max_samples 50 \
  --dict ./model_subsample8_parameter110M/units.txt \
  --cali_datalist ./model_subsample8_parameter110M/calibration_data/data.list
Traceback (most recent call last):
  File "/home/ocha/wenet/runtime/horizonbpu/../..//tools/onnx2horizonbin.py", line 49, in <module>
    from wenet.utils.common import remove_duplicates_and_blank
ImportError: cannot import name 'remove_duplicates_and_blank' from 'wenet.utils.common' (/home/ocha/wenet/wenet/utils/common.py)

I will investigate.

cdliang11 commented 1 week ago

Fix: replace the import
    from wenet.utils.common import remove_duplicates_and_blank
with
    from wenet.utils.ctc_utils import remove_duplicates_and_blank
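If onnx2horizonbin.py must run against WeNet checkouts from both before and after remove_duplicates_and_blank moved, a fallback import avoids editing the script for each version. A minimal sketch; import_from_first is a hypothetical helper, and only the two module paths come from this thread.

```python
import importlib


def import_from_first(name, module_paths):
    """Return attribute `name` from the first module in module_paths that has it."""
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{path}: {exc}")
            continue
        if hasattr(module, name):
            return getattr(module, name)
        errors.append(f"{path}: no attribute {name!r}")
    raise ImportError(f"cannot find {name!r}; tried " + "; ".join(errors))


# Newer WeNet layout first, older layout as a fallback:
# remove_duplicates_and_blank = import_from_first(
#     "remove_duplicates_and_blank",
#     ["wenet.utils.ctc_utils", "wenet.utils.common"],
# )
```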

kanpapa commented 1 week ago

The following change resolved this issue.

diff --git a/tools/onnx2horizonbin.py b/tools/onnx2horizonbin.py
index 96bc4061..0d9b7272 100755
--- a/tools/onnx2horizonbin.py
+++ b/tools/onnx2horizonbin.py
@@ -46,7 +46,8 @@ import numpy as np

 from torch.utils.data import DataLoader

-from wenet.utils.common import remove_duplicates_and_blank
+#from wenet.utils.common import remove_duplicates_and_blank
+from wenet.utils.ctc_utils import remove_duplicates_and_blank
 from wenet.dataset.dataset import Dataset
 from wenet.utils.checkpoint import load_checkpoint
 from wenet.utils.init_model import init_model

However, when I ran the script again, I got a different error:

(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ python3 $WENET_DIR/tools/onnx2horizonbin.py \
  --config ./model_subsample8_parameter110M/train.yaml \
  --checkpoint ./model_subsample8_parameter110M/final.pt \
  --output_dir ./model_subsample8_parameter110M/sample50_chunk8_leftchunk16 \
  --chunk_size 8 \
  --num_decoding_left_chunks 16 \
  --max_samples 50 \
  --dict ./model_subsample8_parameter110M/units.txt \
  --cali_datalist ./model_subsample8_parameter110M/calibration_data/data.list
Traceback (most recent call last):
  File "/home/ocha/wenet/runtime/horizonbpu/../..//tools/onnx2horizonbin.py", line 51, in <module>
    from wenet.dataset.dataset import Dataset
  File "/home/ocha/wenet/wenet/dataset/dataset.py", line 20, in <module>
    from wenet.dataset.datapipes import (WenetRawDatasetSource,
  File "/home/ocha/wenet/wenet/dataset/datapipes.py", line 27, in <module>
    from torch.utils.data.datapipes.iter.sharding import (
ModuleNotFoundError: No module named 'torch.utils.data.datapipes.iter.sharding'

Is the module missing because the installed PyTorch is too old? I am investigating this as well.

kanpapa commented 1 week ago

I checked the relevant changes in PyTorch and WeNet.

In WeNet, an import from torch.utils.data.datapipes.iter.sharding was added to datapipes.py in fix #2316.

PyTorch added the torch.utils.data.datapipes.iter.sharding module in a recent refactoring: https://github.com/pytorch/pytorch/pull/94095

As a result, this module does not exist in PyTorch 1.13.0, the version the Horizon BPU runtime targets.
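Rather than waiting for a traceback deep inside wenet.dataset, the missing submodule can be detected up front. A sketch using importlib.util.find_spec (module_available is a hypothetical helper); note that find_spec imports the parent packages, so torch itself is loaded by the check.

```python
import importlib.util


def module_available(dotted_name):
    """True if dotted_name can be located; parent packages are imported as a side effect."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # A missing parent package (e.g. no torch at all) lands here.
        return False


if __name__ == "__main__":
    SHARDING = "torch.utils.data.datapipes.iter.sharding"
    if module_available(SHARDING):
        print(f"{SHARDING} found")
    else:
        print(f"{SHARDING} not found; this WeNet revision needs a newer PyTorch")
```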

kanpapa commented 1 week ago

Next, I tried the source at release tag v2.2.0, from when WeNet first added horizonbpu support.

git clone -b v2.2.0 https://github.com/wenet-e2e/wenet.git

I pinned onnx to 1.12.0:

pip install torch==1.13.0 torchaudio==0.13.0 torchvision==0.14.0 onnx==1.12.0 onnxruntime -i https://mirrors.aliyun.com/pypi/simple

The installed package versions are as follows:

# Name                    Version                   Build  Channel
protobuf                  3.19.4                   pypi_0    pypi
onnx                      1.12.0                   pypi_0    pypi
onnxruntime               1.19.2                   pypi_0    pypi
torch                     1.13.0                   pypi_0    pypi
torchaudio                0.13.0                   pypi_0    pypi
torchvision               0.14.0                   pypi_0    pypi
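When comparing environments across comments like this one, it can help to turn the conda list table into data and diff it against the pins used in the pip install command. A small sketch; parse_conda_list and diff_against are hypothetical helpers, and the parser assumes the whitespace-separated Name/Version/Build/Channel layout shown above.

```python
def parse_conda_list(text):
    """Map package name -> version from `conda list` style output."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header row
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


# Pins from the pip install command above, plus protobuf from earlier in the thread.
EXPECTED = {
    "protobuf": "3.19.4",
    "onnx": "1.12.0",
    "torch": "1.13.0",
    "torchaudio": "0.13.0",
    "torchvision": "0.14.0",
}


def diff_against(expected, actual):
    """Return {package: actual version or None} for every pin that does not match."""
    return {name: actual.get(name)
            for name, version in expected.items()
            if actual.get(name) != version}
```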

Everything through Step 2 worked, but the following error occurred in Step 3.

(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ python3 $WENET_DIR/tools/onnx2horizonbin.py \
  --config ./model_subsample8_parameter110M/train.yaml \
  --checkpoint ./model_subsample8_parameter110M/final.pt \
  --output_dir ./model_subsample8_parameter110M/sample50_chunk8_leftchunk16 \
  --chunk_size 8 \
  --num_decoding_left_chunks 16 \
  --max_samples 50 \
  --dict ./model_subsample8_parameter110M/units.txt \
  --cali_datalist ./model_subsample8_parameter110M/calibration_data/data.list
Traceback (most recent call last):
  File "/home/ocha/wenet/runtime/horizonbpu/../..//tools/onnx2horizonbin.py", line 53, in <module>
    from wenet.utils.init_model import init_model
  File "/home/ocha/wenet/wenet/utils/init_model.py", line 16, in <module>
    from wenet.transducer.joint import TransducerJoint
  File "/home/ocha/wenet/wenet/transducer/joint.py", line 5, in <module>
    from typeguard import check_argument_types
ImportError: cannot import name 'check_argument_types' from 'typeguard' (/home/ocha/miniconda3/envs/horizonbpu/lib/python3.8/site-packages/typeguard/__init__.py)

check_argument_types belongs to the typeguard 2.x API and was removed in typeguard 3.0.0, so I downgraded typeguard:

pip install typeguard==2.13.3
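Because the failure depends on whether typeguard still exports check_argument_types, probing for the attribute is more direct than parsing version numbers. A minimal sketch; module_has is a hypothetical helper.

```python
import importlib


def module_has(module_name, attribute):
    """True if module_name imports cleanly and exposes `attribute`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)


if __name__ == "__main__":
    if module_has("typeguard", "check_argument_types"):
        print("typeguard still provides check_argument_types")
    else:
        print("check_argument_types missing: install typeguard 2.x, e.g. 2.13.3")
```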

I ran the script again with this setup, but it still failed:

(horizonbpu) ocha@ocha-ubuntu:~/wenet/runtime/horizonbpu$ python3 $WENET_DIR/tools/onnx2horizonbin.py \
  --config ./model_subsample8_parameter110M/train.yaml \
  --checkpoint ./model_subsample8_parameter110M/final.pt \
  --output_dir ./model_subsample8_parameter110M/sample50_chunk8_leftchunk16 \
  --chunk_size 8 \
  --num_decoding_left_chunks 16 \
  --max_samples 50 \
  --dict ./model_subsample8_parameter110M/units.txt \
  --cali_datalist ./model_subsample8_parameter110M/calibration_data/data.list
Failed to import k2 and icefall.         Notice that they are necessary for hlg_onebest and hlg_rescore
Please install onnx and onnxruntime!

onnx and onnxruntime are both installed, though. Perhaps k2 and icefall are not installed correctly.
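A generic "please install X" message is often printed from one except ImportError that wraps several imports, in which case it can hide the import that actually failed. Whether onnx2horizonbin.py is written that way is an assumption not verified here. Importing each module on its own, as in the sketch below (diagnose_imports is a hypothetical helper), surfaces the real exception for each one.

```python
import importlib


def diagnose_imports(module_names):
    """Import each module separately and record 'ok' or the actual exception."""
    report = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            report[name] = "ok"
        except Exception as exc:  # any import-time failure is worth reporting
            report[name] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    # Module names taken from the two messages printed above.
    for name, status in diagnose_imports(["onnx", "onnxruntime", "k2", "icefall"]).items():
        print(f"{name:12s} {status}")
```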