yeyupiaoling / PPASR

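End-to-end Chinese speech recognition implemented with PaddlePaddle, from getting started to production use: very simple introductory examples and very practical enterprise projects. Supports the currently most popular models: DeepSpeech2, Conformer, and Squeezeformer.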
Apache License 2.0
792 stars 131 forks

Error when running the eval test #110

Closed Tian14267 closed 1 year ago

Tian14267 commented 1 year ago

Running eval.py raises an error:

Forward/backward pass size (MB): 53.81
Params size (MB): 135.26
Estimated Total Size (MB): 189.35
-----------------------------------------------------------------------------------------------------

[2022-10-13 14:12:11.302523 INFO   ] trainer:evaluate:140 - 成功加载模型:models/deepspeech2_fbank/best_model/model.pdparams
  0%|                                                                                                                                                                                                                                                                        | 0/127 [00:00<?, ?it/s]

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   FlagRegister<std::string>::SetDescription(std::string const&, FlagDescription<std::string> const&)
1   std::pair<std::_Rb_tree_iterator<std::pair<std::string const, FlagDescription<std::string> > >, bool> std::_Rb_tree<std::string, std::pair<std::string const, FlagDescription<std::string> >, std::_Select1st<std::pair<std::string const, FlagDescription<std::string> > >, std::less<std::string>, std::allocator<std::pair<std::string const, FlagDescription<std::string> > > >::_M_emplace_unique<std::pair<std::string, FlagDescription<std::string> > >(std::pair<std::string, FlagDescription<std::string> >&&)

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1665641532 (unix time) try "date -d @1665641532" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0xcd) received by PID 219888 (TID 0x7efd66b1c740) from PID 205 ***]

What is going on here, and how can I fix it?

yeyupiaoling commented 1 year ago

I can't tell what the problem is either. Does training run normally? You could try upgrading scipy.

Tian14267 commented 1 year ago

@yeyupiaoling The problem is probably in the model's output. The printed model structure is as follows:

-----------------------------------------------------------------------------------------------------
 Layer (type)             Input Shape                       Output Shape                Param #    
=====================================================================================================
   Conv2D-1            [[1, 1, 900, 80]]                  [1, 32, 449, 39]                320      
    GELU-1             [[1, 32, 449, 39]]                 [1, 32, 449, 39]                 0       
   ConvBn-1          [[1, 1, 900, 80], [1]]           [[1, 32, 449, 39], [1]]              0       
   Conv2D-2            [[1, 32, 449, 39]]                 [1, 32, 224, 19]               9,248     
    GELU-2             [[1, 32, 224, 19]]                 [1, 32, 224, 19]                 0       
   ConvBn-2         [[1, 32, 449, 39], [1]]           [[1, 32, 224, 19], [1]]              0       
  ConvStack-1         [[1, 900, 80], [1]]               [[1, 224, 608], [1]]               0       
     GRU-1         [[1, 224, 608], None, [1]]      [[1, 224, 1024], [1, 1, 1024]]      5,019,648   
  LayerNorm-1           [[1, 224, 1024]]                   [1, 224, 1024]                2,048     
 RNNForward-1      [[1, 224, 608], [1], None]      [[1, 224, 1024], [1, 1, 1024]]          0       
     GRU-2        [[1, 224, 1024], None, [1]]      [[1, 224, 1024], [1, 1, 1024]]      6,297,600   
  LayerNorm-2           [[1, 224, 1024]]                   [1, 224, 1024]                2,048     
 RNNForward-2     [[1, 224, 1024], [1], None]      [[1, 224, 1024], [1, 1, 1024]]          0       
     GRU-3        [[1, 224, 1024], None, [1]]      [[1, 224, 1024], [1, 1, 1024]]      6,297,600   
  LayerNorm-3           [[1, 224, 1024]]                   [1, 224, 1024]                2,048     
 RNNForward-3     [[1, 224, 1024], [1], None]      [[1, 224, 1024], [1, 1, 1024]]          0       
     GRU-4        [[1, 224, 1024], None, [1]]      [[1, 224, 1024], [1, 1, 1024]]      6,297,600   
  LayerNorm-4           [[1, 224, 1024]]                   [1, 224, 1024]                2,048     
 RNNForward-4     [[1, 224, 1024], [1], None]      [[1, 224, 1024], [1, 1, 1024]]          0       
     GRU-5        [[1, 224, 1024], None, [1]]      [[1, 224, 1024], [1, 1, 1024]]      6,297,600   
  LayerNorm-5           [[1, 224, 1024]]                   [1, 224, 1024]                2,048     
 RNNForward-5     [[1, 224, 1024], [1], None]      [[1, 224, 1024], [1, 1, 1024]]          0       
  RNNStack-1    [[1, 224, 608], [1], None, None] [[1, 224, 1024], [5, 1, 1024], []]        0       
   Linear-1             [[1, 224, 1024]]                   [1, 224, 5100]              5,227,500   
=====================================================================================================
Total params: 35,457,356
Trainable params: 35,457,356
Non-trainable params: 0
-----------------------------------------------------------------------------------------------------
Input size (MB): 0.27
Forward/backward pass size (MB): 53.81
Params size (MB): 135.26
Estimated Total Size (MB): 189.35
-----------------------------------------------------------------------------------------------------
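The shapes in this summary can be checked by hand. The sketch below assumes each Conv2D uses a 3x3 kernel, stride 2, and no padding. Those values are inferred from the shapes and parameter counts in the table, not stated in the thread, and `conv_out` is a hypothetical helper name.

```python
def conv_out(size, kernel=3, stride=2, padding=0):
    """Output length of a convolution along one axis (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

# Input from the summary: 900 frames x 80 features.
frames, feats = 900, 80
for _ in range(2):  # the two ConvBn blocks
    frames, feats = conv_out(frames), conv_out(feats)

channels = 32
# The channel axis is folded into the feature axis before the first GRU.
rnn_input_size = channels * feats

# Parameter counts: weights plus biases (assumed kernel size 3x3).
conv1_params = 32 * (1 * 3 * 3) + 32
conv2_params = 32 * (32 * 3 * 3) + 32
linear_params = 1024 * 5100 + 5100

print(frames, feats, rnn_input_size)
print(conv1_params, conv2_params, linear_params)
```

Under these assumptions the numbers agree with the table (224 frames, 608 input features to the first GRU, and 320 / 9,248 / 5,227,500 parameters). The last dimension of the output, 5100, is the number of classes the decoder indexes into.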

The shape of the model's output data is as follows:

[2022-10-13 15:01:33.741537 INFO   ] eval_test:evaluate:147 - 成功加载模型:models/deepspeech2_fbank/best_model/model.pdparams
  0%|                                                                                                                                                                                                                                                                        | 0/127 [00:00<?, ?it/s][16, 47, 5100]
  1%|██                                                                                                                                                                                                                                                              | 1/127 [00:00<01:24,  1.49it/s][16, 49, 5100]
[16, 50, 5100]
[16, 52, 5100]
[16, 52, 5100]
  4%|██████████                                                                                                                                                                                                                                                      | 5/127 [00:00<00:15,  7.87it/s][16, 53, 5100]
[16, 54, 5100]
[16, 54, 5100]
[16, 55, 5100]
  7%|██████████████████▏                                                                                                                                                                                                                                             | 9/127 [00:00<00:09, 12.09it/s][16, 56, 5100]
[16, 56, 5100]
[16, 57, 5100]
  9%|████████████████████████                                                                                                                                                                                                                                       | 12/127 [00:01<00:08, 14.24it/s][16, 58, 5100]
[16, 58, 5100]
[16, 59, 5100]

This result doesn't look right to me. Training ran normally, with no errors.

yeyupiaoling commented 1 year ago

What looks wrong about it?

yeyupiaoling commented 1 year ago

Try a smaller batch size for eval and see.

Tian14267 commented 1 year ago

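@yeyupiaoling I have already reduced batch_size to 16, and my GPU has 24 GB. Training ran fine even with a batch size of 48.

# Decode to obtain the recognition result
out_strings = self.decoder_result(outs.numpy(), out_lens, test_dataset.vocab_list)

During eval, this is the line that raises the error shown above.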

yeyupiaoling commented 1 year ago

Does greedy decoding work?

Tian14267 commented 1 year ago

@yeyupiaoling Greedy decoding works, but ctc_beam_search does not. Does something need to be upgraded here? Training ran without errors, though.
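For reference, greedy CTC decoding takes the most probable class per frame, collapses consecutive repeats, and drops the blank. Below is a minimal pure-Python sketch. `greedy_ctc_decode` and the toy vocabulary are hypothetical, the blank is assumed to sit at index 0, and this is not PPASR's own implementation.

```python
def greedy_ctc_decode(probs_seq, vocabulary, blank_index=0):
    """Greedy CTC decoding: argmax per frame, merge repeats, remove blanks."""
    # Most probable class index for each time step.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in probs_seq]
    # Collapse consecutive duplicates.
    collapsed = [idx for i, idx in enumerate(best) if i == 0 or idx != best[i - 1]]
    # Drop blanks and map the remaining indices to characters.
    return "".join(vocabulary[idx] for idx in collapsed if idx != blank_index)

vocab = ["<blank>", "a", "b"]  # toy vocabulary, blank assumed at index 0
probs = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # b
]
text = greedy_ctc_decode(probs, vocab)
print(text)
```

Greedy decoding needs only the per-frame probabilities and the vocabulary, which is one reason it can work while a beam search backed by a language model fails.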

yeyupiaoling commented 1 year ago

It may be insufficient memory. ctc_beam_search uses a language model, which is fairly large.

Tian14267 commented 1 year ago

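> It may be insufficient memory. ctc_beam_search uses a language model, which is fairly large.

OK, I will adjust the language model and the memory later and try again. Thanks a lot.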

YexfSC commented 1 year ago

I have a question. Running the following code raises an error:

start = time.time()
score, text = predictor.predict(audio_path='dataset/test.wav')
print("消耗时间:%dms, 识别结果: %s, 得分: %d" % (round((time.time() - start) * 1000), text, score))

The error output:

IndexError                                Traceback (most recent call last)
/tmp/ipykernel_306/3857842264.py in <module>
      2 start = time.time()
      3 # score, text = predictor.predict(audio_path='test.wav', to_an=False)
----> 4 score, text = predictor.predict(audio_path='dataset/test.wav')
      5 print("消耗时间:%dms, 识别结果: %s, 得分: %d" % (round((time.time() - start) * 1000), text, score))

~/PPASR-master/ppasr/predict.py in predict(self, audio_path, audio_bytes, audio_ndarray, use_pun, is_itn)
    199 output_data = output_handle.copy_to_cpu()[0]
    200 # Decode
--> 201 score, text = self.decode(output_data=output_data, use_pun=use_pun, is_itn=is_itn)
    202 return score, text
    203

~/PPASR-master/ppasr/predict.py in decode(self, output_data, use_pun, is_itn)
    135 else:
    136 # Greedy decoding strategy
--> 137 result = greedy_decoder(probs_seq=output_data, vocabulary=self._text_featurizer.vocab_list)
    138
    139 score, text = result[0], result[1]

~/PPASR-master/ppasr/decoders/ctc_greedy_decoder.py in greedy_decoder(probs_seq, vocabulary, blank_index)
     25 index_list = [index for index in index_list if index != blank_index]
     26 # Convert the index list to a string
---> 27 text = ''.join([vocabulary[index] for index in index_list])
     28 score = 0
     29 if len(max_prob_list) > 0:

~/PPASR-master/ppasr/decoders/ctc_greedy_decoder.py in <listcomp>(.0)
     25 index_list = [index for index in index_list if index != blank_index]
     26 # Convert the index list to a string
---> 27 text = ''.join([vocabulary[index] for index in index_list])
     28 score = 0
     29 if len(max_prob_list) > 0:

IndexError: list index out of range

The list index is out of range. Is this a problem with my configuration parameters?

yeyupiaoling commented 1 year ago

@YexfSC You are probably using a different vocabulary, that is, a different vocabulary.txt file. It has to be the one that goes with your model.
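One way to catch this kind of mismatch before decoding is to compare the size of the model's output layer with the number of vocabulary entries. A minimal sketch follows. `check_vocab_matches` is a hypothetical helper, not part of PPASR, and the numbers are toy values.

```python
def check_vocab_matches(num_classes, vocabulary):
    """Raise early if the model can emit an index the vocabulary lacks."""
    if num_classes > len(vocabulary):
        raise ValueError(
            f"model has {num_classes} output classes but the vocabulary "
            f"has only {len(vocabulary)} entries; indices >= {len(vocabulary)} "
            "would raise IndexError during decoding"
        )

# Toy numbers: a model with 5100 outputs and a 4000-entry vocabulary.
vocabulary = [f"tok{i}" for i in range(4000)]
try:
    check_vocab_matches(5100, vocabulary)
except ValueError as err:
    message = str(err)
    print(message)
```

A check like this turns a late `IndexError` inside the decoder into an early, descriptive error at load time.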

YexfSC commented 1 year ago

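> @YexfSC You are probably using a different vocabulary, that is, a different vocabulary.txt file. It has to be the one that goes with your model.

Thanks for the pointer. I replaced the file and it runs now.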

YexfSC commented 1 year ago

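> @YexfSC You are probably using a different vocabulary, that is, a different vocabulary.txt file. It has to be the one that goes with your model.

I'd also like to ask: what is roughly the minimum configuration needed to enable ctc_beam_search? Right now, as soon as I switch to ctc_beam_search the kernel restarts, and it fails even on an A100 40G on the Paddle platform.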

yeyupiaoling commented 1 year ago

16 GB of memory is enough. That means host RAM, not GPU memory. Your machine looks well specced, so this is probably not a resource shortage.
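To tell host RAM apart from GPU memory, the RAM figure can be read directly from the operating system. This is a minimal sketch for Linux using only the standard library. `total_ram_gib` is a hypothetical helper, and the 16 GB threshold is simply the figure quoted above.

```python
import os

def total_ram_gib():
    """Total physical host memory in GiB (Linux/Unix, via sysconf)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024 ** 3

ram = total_ram_gib()
print(f"host RAM: {ram:.1f} GiB")
if ram < 16:
    print("below the 16 GB suggested in the thread for ctc_beam_search")
```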

YexfSC commented 1 year ago

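> 16 GB of memory is enough. That means host RAM, not GPU memory. Your machine looks well specced, so this is probably not a resource shortage.

OK, thanks. I'll keep debugging.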

YexfSC commented 1 year ago

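Hello, I have another question. When using ctc_beam_search, I stepped into swig_wrapper.py and found that execution cannot get past importing paddlespeech_ctcdecoders: the kernel restarts immediately. What could be the cause? Also, I see that the previous version used swig_decoders. What is the difference between the two?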

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   FlagRegister<std::string>::SetDescription(std::string const&, FlagDescription<std::string> const&)
1   std::pair<std::_Rb_tree_iterator<std::pair<std::string const, FlagDescription<std::string> > >, bool> std::_Rb_tree<std::string, std::pair<std::string const, FlagDescription<std::string> >, std::_Select1st<std::pair<std::string const, FlagDescription<std::string> > >, std::less<std::string>, std::allocator<std::pair<std::string const, FlagDescription<std::string> > > >::_M_emplace_unique<std::pair<std::string, FlagDescription<std::string> > >(std::pair<std::string, FlagDescription<std::string> >&&)

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1665641532 (unix time) try "date -d @1665641532" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0xcd) received by PID 219888 (TID 0x7efd66b1c740) from PID 205 ***]
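When an import crashes the interpreter or notebook kernel, running the import in a child process keeps the main session alive and exposes the signal that killed it. The sketch below uses only the standard library. `probe_import` is a hypothetical helper, and the module name is the one mentioned in this comment.

```python
import subprocess
import sys

def probe_import(module_name):
    """Import a module in a child process; return (ok, detail)."""
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return True, "import succeeded"
    if proc.returncode < 0:
        # On POSIX a negative code means the child was killed by a signal,
        # for example -11 for SIGSEGV.
        return False, f"killed by signal {-proc.returncode}"
    # Ordinary Python failure: report the last line of the traceback.
    lines = proc.stderr.strip().splitlines()
    return False, lines[-1] if lines else f"exit code {proc.returncode}"

print(probe_import("paddlespeech_ctcdecoders"))
```

Probing the import on its own, and then again after importing the other packages loaded before it, can help show whether the crash depends on what else is already in the process.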

yeyupiaoling commented 1 year ago

What operating system are you on?

YexfSC commented 1 year ago

System information: Linux version 5.4.0-109-generic (buildd@ubuntu) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)) #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022

yeyupiaoling commented 1 year ago

It works fine on my Ubuntu 18.04.

YexfSC commented 1 year ago

> It works fine on my Ubuntu 18.04.

OK, thanks. I'll try again.

Tian14267 commented 1 year ago

> > It works fine on my Ubuntu 18.04.
>
> OK, thanks. I'll try again.

Have you managed to solve this? I still can't get it working.

yeyupiaoling commented 1 year ago

@YexfSC @Tian14267 It is a problem in my code. I'll fix it tonight.

yeyupiaoling commented 1 year ago

@YexfSC @Tian14267 Fixed. It was a package conflict.
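The thread does not say which packages conflicted. When chasing this kind of issue, a common first step is to list what is installed. Below is a minimal sketch using `importlib.metadata`. `installed_version` is a hypothetical helper, and the distribution names are assumptions based on the modules mentioned in this thread (a pip distribution name can differ from the import name).

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Assumed distribution names; adjust to what `pip list` actually shows.
candidates = (
    "paddlepaddle",
    "paddlepaddle-gpu",
    "paddlespeech_ctcdecoders",
    "swig_decoders",
)
for name in candidates:
    print(name, installed_version(name))
```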

YexfSC commented 1 year ago

> @YexfSC @Tian14267 Fixed. It was a package conflict.

Thank you very much!