sooftware / kospeech

Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
https://sooftware.github.io/kospeech/
Apache License 2.0

lm_path error when running eval.py #24

Closed ghost closed 4 years ago

ghost commented 4 years ago

Hello. First of all, thank you for creating such great code and models!! The code is so clean that I'm learning a lot from it!

I downloaded the AIHub data as-is and am training and evaluating on just the toy data, but eval.py throws the error below. I'm running it on Colab and changed only the paths and the batch size. Could you help me figure out what the problem is?

[2020-06-19 03:36:36,586 utils.py:21 - info()] --mode: eval
[2020-06-19 03:36:36,586 utils.py:21 - info()] --sample_rate: 16000
[2020-06-19 03:36:36,587 utils.py:21 - info()] --window_size: 20
[2020-06-19 03:36:36,587 utils.py:21 - info()] --stride: 10
[2020-06-19 03:36:36,587 utils.py:21 - info()] --n_mels: 80
[2020-06-19 03:36:36,587 utils.py:21 - info()] --normalize: True
[2020-06-19 03:36:36,587 utils.py:21 - info()] --del_silence: True
[2020-06-19 03:36:36,587 utils.py:21 - info()] --input_reverse: True
[2020-06-19 03:36:36,587 utils.py:21 - info()] --feature_extract_by: librosa
[2020-06-19 03:36:36,587 utils.py:21 - info()] --time_mask_para: 50
[2020-06-19 03:36:36,587 utils.py:21 - info()] --freq_mask_para: 12
[2020-06-19 03:36:36,587 utils.py:21 - info()] --time_mask_num: 2
[2020-06-19 03:36:36,588 utils.py:21 - info()] --freq_mask_num: 2
[2020-06-19 03:36:36,588 utils.py:21 - info()] --dataset_path: ../../DATA/KsponSpeech_01/KsponSpeech_0001/
[2020-06-19 03:36:36,588 utils.py:21 - info()] --data_list_path: ../data/data_list/toy_test_list.csv
[2020-06-19 03:36:36,588 utils.py:21 - info()] --label_path: ./data/label/aihub_labels.csv
[2020-06-19 03:36:36,588 utils.py:21 - info()] --num_workers: 4
[2020-06-19 03:36:36,588 utils.py:21 - info()] --use_cuda: True
[2020-06-19 03:36:36,588 utils.py:21 - info()] --model_path: ../data/checkpoint/checkpoints/2020_06_18_08_42_49/model.pt
[2020-06-19 03:36:36,588 utils.py:21 - info()] --batch_size: 8
[2020-06-19 03:36:36,588 utils.py:21 - info()] --decode: greedy
[2020-06-19 03:36:36,588 utils.py:21 - info()] --k: 5
[2020-06-19 03:36:36,588 utils.py:21 - info()] --print_every: 10
[2020-06-19 03:36:36,638 utils.py:21 - info()] Operating System : Linux 4.19.104+
[2020-06-19 03:36:36,639 utils.py:21 - info()] Processor : x86_64
[2020-06-19 03:36:36,644 utils.py:21 - info()] device : Tesla K80
[2020-06-19 03:36:36,644 utils.py:21 - info()] CUDA is available : True
[2020-06-19 03:36:36,644 utils.py:21 - info()] CUDA version : 10.1
[2020-06-19 03:36:36,644 utils.py:21 - info()] PyTorch version : 1.5.0+cu101
[2020-06-19 03:36:56,738 utils.py:141 - _init_num_threads()] NumExpr defaulting to 2 threads.
100% 167/167 [03:21<00:00,  1.21s/it]
Traceback (most recent call last):
  File "./eval.py", line 66, in <module>
    main()
  File "./eval.py", line 62, in main
    inference(opt)
  File "./eval.py", line 41, in inference
    evaluator = Evaluator(testset, opt.batch_size, device, opt.num_workers, opt.print_every, opt.decode, opt.k)
  File "../kospeech/evaluator/evaluator.py", line 28, in __init__
    self.decoder = GreedySearch()
  File "../kospeech/decode/search.py", line 21, in __init__
    self.language_model = load_language_model('lm_path', 'cuda')
  File "../kospeech/model_builder.py", line 130, in load_language_model
    model = torch.load(path, map_location=lambda storage, loc: storage).to(device)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 584, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 234, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 215, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'lm_path'
sooftware commented 4 years ago

Hello.
Glad to hear our code has been helpful :)

[image: screenshot of the suggested change to search.py]

Could you change the code in search.py as shown above?
That code is written to run when a language model is available; if you don't have a language model,
just set it to None.

I recently made some changes to the code, so there may be a few minor bugs.
If you open an issue, I'll get back to you as quickly as I can.

Thank you.
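The traceback shows `GreedySearch.__init__` calling `load_language_model('lm_path', 'cuda')` with the literal placeholder string `'lm_path'`, so `torch.load` tries to open a file by that name. The exact edit in the screenshot was not preserved; the following is only a minimal sketch of the idea in the reply, assuming the LM path is turned into an optional constructor argument that defaults to `None`. The `lm_path`/`device` parameters and this `load_language_model` body are hypothetical, not kospeech's actual code.

```python
import os
from typing import Optional


def load_language_model(path: str, device: str):
    """Hypothetical stand-in for kospeech's loader: check the path, then torch.load."""
    if not os.path.isfile(path):
        # Name the offending path so a leftover placeholder like 'lm_path' is obvious.
        raise FileNotFoundError(f"language model checkpoint not found: {path!r}")
    import torch  # imported lazily: the no-LM path below never needs torch
    return torch.load(path, map_location=lambda storage, loc: storage).to(device)


class GreedySearch:
    """Sketch of a greedy decoder whose language model is optional."""

    def __init__(self, lm_path: Optional[str] = None, device: str = "cuda") -> None:
        # No path given -> language_model stays None, as the reply suggests.
        self.language_model = (
            load_language_model(lm_path, device) if lm_path is not None else None
        )


decoder = GreedySearch()
print(decoder.language_model)  # None
```

With this shape, evaluating without a language model needs no extra file, while passing a wrong path still fails loudly with the path in the message.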

ghost commented 4 years ago

Thanks for the quick reply. That solved it, and it now runs without errors 👍

Next, I'd like to try running the pretrained model included among the files you provided. Is that data/checkpoints/model.pt?

If so, running it fails, saying there is no module named e2e.

[2020-06-19 05:46:57,273 utils.py:21 - info()] --mode: eval
[2020-06-19 05:46:57,273 utils.py:21 - info()] --sample_rate: 16000
[2020-06-19 05:46:57,273 utils.py:21 - info()] --window_size: 20
[2020-06-19 05:46:57,273 utils.py:21 - info()] --stride: 10
[2020-06-19 05:46:57,273 utils.py:21 - info()] --n_mels: 80
[2020-06-19 05:46:57,273 utils.py:21 - info()] --normalize: True
[2020-06-19 05:46:57,273 utils.py:21 - info()] --del_silence: True
[2020-06-19 05:46:57,273 utils.py:21 - info()] --input_reverse: True
[2020-06-19 05:46:57,273 utils.py:21 - info()] --feature_extract_by: librosa
[2020-06-19 05:46:57,273 utils.py:21 - info()] --time_mask_para: 50
[2020-06-19 05:46:57,274 utils.py:21 - info()] --freq_mask_para: 12
[2020-06-19 05:46:57,274 utils.py:21 - info()] --time_mask_num: 2
[2020-06-19 05:46:57,274 utils.py:21 - info()] --freq_mask_num: 2
[2020-06-19 05:46:57,274 utils.py:21 - info()] --dataset_path: ../../DATA/KsponSpeech_01/KsponSpeech_0001/
[2020-06-19 05:46:57,274 utils.py:21 - info()] --data_list_path: ../data/data_list/toy_test_list.csv
[2020-06-19 05:46:57,274 utils.py:21 - info()] --label_path: ./data/label/aihub_labels.csv
[2020-06-19 05:46:57,274 utils.py:21 - info()] --num_workers: 4
[2020-06-19 05:46:57,274 utils.py:21 - info()] --use_cuda: True
[2020-06-19 05:46:57,274 utils.py:21 - info()] --model_path: ../data/checkpoints/model.pt
[2020-06-19 05:46:57,274 utils.py:21 - info()] --batch_size: 8
[2020-06-19 05:46:57,274 utils.py:21 - info()] --decode: greedy
[2020-06-19 05:46:57,274 utils.py:21 - info()] --k: 5
[2020-06-19 05:46:57,274 utils.py:21 - info()] --print_every: 10
[2020-06-19 05:46:57,289 utils.py:21 - info()] Operating System : Linux 4.19.104+
[2020-06-19 05:46:57,289 utils.py:21 - info()] Processor : x86_64
[2020-06-19 05:46:57,291 utils.py:21 - info()] device : Tesla P100-PCIE-16GB
[2020-06-19 05:46:57,291 utils.py:21 - info()] CUDA is available : True
[2020-06-19 05:46:57,291 utils.py:21 - info()] CUDA version : 10.1
[2020-06-19 05:46:57,291 utils.py:21 - info()] PyTorch version : 1.5.0+cu101
Traceback (most recent call last):
  File "./eval.py", line 66, in <module>
    main()
  File "./eval.py", line 62, in main
    inference(opt)
  File "./eval.py", line 25, in inference
    model = load_test_model(opt, device)
  File "../kospeech/model_builder.py", line 116, in load_test_model
    model = torch.load(opt.model_path, map_location=lambda storage, loc: storage).to(device)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 593, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 773, in _legacy_load
    result = unpickler.load()
ModuleNotFoundError: No module named 'e2e'
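The `e2e` error is raised during unpickling (`unpickler.load()` in the traceback): loading a whole saved model with `torch.load` goes through pickle, and pickle stores each class as a reference (module path plus class name) rather than its code. A checkpoint written when the model classes lived in a module named `e2e` therefore cannot be loaded by code in which that module no longer exists. A minimal stdlib-only reproduction of that mechanism follows; the `e2e` module and `Seq2seq` class here are hypothetical stand-ins, not the original kospeech classes.

```python
import pickle
import sys
import types

# Register a throwaway module named "e2e" (stand-in for the old package layout).
old_module = types.ModuleType("e2e")
Seq2seq = type("Seq2seq", (), {"__module__": "e2e"})  # hypothetical model class
old_module.Seq2seq = Seq2seq
sys.modules["e2e"] = old_module

model = Seq2seq()
model.hidden_dim = 512
# Pickle records the class as the reference ("e2e", "Seq2seq"), not its code.
blob = pickle.dumps(model)

# Simulate the refactored codebase: the "e2e" module is gone.
del sys.modules["e2e"]

try:
    pickle.loads(blob)
except ModuleNotFoundError as err:
    print(err)  # No module named 'e2e'
```

This is consistent with the maintainer's diagnosis below that the committed file predates the current code layout. For checkpoints you control, PyTorch's documentation recommends saving `model.state_dict()` rather than the whole module, which keeps user class paths out of the file.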
sooftware commented 4 years ago

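That file looks like a model we trained a while ago.
It seems it got uploaded because it wasn't covered by .gitignore.
If you need the weights from our current training, please leave your email.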

Alternatively, we've built a web application at www.kospeech.com that lets you run inference with our model.
It might be worth giving it a try.

Thank you.

ghost commented 4 years ago

My email is ur.luella@gmail.com! I'll try the web application too.

Thanks for the kind reply :) Have a great weekend!!