openspeech-team / openspeech

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
https://openspeech-team.github.io/openspeech/
MIT License
670 stars 112 forks

Error in generating labels #206

Closed by hyungraelee 1 year ago

hyungraelee commented 1 year ago

Bug

https://github.com/openspeech-team/openspeech/blob/1b8101b667b0a5018dc1542c106115aad8eacb30/openspeech/datasets/ksponspeech/preprocess/grapheme.py#L87C9-L87C19

Hello, and thank you for sharing such a good project :) I'm reporting an error in how the training data labels are generated for the grapheme tokenizer. In the code linked above, the index must start at idx + 4 for the training data to be generated correctly.
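A minimal sketch of why the starting offset matters. It is not the code from grapheme.py. It assumes the vocabulary reserves four special tokens at ids 0 to 3 (the token names below are hypothetical) and that the current code uses a smaller offset such as idx + 3 (also an assumption, since the issue does not quote the line). The helper names build_vocab and has_duplicate_ids are illustrative only.

```python
# Hypothetical special tokens; the real list in grapheme.py may differ.
SPECIAL_TOKENS = ["<pad>", "<sos>", "<eos>", "<blank>"]


def build_vocab(graphemes, offset):
    """Assign ids to graphemes as idx + offset, after the special tokens."""
    ids = list(range(len(SPECIAL_TOKENS)))  # special tokens take ids 0..3
    tokens = list(SPECIAL_TOKENS)
    for idx, grapheme in enumerate(graphemes):
        ids.append(idx + offset)
        tokens.append(grapheme)
    return ids, tokens


def has_duplicate_ids(ids):
    """Return True if two vocabulary entries share the same id."""
    return len(ids) != len(set(ids))


graphemes = ["ㅇ", "ㅏ", "ㄴ"]

# Assumed buggy offset: the first grapheme collides with the last special token.
buggy_ids, _ = build_vocab(graphemes, offset=3)

# Offset proposed in the issue: graphemes start right after the special tokens.
fixed_ids, _ = build_vocab(graphemes, offset=len(SPECIAL_TOKENS))

print(has_duplicate_ids(buggy_ids))  # True
print(has_duplicate_ids(fixed_ids))  # False
```

Under these assumptions, any offset smaller than the number of reserved tokens makes at least one grapheme share an id with a special token, which would corrupt the generated labels.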

upskyy commented 1 year ago

@hyungraelee I think you're right, this is a bug. Thank you for reporting it. If you don't mind, feel free to submit the fix as a PR.