Closed wizardk closed 4 years ago
@ramonsanabria Hi, but is it really reasonable to let the AM learn where the word segmentation is? Sometimes this segmentation is not obvious. We are already using CTC, so the AM should not have to care about this; let the LM handle it.
Thanks for your reply and I'm really glad to discuss this issue with you.
Hi, those are just design choices. It has been shown that CTC can actually model spaces. Please see:
https://arxiv.org/abs/1708.04469 https://arxiv.org/abs/1712.06855 https://ieeexplore.ieee.org/abstract/document/8639530
@ramonsanabria Hi, nice work in your papers! About space in the AM, do you mean this part in 1708.04469?
Word boundaries can be modeled with a space symbol or by capitalizing the first letter of each word [11]. While decoding CTC acoustic models without adding external linguistic information works well, a vast amount of training data should be used to get competitive results [12].
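As a minimal sketch of the two labeling schemes that quote mentions (hypothetical helper names, just for illustration; not code from the papers or this repo), here is how the same transcript would be tokenized with an explicit space symbol versus with first-letter capitalization:

```python
# Two ways to encode word boundaries in character-level CTC labels.
# Helper names are hypothetical, for illustration only.

def labels_with_space(transcript: str) -> list[str]:
    """Mark word boundaries with an explicit <space> token."""
    tokens = []
    for i, word in enumerate(transcript.lower().split()):
        if i > 0:
            tokens.append("<space>")
        tokens.extend(word)
    return tokens

def labels_with_capitalization(transcript: str) -> list[str]:
    """Mark word starts by capitalizing the first letter instead."""
    tokens = []
    for word in transcript.lower().split():
        tokens.append(word[0].upper())
        tokens.extend(word[1:])
    return tokens

print(labels_with_space("hello world"))
# ['h', 'e', 'l', 'l', 'o', '<space>', 'w', 'o', 'r', 'l', 'd']
print(labels_with_capitalization("hello world"))
# ['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']
```

Either way, the boundary information ends up in the AM's label inventory, which is exactly the concern raised below.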
Actually, this means the AM has learned some linguistic information from the labeled text and embedded a weak LM in itself. That is not a good idea if we need to switch application domains by swapping in the corresponding LM. And it does need a vast amount of training data in the meantime.
Thanks for your information again.
I only use this repository to build TLG and decode CTC outputs. I still wonder whether I can drop these symbols from the AM using this repository? Do you have any idea?
@ramonsanabria I got it. I can train the AM in char mode and build the TLG in phone mode. In this way, I can discard space, unk, silence, and so on.
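A sketch of that setup (the file layout is assumed to follow the usual Kaldi/EESEN `lexicon.txt` convention; not verified against this repo): the AM's token set contains only characters, and the lexicon used for TLG construction spells each word out in those characters, so no space/unk/silence unit ever appears in the AM labels:

```python
# Build a character-level lexicon for TLG construction: each word maps
# to its character sequence, with no <space>/<unk>/silence units in the
# AM token inventory. Format assumed to follow the common Kaldi/EESEN
# lexicon.txt convention ("WORD c h a r s"); hypothetical example only.

def build_char_lexicon(words: list[str]) -> dict[str, list[str]]:
    """Map each word to its character sequence (the 'phones' of char mode)."""
    return {w: list(w.lower()) for w in sorted(set(words))}

lexicon = build_char_lexicon(["hello", "world"])
for word, chars in lexicon.items():
    print(word, " ".join(chars))
# hello h e l l o
# world w o r l d
```

Word boundaries are then recovered by the decoding graph (the L and G transducers), not by the acoustic model.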
Yes, correct.