srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

Why do we need space and unk symbols in char mode for the acoustic model? #218

Closed: wizardk closed this issue 4 years ago

ramonsanabria commented 4 years ago

To know where a word ends.
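Here is a minimal Python sketch of why a space token helps in char mode. It assumes a greedy CTC decoder and uses hypothetical token names `<blk>` and `<space>`, so check the actual symbols in your units list. Repeated frame labels are merged, blanks are dropped, and the remaining characters are split into words at the space token.

```python
from itertools import groupby
from typing import List

BLANK = "<blk>"    # hypothetical CTC blank symbol
SPACE = "<space>"  # hypothetical word-boundary symbol

def ctc_collapse(frame_labels: List[str]) -> List[str]:
    """Standard CTC collapse: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != BLANK]

def chars_to_words(chars: List[str]) -> List[str]:
    """Split a collapsed character sequence into words at SPACE tokens."""
    words: List[str] = []
    current: List[str] = []
    for c in chars:
        if c == SPACE:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(c)
    if current:
        words.append("".join(current))
    return words

# Per-frame best labels; the blank between the two "e" runs keeps both letters.
frames = ["s", "s", "e", BLANK, "e", "e", SPACE, SPACE, BLANK, "y", "o", "o", "u"]
print(chars_to_words(ctc_collapse(frames)))  # ['see', 'you']
```

If the model never emitted `<space>`, the same frames would collapse to the single string "seeyou", and the word boundary would have to come from somewhere else, such as a lexicon and LM.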

On Thu, Oct 31, 2019 at 7:18 AM wizardk notifications@github.com wrote:


wizardk commented 4 years ago

@ramonsanabria Hi, but is it really reasonable to make the AM learn where the word boundaries are? Sometimes the segmentation is not obvious. We are already using CTC, so the AM should not have to care about this; the LM should handle it.

Thanks for your reply and I'm really glad to discuss this issue with you.

ramonsanabria commented 4 years ago

Hi, those are just design choices. It has been shown that CTC can actually model spaces. Please see:

https://arxiv.org/abs/1708.04469
https://arxiv.org/abs/1712.06855
https://ieeexplore.ieee.org/abstract/document/8639530


wizardk commented 4 years ago

@ramonsanabria Hi, nice work on your papers! Regarding the space symbol in the AM, do you mean this part of 1708.04469?

Word boundaries can be modeled with a space symbol or by capitalizing the first letter of each word [11]. While decoding CTC acoustic models without adding external linguistic information works well, a vast amount of training data should be used to get competitive results [12].

In effect, this means the AM learns some linguistic information from the labeled text and embeds a weak LM in itself. That is not a good idea if we need to switch application domains by plugging in the corresponding LM. And it does need a vast amount of training data as well.

Thanks again for the information.

I only use this repository to build the TLG graph and decode CTC outputs. I still wonder whether I can drop these symbols from the AM when using this repository. Do you have any ideas?
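The passage quoted from 1708.04469 names two ways to mark word boundaries in character labels. Below is a minimal Python sketch that contrasts them on one transcript. The token name `<space>` is hypothetical, and the capitalization scheme assumes that every word starts with a letter.

```python
from typing import List

SPACE = "<space>"  # hypothetical word-boundary token name

def encode_with_space(transcript: str) -> List[str]:
    """Lower-case characters per word, with a SPACE token between words."""
    units: List[str] = []
    for i, word in enumerate(transcript.lower().split()):
        if i > 0:
            units.append(SPACE)
        units.extend(word)
    return units

def encode_with_capitals(transcript: str) -> List[str]:
    """No boundary token: the first character of each word is upper-cased.
    Assumes each word starts with a letter."""
    units: List[str] = []
    for word in transcript.lower().split():
        units.append(word[0].upper())
        units.extend(word[1:])
    return units

def decode_capitals(units: List[str]) -> List[str]:
    """Recover words by starting a new word at every upper-case unit."""
    words: List[str] = []
    for u in units:
        if u.isupper() or not words:
            words.append(u.lower())
        else:
            words[-1] += u
    return words

print(encode_with_space("see you"))     # ['s', 'e', 'e', '<space>', 'y', 'o', 'u']
print(encode_with_capitals("see you"))  # ['S', 'e', 'e', 'Y', 'o', 'u']
```

Both schemes still require the AM itself to decide where words start. One scheme does this with one extra unit. The other roughly doubles the letter inventory.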

wizardk commented 4 years ago

@ramonsanabria Got it. I can train the AM in char mode and build the TLG in phone mode. That way, I can discard space, unk, silence, and so on.

ramonsanabria commented 4 years ago

Correct, yes.
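Below is a minimal Python sketch of the lexicon side of that approach. Each word is "pronounced" as its own character sequence, so the L part of TLG supplies the word boundaries and no space or unk unit is needed. It assumes the AM's output units are exactly these characters (plus blank), and it uses the `WORD unit1 unit2 ...` line layout of a Kaldi-style `lexicon.txt`. The helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

def char_lexicon(words: Iterable[str]) -> Dict[str, List[str]]:
    """Map each word to its character spelling, used as its 'pronunciation'."""
    return {w: list(w) for w in sorted(set(words))}

def units_from_lexicon(lexicon: Dict[str, List[str]]) -> List[str]:
    """Distinct character units; note that no space or unk unit appears."""
    units: Set[str] = set()
    for spelling in lexicon.values():
        units.update(spelling)
    return sorted(units)

def lexicon_lines(lexicon: Dict[str, List[str]]) -> List[str]:
    """Render entries as 'WORD unit1 unit2 ...' lines."""
    return [f"{w} {' '.join(s)}" for w, s in lexicon.items()]

lex = char_lexicon(["SEE", "YOU", "SEA"])
print("\n".join(lexicon_lines(lex)))
print(units_from_lexicon(lex))
```

Here the character set must match the AM's training labels exactly. Any word the AM can spell but the lexicon lacks cannot be output by the TLG decoder.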
