Open danpovey opened 7 years ago
I'm interested, but I have to admit that I am hard-pressed for time right now. I'll see if I can fool with it this weekend.
Some comments:
If someone else wants to do this, definitely don't feel turned away just because I am provisionally interested in it.
Re why kaldi_lm is used: I think maybe because we were pruning and it does a better job of pruning than the other ones; also the perplexity is very slightly better than the other ones even for unpruned language models, and the license is better than SRILM. But it is far from ideal in terms of documentation. I don't think there is one LM toolkit that we favor across the board, as they all have drawbacks. I'm not so concerned about that aspect of it, as it's very separable from the other issues; it doesn't really interact with anything. What bothers me more is the structure of the scripts.
In case anyone's interested in trying this out, I haven't done any work on this yet.
@danpovey, maybe you could mention what would be the subject of cleanup -- only local/ and conf/? Because changing/restructuring steps/ and utils/ would be a major change as it could affect other recipes.
Definitely steps/ and utils/ would not be changed; these would be linked to the current location.
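To illustrate the linking idea, here is a minimal sketch, assuming the new setup would sit next to the existing one (the egs/wsj/s5b and egs/wsj/s5 layout and the helper name link_shared_dirs are assumptions for illustration, not something settled in this thread). It runs in a throwaway directory, so it does not touch a real checkout.

```python
import os
import tempfile

def link_shared_dirs(new_recipe_dir, shared_recipe_dir, names=("steps", "utils")):
    """Create relative symlinks new_recipe_dir/<name> -> shared_recipe_dir/<name>.

    Hypothetical helper: the shared scripts stay where they are and the
    new recipe only points at them.
    """
    created = []
    for name in names:
        # Relative target, so the link survives moving the whole tree.
        target = os.path.relpath(os.path.join(shared_recipe_dir, name), new_recipe_dir)
        link = os.path.join(new_recipe_dir, name)
        if not os.path.lexists(link):
            os.symlink(target, link)
        created.append((link, target))
    return created

# Demo layout in a temporary directory (assumed layout, see above).
root = tempfile.mkdtemp()
s5 = os.path.join(root, "egs", "wsj", "s5")
s5b = os.path.join(root, "egs", "wsj", "s5b")
for d in (os.path.join(s5, "steps"), os.path.join(s5, "utils"), s5b):
    os.makedirs(d)

links = link_shared_dirs(s5b, s5)
for link, target in links:
    print(os.path.basename(link), "->", target)
```

With links like these, only local/ and conf/ in the new directory would hold recipe-specific files, and other recipes that depend on steps/ and utils/ would be unaffected.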
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed by a bot strictly because of inactivity. This does not mean that we think that this issue is not important! If you believe it has been closed hastily, add a comment to the issue and mention @kkm000, and I'll gladly reopen it.
[I hit send too soon on this; I'm updating the comment.]
I think the time might have come to create an 's5b' version of the WSJ setup. WSJ is the oldest setup and the local scripts are not up to the standard of clarity that we usually expect. Some specific issues:
Use <unk> (lower-case) instead of <SPOKEN_NOISE>, which is more standard, and make the phone names lower-case instead of upper-case.
Part of my motivation is that we'll be doing some RNNLM stuff with this setup (since we have example scripts for older setups) and the scripts need to be cleaner. I don't know if anyone has the time and inclination to work on this?
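For anyone picking this up, the <unk> and lower-case phone-name changes mentioned above could look roughly like the sketch below. It assumes a lexicon with one entry per line in the form "word phone1 phone2 ..."; the function name normalize_lexicon_line and the sample entries are made up for illustration and are not taken from the actual WSJ scripts.

```python
def normalize_lexicon_line(line, old_unk="<SPOKEN_NOISE>", new_unk="<unk>"):
    """Rename the unknown-word symbol and lower-case the phone names.

    Assumes 'word phone1 phone2 ...' per line; the word itself is left
    unchanged unless it is the old unknown-word symbol.
    """
    fields = line.split()
    if not fields:
        return line  # leave blank lines as they are
    word, phones = fields[0], fields[1:]
    if word == old_unk:
        word = new_unk
    return " ".join([word] + [p.lower() for p in phones])

# Illustrative entries only.
example = ["<SPOKEN_NOISE> SPN", "HELLO HH AH0 L OW1", "!SIL SIL"]
normalized = [normalize_lexicon_line(entry) for entry in example]
for entry in normalized:
    print(entry)
```

Any other file that mentions the old symbol or the upper-case phone names (for example the silence phone lists) would need the same treatment so that everything stays consistent.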