gunawanlg / voice-to-text-bahasa

Apache License 2.0
Language Model #58

Open gunawanlg opened 4 years ago

gunawanlg commented 4 years ago

Some options to consider:

  1. N-gram model (ref: Stanford)
  2. Probabilistic Context-Free Grammar (ref: Columbia)
  3. Neural nets (throw everything at it and expect good results :sweat_smile:)
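
For option 1, here is a rough sketch of what a bigram model could look like. It assumes whitespace tokenization, add-one (Laplace) smoothing, and a three-sentence toy corpus made up for illustration; `train_bigram` and `log_prob` are hypothetical names, not anything from this repo.

```python
from collections import Counter
import math


def train_bigram(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        numerator = bigrams[(prev, word)] + 1
        denominator = unigrams[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


# Toy corpus, invented for this example only.
corpus = ["saya makan nasi", "saya minum air", "dia makan nasi"]
unigrams, bigrams = train_bigram(corpus)

natural = log_prob("saya makan nasi", unigrams, bigrams)
scrambled = log_prob("nasi makan saya", unigrams, bigrams)
```

With this toy corpus, `natural` comes out higher than `scrambled`, which is the property we would rely on when rescoring ASR hypotheses. A real model would need a much larger corpus and better smoothing (e.g. Kneser-Ney).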

Some established LMs for Bahasa:

  1. fastText (Facebook)
  2. BERT (Google); this brings a WordPiece model into the loop.
  3. ULMFiT (GitHub: Cahya Wirawan) (edit to add more...)

Should we do this once we reach roughly 50% WER? What do you think, @hariesramdhani?
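
For reference, WER here means the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. A minimal sketch, assuming whitespace tokenization; `wer` is a hypothetical helper, not an existing function in this repo:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (makan -> minum) and one deletion (goreng) over 4 words.
example = wer("saya makan nasi goreng", "saya minum nasi")
```

So a 50% WER roughly means one error for every two reference words, as in `example` above.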

Maybe try SVD first? :stuck_out_tongue_winking_eye:

hariesramdhani commented 4 years ago

I've been eyeing this for a while now: Word Beam Search

gunawanlg commented 4 years ago

What corpus should we use for training this language model?