Open ZedZipDev opened 2 years ago
Do you mean the language identification task? See if any of these works are of any help.
You may want to check StanzaNLP language identification: https://stanfordnlp.github.io/stanza/langid.html
Thanks for these pointers. The task is also abbreviated as language ID and is still far from solved (see this COLING 2020 paper for an overview of challenges). As far as I am aware, there is a lack of gold-standard multilingual web-domain datasets for this task.
I wonder if this https://paperswithcode.com/paper/a-reproduction-of-apple-s-bi-directional-lstm is the current state of the art. The performance is not good at all... It seems to be an LSTM; I guess a transformer like BERT or, better, XLNet would reach higher accuracy?
Is there anything for language recognition? I.e., input: text; output: the language of the text.
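To make the input/output contract concrete, here is a minimal sketch of a character-trigram language identifier using only the Python standard library. It is a toy baseline, not one of the models mentioned in this thread: the `SAMPLES` texts, the function names (`char_ngrams`, `train`, `identify`), and the add-one smoothing are all assumptions chosen for illustration, and a usable system would need far more training data per language.

```python
from collections import Counter
import math

# Tiny illustrative training samples (hypothetical; real systems need far more data).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is a simple "
          "english sentence with the usual words",
    "de": "der schnelle braune fuchs springt über den faulen hund und dies ist "
          "ein einfacher deutscher satz mit den üblichen wörtern",
    "fr": "le renard brun rapide saute par-dessus le chien paresseux et ceci est "
          "une phrase française simple avec les mots habituels",
}


def char_ngrams(text, n=3):
    """Split lowercased, space-padded text into overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Build one n-gram count profile per language."""
    return {lang: Counter(char_ngrams(text, n)) for lang, text in samples.items()}


def identify(text, profiles, n=3):
    """Return the language whose profile gives the highest smoothed log-likelihood."""
    grams = char_ngrams(text, n)
    best_lang, best_score = None, -math.inf
    for lang, counts in profiles.items():
        total = sum(counts.values())
        vocab = len(counts) + 1  # one extra slot for unseen n-grams
        # Add-one smoothing so unseen n-grams do not yield log(0).
        score = sum(math.log((counts[g] + 1) / (total + vocab)) for g in grams)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


PROFILES = train(SAMPLES)

if __name__ == "__main__":
    for sentence in ["the dog is lazy", "der hund ist faul", "le chien est paresseux"]:
        print(sentence, "->", identify(sentence, PROFILES))
```

With profiles this small, the sketch only separates inputs that share vocabulary with the training samples; short or mixed-language web text is where the difficulties mentioned above show up.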