Open fsx950223 opened 2 years ago
In order to use tflite model, you have to convert strings to token ids, such as 'He'-> 1332. @yoeo
Hi @fsx950223
> Why do you use chars as inputs instead of words?
In fact, I tested both chars and words with various preprocessing tricks and chose the one that gave the best predictions with the current model & training dataset. If one day I switch to a new machine learning model or change the way I build the training dataset, I'll have to test the different preprocessing options again and choose the best one -> and it could be "words" this time.
By the way, if you know any general rule about when to use chars or words for feature engineering, I'll be happy to learn and test it :slightly_smiling_face:
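To make the chars-versus-words choice concrete, here is a minimal sketch of the two tokenizations applied to the same code snippet. It is only an illustration, not guesslang's actual preprocessing; `char_tokens`, `word_tokens` and the regular expression are hypothetical names and choices.

```python
import re
from typing import List


def char_tokens(source: str) -> List[str]:
    """Character-level tokens: every character, spaces included."""
    return list(source)


def word_tokens(source: str) -> List[str]:
    """Word-level tokens: identifiers and numbers are kept whole,
    every other non-space character becomes its own token."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


snippet = "x = foo(1)"
chars = char_tokens(snippet)   # 10 tokens, drawn from a small fixed alphabet
words = word_tokens(snippet)   # 6 tokens, drawn from an open-ended vocabulary
```

Character tokens keep the vocabulary small and never hit an unknown token, while word tokens give shorter sequences but a much larger vocabulary. Which one predicts better still has to be measured on the actual model and dataset.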
> In order to use tflite model, you have to convert strings to token ids, such as 'He'-> 1332.
In theory yes. You could probably use tflite by using the string -> integer mappings from the model.
I don't know if it will actually work, but if you find a way to make it work, please share the details here: https://github.com/yoeo/guesslang/issues/26
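As a rough sketch of what such a string -> integer mapping could look like, the snippet below builds a character vocabulary and encodes a string into fixed-length ids. The vocabulary, the reserved ids and the helper names (`build_vocab`, `encode`) are assumptions made for illustration. A real tflite setup would have to reuse the exact mapping the model was trained with, so ids like 1332 would come from that vocabulary and not from this code.

```python
from typing import Dict, List

PAD_ID = 0  # assumed id used to pad short inputs
UNK_ID = 1  # assumed id for characters missing from the vocabulary


def build_vocab(samples: List[str]) -> Dict[str, int]:
    """Assign an integer id to every character, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for text in samples:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode(text: str, vocab: Dict[str, int], length: int) -> List[int]:
    """Map a string to exactly `length` ids, truncating or padding as needed."""
    ids = [vocab.get(ch, UNK_ID) for ch in text[:length]]
    return ids + [PAD_ID] * (length - len(ids))


vocab = build_vocab(["He", "Hello"])
encoded = encode("Help", vocab, 6)  # 'p' is unknown, the last two slots are padding
```

The resulting list of integers is the kind of input an integer-only model expects in place of the raw string.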
To improve model performance, I recommend tf.keras.layers.TextVectorization + a FastText model, which is similar to the current model. For more details, take a look at https://www.tensorflow.org/text/guide/word_embeddings
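A minimal sketch of that suggestion, assuming a recent TensorFlow 2 release: `TextVectorization` turns strings into token ids, and a FastText-style classifier (averaged embeddings followed by a linear softmax layer) predicts the language. The toy samples, the sizes and the character-level split are placeholders chosen for illustration, not guesslang's configuration, and the model here is untrained.

```python
import tensorflow as tf

# Hypothetical toy data; real training would use many labelled source files.
samples = ["def f(x): return x", "int main() { return 0; }"]
num_languages = 2

# Learn a character vocabulary from the samples and map strings to ids.
vectorizer = tf.keras.layers.TextVectorization(
    standardize=None,           # keep case and punctuation, they matter in code
    split="character",          # use "whitespace" for word-level tokens
    output_mode="int",
    output_sequence_length=32,  # pad or truncate every input to 32 ids
)
vectorizer.adapt(tf.constant(samples))

# FastText-style classifier: embed each token, average, then classify.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vectorizer.vocabulary_size(), 16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(num_languages, activation="softmax"),
])

token_ids = vectorizer(tf.constant(samples))  # integer tensor of shape (2, 32)
probabilities = model(token_ids).numpy()      # one probability row per sample
```

Keeping the vectorizer outside the model means the model itself only sees integer ids, which is the form discussed above for tflite.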
I have a question about feature engineering. Why do you use chars as inputs instead of words? For example, [example not preserved] is better than [example not preserved]?