transcranial / keras-js

Run Keras models in the browser, with GPU support using WebGL
https://transcranial.github.io/keras-js
MIT License
4.96k stars, 503 forks

How to tokenize text and prepare data for "model.predict()"? #126

Open TravJav opened 6 years ago

TravJav commented 6 years ago

I have been looking through the documentation for how to run predictions with KerasJS, but I have not found what I need. I understand there is boilerplate JavaScript code available, which shows that a float32 array is used along with the callback, model.predict, and so on. I was able to use the encoder to convert the model's h5 file to the required *.bin file, but I do not see anything in the KerasJS documentation about tokenizing text.

I've been successful using Python and the normal Keras approach, but I am struggling to find concrete examples of how to do what my project requires with KerasJS.

I have done the following in Python for classification:

1. Create a Tokenizer

  1. Open a dictionary *.json file containing the top 3000 words in
  2. Call text_to_word_sequence(myText) from Keras
  3. Make sure the words are registered in the dictionary
  4. Call Tokenizer.sequences_to_matrix(input) with mode="binary"
  5. Predict and receive the appropriate output
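For reference, the steps above can be sketched in plain Python without Keras, which may make the logic easier to port to JavaScript. This is only a rough approximation under stated assumptions: `WORD_INDEX`, `NUM_WORDS`, and `text_to_binary_vector` are hypothetical names, the five-word dictionary stands in for the real 3000-word JSON file, and the filter string mirrors what I believe are the Keras defaults.

```python
# Hypothetical word index (word -> integer rank); the real one would be
# loaded from the dictionary JSON file mentioned in the steps above.
WORD_INDEX = {"the": 1, "movie": 2, "was": 3, "great": 4, "bad": 5}
NUM_WORDS = 6  # vector length; slot 0 stays unused, as in Keras' Tokenizer

# Characters replaced by spaces before splitting (assumed Keras defaults).
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
_TABLE = str.maketrans(FILTERS, " " * len(FILTERS))


def text_to_word_sequence(text):
    """Lowercase the text, replace punctuation with spaces, split on whitespace."""
    return text.lower().translate(_TABLE).split()


def text_to_binary_vector(text, word_index=WORD_INDEX, num_words=NUM_WORDS):
    """Return one 0.0/1.0 flag per dictionary slot (binary bag of words)."""
    vector = [0.0] * num_words
    for word in text_to_word_sequence(text):
        idx = word_index.get(word)  # words missing from the dictionary are skipped
        if idx is not None and idx < num_words:
            vector[idx] = 1.0
    return vector


print(text_to_binary_vector("The movie was GREAT!"))
# [0.0, 1.0, 1.0, 1.0, 1.0, 0.0]
```

The resulting list of floats is the kind of data that would be copied into a float32 array on the KerasJS side before calling model.predict.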

Can someone kindly explain how to achieve this, or point me to the correct docs to follow?

zhuoli7 commented 6 years ago

As far as I understand the keras.js documentation, it does not support Tokenizer. My solution is to do all the preprocessing in a Flask app and send the ready-to-use sequence back for prediction. It's been a while, but I hope this helps.
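A minimal sketch of what such a preprocessing endpoint might look like. To keep it dependency-free, it uses a plain WSGI callable from the Python standard library as a stand-in for Flask (Flask apps are also WSGI apps, so the idea carries over). The dictionary, the `text` query parameter, and the `input` JSON key are all assumptions made up for illustration.

```python
import json
from urllib.parse import parse_qs

# Hypothetical dictionary; a real app would load the JSON file used in training.
WORD_INDEX = {"the": 1, "movie": 2, "was": 3, "great": 4, "bad": 5}
NUM_WORDS = 6


def encode(text):
    """Binary bag-of-words vector; words not in the dictionary are ignored."""
    vector = [0.0] * NUM_WORDS
    for word in text.lower().split():
        idx = WORD_INDEX.get(word.strip(".,!?"))
        if idx is not None:
            vector[idx] = 1.0
    return vector


def app(environ, start_response):
    """WSGI callable: GET /?text=... returns {"input": [...]} as JSON."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    text = query.get("text", [""])[0]
    body = json.dumps({"input": encode(text)}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8000):
    """Run the app with the standard-library server (port is arbitrary)."""
    from wsgiref.simple_server import make_server
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts the server; the browser would then fetch `/?text=...`, take the `input` array from the JSON response, and pass it on for prediction.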

hmhwe commented 5 years ago

What about the predicted output? How can we convert the predicted sequence of integers back to text? @zhuoli7
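One common way to do this, sketched here under assumptions, is to invert the same word index that was used for encoding and look each integer up in it. The names `WORD_INDEX`, `INDEX_WORD`, and `sequence_to_text` are hypothetical, and the sketch assumes the model output has already been reduced to a list of integer indices, with 0 used for padding.

```python
# Hypothetical word index; it must be the same mapping used when encoding.
WORD_INDEX = {"the": 1, "movie": 2, "was": 3, "great": 4, "bad": 5}

# Invert it: integer index -> word.
INDEX_WORD = {idx: word for word, idx in WORD_INDEX.items()}


def sequence_to_text(sequence, index_word=INDEX_WORD, unknown="<unk>"):
    """Map integer indices back to words, skipping 0 (assumed to be padding)."""
    words = []
    for idx in sequence:
        idx = int(idx)
        if idx == 0:
            continue
        # Indices outside the dictionary become a placeholder token.
        words.append(index_word.get(idx, unknown))
    return " ".join(words)


print(sequence_to_text([1, 2, 3, 4, 0, 0]))
# the movie was great
```

If the model instead returns a probability distribution per position, each distribution would first need to be reduced to a single index (for example, by taking the position of the largest value) before this lookup applies.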