tommylau-exe closed this issue 4 years ago
Relevant (originally from #12):
Not sure how to encode strings into a format the model will accept after it has been saved and reloaded in a different environment. May have to find some way to serialize this as well, through the model or otherwise.
Keras has an experimental preprocessing layer called TextVectorization that may work here. It has several advantages that make it a good fit for this problem:
Of course it's not a perfect solution:
It's worth noting that the layer has been marked experimental since at least November 2019, so it may be fairly stable by now.
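To make the idea concrete, here's a minimal sketch of how the layer learns a vocabulary and maps raw strings to integer sequences inside the TF graph. The corpus and the layer parameters below are illustrative placeholders, not this project's actual configuration:

```python
import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

# Hypothetical corpus standing in for the project's training text.
corpus = ["the model saved fine", "reloading the model failed"]

# Build the layer and learn a vocabulary from the corpus. Because the
# lookup happens inside the TF graph, the preprocessing travels with
# the model when it is serialized.
vectorizer = TextVectorization(
    max_tokens=100,            # cap the vocabulary size
    output_mode="int",         # emit integer token indices
    output_sequence_length=6,  # pad/truncate every example to length 6
)
vectorizer.adapt(corpus)

# Raw strings in, padded integer sequences out.
tokens = vectorizer(tf.constant(["the model"]))
print(tokens.shape)  # (1, 6): one example, padded to length 6
```

Since the vocabulary lives in the layer's own state, there's no separate tokenizer artifact to ship alongside the model.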
After some testing, the experimental TextVectorization layer from Keras is definitely serializable and works exceptionally well in this project. One issue I ran into was a ValueError thrown when the model is read back in after serialization. Looks like it's already been solved here. Unfortunately, that fix isn't available in the latest stable TensorFlow release, so we may have to switch to one of the nightly builds to get this working properly.
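For reference, this is roughly the round-trip that triggers the issue: a model with TextVectorization as its first layer, saved and reloaded via the SavedModel format. The architecture below is a hypothetical stand-in for this project's model, not its actual layers:

```python
import tempfile
import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

# Hypothetical end-to-end model: raw strings in, a score out.
vectorizer = TextVectorization(max_tokens=100, output_mode="int",
                               output_sequence_length=6)
vectorizer.adapt(["example training text", "more example text"])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,), dtype=tf.string),
    vectorizer,
    tf.keras.layers.Embedding(input_dim=100, output_dim=8),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Round-trip through the SavedModel format. On stable releases that
# predate the upstream fix, load_model is where the ValueError
# surfaces; with the fix applied it loads cleanly.
with tempfile.TemporaryDirectory() as path:
    model.save(path)
    reloaded = tf.keras.models.load_model(path)
    # The reloaded model accepts raw strings directly.
    print(reloaded.predict(tf.constant([["unseen text"]])).shape)  # (1, 1)
```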
This enhancement will make it possible to see the ML model's output for any given string of text, which would be incredibly useful for sanity checks and future interactivity.