brandondutra opened 7 years ago
Yes, agree.
I haven't fully grokked how vocab files work end-to-end: how to set up a hashtable from a file so that it works at both training and prediction time, and how vocabs should be saved within a saved model. Perhaps this can be researched a bit unless you already know...
The structured data package reads the vocab file and embeds it in the graph with index_table_from_tensor (though I think index_to_string_table_from_file would work fine too). The vocab file then does not need to be saved alongside the exported graph.
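To illustrate the trade-off (not the real TF API; the helper names below are hypothetical), the difference is whether the mapping travels inside the exported artifact or has to be shipped as a separate file:

```python
# Hypothetical sketch, not TensorFlow's actual lookup API.
# Contrasts a vocab embedded at export time with one read from a
# file at load time.

def index_table_from_list(vocab, default=-1):
    """Embed the vocab directly: the mapping travels with the model,
    so no separate vocab file is needed at prediction time."""
    table = {word: i for i, word in enumerate(vocab)}
    return lambda word: table.get(word, default)

def index_table_from_file(path, default=-1):
    """Read the vocab from a file: that file must be shipped and
    present wherever the model is loaded."""
    with open(path) as f:
        return index_table_from_list([line.strip() for line in f], default)

lookup = index_table_from_list(["red", "green", "blue"])
assert lookup("green") == 1
assert lookup("purple") == -1  # out-of-vocab maps to the default
```

In TensorFlow terms, embedding via index_table_from_tensor bakes the vocab into the graph, which is why the exported graph is self-contained.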
Having one file with all the vocabs can be a problem for large examples; I think this caused a performance problem in the Criteo sample.
It would be nice to have a vocab file per column: if a string-to-int transform is needed for only a few categorical columns, the vocab for every other column does not need to be loaded.
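A per-column layout could be as simple as one vocab file per transformed column. A minimal sketch (the file-naming scheme and helper names are assumptions, not anything the package does today):

```python
import os
import tempfile

def write_vocab_files(vocabs, out_dir):
    """Write one '<column>_vocab.txt' per column instead of one
    big combined vocab file."""
    paths = {}
    for column, words in vocabs.items():
        path = os.path.join(out_dir, "%s_vocab.txt" % column)
        with open(path, "w") as f:
            f.write("\n".join(words))
        paths[column] = path
    return paths

def load_vocab(path):
    """Load a single column's vocab as a string-to-int mapping;
    columns without a transform are never read."""
    with open(path) as f:
        return {word: i for i, word in enumerate(f.read().splitlines())}

tmp = tempfile.mkdtemp()
paths = write_vocab_files(
    {"color": ["red", "green", "blue"], "size": ["S", "M", "L"]}, tmp)
color = load_vocab(paths["color"])  # the "size" vocab is never loaded
assert color["blue"] == 2
```

With this layout, a transform that only touches a couple of categorical columns pays only for those columns' vocabs.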