Closed: kou closed this issue 5 years ago
- the full word sequence as a list of IDs
[id1, id2, ...]
penn_treebank.table.dictionary_encode(:word).ids
- vocabulary (deduplicated word list, with each word retrievable from its ID)
penn_treebank.table.dictionary_encode(:word).values
penn_treebank.table.dictionary_encode(:word).value(id)
We also need the vocabulary size (the number of distinct words).
penn_treebank.table.dictionary_encode(:word).size
I've added Table#label_encode (like scikit-learn) and Table#dictionary_encode (like Apache Arrow). #label_encode is based on #dictionary_encode. #label_encode is useful when we only need to convert values to IDs. If we also need to convert IDs back to values, #dictionary_encode is the one to use.
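To make the difference between the two concrete, here is a minimal plain-Ruby sketch of the idea. It does not use red-datasets and is not its implementation: `DictionaryEncodedColumn` and the standalone `label_encode` helper are hypothetical names made up for illustration, and only the `ids` / `values` / `value(id)` / `size` interface mirrors what is requested in this issue.

```ruby
# Hypothetical stand-in for a dictionary-encoded column.
# This is an illustration only, not the red-datasets implementation.
class DictionaryEncodedColumn
  attr_reader :ids, :values

  def initialize(raw_values)
    # Vocabulary: distinct values in order of first appearance.
    @values = raw_values.uniq
    # Lookup table from value to its ID (its position in the vocabulary).
    index = {}
    @values.each_with_index { |value, id| index[value] = id }
    # The original sequence rewritten as IDs.
    @ids = raw_values.map { |value| index[value] }
  end

  # Convert an ID back to its original value.
  def value(id)
    @values[id]
  end

  # Vocabulary size (number of distinct values).
  def size
    @values.size
  end
end

# Hypothetical label encoding built on top of dictionary encoding:
# it keeps only the IDs and drops the dictionary.
def label_encode(raw_values)
  DictionaryEncodedColumn.new(raw_values).ids
end

words = %w[the cat saw the dog]
encoded = DictionaryEncodedColumn.new(words)

p encoded.ids          # [0, 1, 2, 0, 3]
p encoded.values       # ["the", "cat", "saw", "dog"]
p encoded.value(2)     # "saw"
p encoded.size         # 4
p label_encode(words)  # [0, 1, 2, 0, 3]
```

In this sketch label encoding is just dictionary encoding with the dictionary thrown away, which is why it cannot map IDs back to words.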
GitHub: fix #22
@youchan How about this approach?