kampersanda / xcdat

Fast compressed trie dictionary library
https://kampersanda.github.io/xcdat/
MIT License

Is it possible to specify the ID encoding or encode by lexicographical order? #3

Open jiguanglizipao opened 1 year ago

jiguanglizipao commented 1 year ago

In Sample usage, xcdat produces a string-to-ID encoding which seems to be random and not in lexicographical order. Is it possible to specify string IDs or make them ranked in lexicographical order? If not, what is the strategy/order for generating the encoding?

kampersanda commented 1 year ago

@jiguanglizipao Sorry for the late reply.

> Is it possible to specify string IDs or make them ranked in lexicographical order?

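No. Because of the underlying data structure, string IDs come out in an effectively arbitrary order and cannot be chosen by the caller. If you need a lexicographic mapping, you have to build a permutation between IDs and lexicographic ranks outside Xcdat.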
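
One way to build such a permutation, as a minimal sketch rather than anything Xcdat provides: sort the keys, query each key's ID, and record the mapping in both directions. The names `LexPermutation` and `build_lex_permutation` are made up for illustration, and `lookup` is a hypothetical stand-in for whatever key-to-ID query the dictionary offers. The sketch assumes every key maps to a distinct ID in the range 0 to n-1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper type: maps between dictionary IDs and lexicographic ranks.
struct LexPermutation {
    std::vector<std::uint64_t> id_to_rank;  // id_to_rank[id]   = rank of the key with that ID
    std::vector<std::uint64_t> rank_to_id;  // rank_to_id[rank] = ID of the rank-th smallest key
};

// `lookup` is a stand-in (an assumption, not Xcdat's API) for the key-to-ID query.
// It must return a distinct ID in [0, n) for each of the n distinct keys.
template <class Lookup>
LexPermutation build_lex_permutation(std::vector<std::string> keys, Lookup lookup) {
    // std::string compares byte-wise, so this yields lexicographic byte order.
    std::sort(keys.begin(), keys.end());
    keys.erase(std::unique(keys.begin(), keys.end()), keys.end());

    const std::uint64_t n = keys.size();
    LexPermutation perm{std::vector<std::uint64_t>(n), std::vector<std::uint64_t>(n)};
    for (std::uint64_t rank = 0; rank < n; ++rank) {
        const std::uint64_t id = lookup(keys[rank]);
        assert(id < n);  // guards the assumption that IDs are dense in [0, n)
        perm.id_to_rank[id] = rank;
        perm.rank_to_id[rank] = id;
    }
    return perm;
}
```

With a real dictionary, `lookup` would presumably be a small lambda wrapping the trie's own lookup call. The two arrays cost 16 bytes per key as written, so for large key sets a narrower integer type or a compressed representation may be worth considering.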

> If not, what is the strategy/order for generating the encoding?

Xcdat places trie nodes in an array according to the double-array scheme, which makes their positions almost random, and string IDs are assigned based on that arrangement. That is why the IDs do not follow lexicographic order.