Closed brunabazaluk closed 2 months ago
Would it be OK to add to this PR an update to the tutorial that gives an example of using the function to create a DFA? It would be something like this:
dfa_graph = {
    "edges": [
        (0, 1, ctrlg.populate_edge(["A", "B"], vocab_size, tokenizer)),
        (1, 0, ctrlg.populate_edge(["+", "-", "*", "/"], vocab_size, tokenizer)),
        (1, 2, ctrlg.populate_edge(["="], vocab_size, tokenizer)),
        (2, 2, ctrlg.populate_edge(vocab_size=vocab_size, ALL=True)),
    ],
    "initial_state": 0,
    "accept_states": {2},
}
Since different tokenizers can behave very differently, and some words are tokenized as multiple tokens (whereas each edge in the DFA should consist of a list of single tokens), this function is probably not suitable for most applications. I will include a small example for custom DFAs in the README. Thanks.
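To make the multi-token concern concrete, here is a minimal sketch that splits candidate edge labels into those that encode to exactly one token and those that do not. The helper name `split_by_token_count`, the `encode` callable, and the toy character-level vocabulary are all assumptions for illustration; a real tokenizer would give different results for the same words.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy vocabulary: one token per character (illustration only).
TOY_VOCAB: Dict[str, int] = {"A": 0, "B": 1, "+": 2, "-": 3, "*": 4, "/": 5, "=": 6}


def toy_encode(text: str) -> List[int]:
    """Stand-in for a real tokenizer's encode step: one token id per character."""
    return [TOY_VOCAB[ch] for ch in text]


def split_by_token_count(
    words: Sequence[str], encode: Callable[[str], List[int]]
) -> Tuple[Dict[str, int], List[str]]:
    """Return (single-token words mapped to their id, words needing several tokens)."""
    single: Dict[str, int] = {}
    multi: List[str] = []
    for word in words:
        ids = encode(word)
        if len(ids) == 1:
            single[word] = ids[0]
        else:
            multi.append(word)
    return single, multi


single, multi = split_by_token_count(["A", "AB", "="], toy_encode)
```

Any word that lands in `multi` cannot be placed on a single DFA edge as-is; it would need a chain of edges, one per token.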
This simple function receives a list of words accepted by an edge and returns the bitset that represents the corresponding tokens.