sehHeiden opened 1 year ago
Hi @sehHeiden, this is an interesting question. The problem is the tokenization: the process is more involved than splitting on whitespace. Longer and compound words are split into several subword tokens, so it works a bit like a simple compression algorithm. The Hugging Face team has a library for all the different tokenizers. To make it work, you would need to implement the BertTokenizer in Elixir or build a wrapper around the compiled Rust tokenizers from this lib.
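For illustration, this is roughly what such a wrapper looks like, assuming the elixir-nx/tokenizers bindings (`{:tokenizers, "~> 0.4"}`) around the Rust library; the model name and output are illustrative:

```elixir
# Load a pretrained WordPiece tokenizer via the Rust bindings (assumed dep).
{:ok, tokenizer} = Tokenizers.Tokenizer.from_pretrained("bert-base-uncased")
{:ok, encoding} = Tokenizers.Tokenizer.encode(tokenizer, "tokenization")

# Longer words come back as subword pieces, not whole words.
Tokenizers.Encoding.get_tokens(encoding)
#=> e.g. ["token", "##ization"] (plus any special tokens like [CLS]/[SEP])
```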
Or you could use a tool to run the original Python code from Elixir, something like this.
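For illustration only (the tool linked above may be a different one), a sketch of that approach using the erlport package and a hypothetical `priv/python/tokenize.py` helper that wraps the Python tokenizer:

```elixir
# Assumes {:erlport, "~> 0.11"} as a dependency and a tokenize.py module
# (hypothetical) exposing encode/1 on the given Python path.
{:ok, py} = :python.start(python_path: ~c"priv/python")
ids = :python.call(py, :tokenize, :encode, ["An example sentence."])
:python.stop(py)
```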
Could I add an ONNX export version?
My current attempt is:
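The original snippet isn't reproduced here; for reference, a minimal sketch of what such an export can look like with PyTorch and transformers (model choice, file name, and opset are placeholders, not the original code):

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("An example sentence.", return_tensors="pt")

torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "bert.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    # Allow variable batch size and sequence length at inference time.
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)
```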
Then I used the model in Elixir:
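The Elixir side isn't shown above either; a minimal inference sketch, assuming the Ortex ONNX Runtime bindings and the `bert.onnx` file from the export sketch (the token ids are illustrative and must really come from the matching tokenizer):

```elixir
# Load the exported model via Ortex (assumed dep: {:ortex, "~> 0.1"}).
model = Ortex.load("bert.onnx")

# Illustrative ids for "[CLS] hello world [SEP]"; BERT expects int64 inputs.
input_ids = Nx.tensor([[101, 7592, 2088, 102]], type: :s64)
attention_mask = Nx.tensor([[1, 1, 1, 1]], type: :s64)

{logits} = Ortex.run(model, {input_ids, attention_mask})
Nx.to_flat_list(logits)
```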
But I still have some problems/questions:
About 4): I currently scale the prediction as follows:
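The snippet referred to here isn't shown; for reference, one common way to scale raw classifier logits into probabilities is a softmax. A minimal Nx sketch (module and function names are mine, not from the original post):

```elixir
defmodule Scaling do
  import Nx.Defn

  # Softmax over the last axis: turns logits into probabilities summing to 1.
  defn softmax(logits) do
    # Subtract the max for numerical stability before exponentiating.
    max = Nx.reduce_max(logits, axes: [-1], keep_axes: true)
    exp = Nx.exp(logits - max)
    exp / Nx.sum(exp, axes: [-1], keep_axes: true)
  end
end
```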
About 5): In my version above, keys that don't match return nil. I changed that to 0, but that changes the meaning of the sentence.
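One likely reason the meaning changes: in BERT's vocabulary, id 0 is the [PAD] token, so mapping unknown words to 0 tells the model they are padding; the intended fallback is the [UNK] id (100 in bert-base-uncased). A sketch of that fallback, assuming a `vocab` map from token string to id loaded from vocab.txt:

```elixir
# Fall back to the [UNK] id instead of 0 ([PAD]) for unknown tokens.
unk_id = Map.fetch!(vocab, "[UNK]")
ids = Enum.map(tokens, fn token -> Map.get(vocab, token, unk_id) end)
```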
I opened a question in the Elixir Forum about it here.