It looks like there are a lot of changes, but that is just the Elixir 1.8 formatter. The actual differences are:
Change char_list to charlist.
Change to_char_list to to_charlist.
Add :unicode to the :re.compile() call to handle Unicode characters.
iolist_size fails on a list like [8220], which is what you get when the text to parse contains curly quotes. I added a fast_length function to interpreter.ex that handles charlists of length 0, 1, 2, and 3 directly and falls back to :erlang.length() for longer lists. (It works, but feels like a hack.)
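For reference, here is a minimal sketch of the last two items. The module name FastLengthSketch and its body are assumptions written for illustration, not the code from interpreter.ex; the actual helper may differ. It shows iolist_size rejecting a code point above 255, a pattern-matching length function with the fallback described above, and :re.compile() accepting the same charlist once :unicode is passed.

```elixir
defmodule FastLengthSketch do
  # Hypothetical stand-in for the fast_length helper described above.
  # Short lists are matched directly; longer ones fall back to :erlang.length/1.
  def fast_length([]), do: 0
  def fast_length([_]), do: 1
  def fast_length([_, _]), do: 2
  def fast_length([_, _, _]), do: 3
  def fast_length(list) when is_list(list), do: :erlang.length(list)
end

# U+201C (left curly quote) as a one-element charlist.
curly_quote = [8220]

# iolist_size only accepts bytes (0..255) and binaries, so 8220 is rejected.
iolist_result =
  try do
    :erlang.iolist_size(curly_quote)
  rescue
    ArgumentError -> :badarg
  end

IO.inspect(iolist_result, label: "iolist_size")
IO.inspect(FastLengthSketch.fast_length(curly_quote), label: "fast_length")

# With :unicode, :re.compile/2 accepts charlists containing code points above 255.
{:ok, _compiled} = :re.compile(curly_quote, [:unicode])
```

Note that a length-based count is not a drop-in replacement for iolist_size in general: iolist_size counts bytes across nested lists and binaries, while length counts top-level elements, so this only lines up for flat charlists.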
Feel free to take or reject these changes. They were motivated by trying to write a tokenizer like the classic tokenizer in Lucene. You can see what I was trying here: https://github.com/baldmountain/lucille. I needed the changes to get my project to successfully parse the text version of Tom Sawyer downloaded from Project Gutenberg. ABNF is probably too slow for this since it is interpreted, so I may try something else. I just wanted to pass on my changes in case they are useful.