Closed: simonw closed this issue 1 year ago
I'm going to use the GPT-2 tokenizer code from here: https://github.com/openai/gpt-2/blob/a74da5d99abaaba920de8131d64da2862a8f213b/src/encoder.py#L53
re.compile(r"""'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+""")
That pattern requires the third-party regex module rather than the stdlib re module, because re doesn't support \p{...} Unicode property classes, so I'll have to add regex as a dependency.
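A minimal sketch of how that pattern splits text, assuming the third-party regex package is installed (the example string is mine, not from the GPT-2 repo):

```python
import regex  # third-party; stdlib re rejects the \p{...} classes below

# GPT-2 pre-tokenization pattern from openai/gpt-2 src/encoder.py
pat = regex.compile(
    r"""'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+"""
)

# Splits into word pieces, contractions, numbers, and punctuation,
# each (except the first) keeping its leading space.
print(pat.findall("Hello world, it's 2024!"))
# → ['Hello', ' world', ',', ' it', "'s", ' 2024', '!']
```

Note the contraction alternatives ('s, 't, 're, …) come first in the alternation, so "it's" splits into " it" and "'s" rather than swallowing the apostrophe into the punctuation class.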
I need this for: #4
May as well expose these as functions too.