microsoft / onnxruntime-extensions

onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime
MIT License

Add C++ regex support for Llama3, Standard Library, and Custom Cases #804

Closed. sayanshaw24 closed this 2 months ago

sayanshaw24 commented 2 months ago

Currently, our regular expression pattern matching for BPE tokenization supports only the GPT2 regex.

This PR adds the following functionality:

  1. Standard library regex support.
  2. Llama3 regex support.
  3. Custom regex implementation that selects the appropriate regex for a given model.
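
For readers unfamiliar with the idea, here is a rough sketch of what per-model regex selection for BPE pre-tokenization could look like using only the standard library. This is not the code in this PR: `ModelKind`, `PatternFor`, and `PreTokenize` are hypothetical names, and the patterns are ASCII-only approximations, since `std::regex` supports neither Unicode property classes such as `\p{L}` nor inline case-insensitive groups.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical model identifiers, for illustration only.
enum class ModelKind { kGpt2, kLlama3 };

// Returns an ASCII-only approximation of the pre-tokenization pattern
// for the given model. Assumption: [A-Za-z] stands in for \p{L} and
// [0-9] for \p{N}, because std::regex has no Unicode property classes.
const std::regex& PatternFor(ModelKind kind) {
  static const std::regex gpt2(
      R"('s|'t|'re|'ve|'m|'ll|'d| ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+(?!\S)|\s+)");
  // Llama3-style: case-insensitive contractions are spelled out by hand,
  // and digits are grouped in runs of at most three.
  static const std::regex llama3(
      R"('[sStTmMdD]|'[rR][eE]|'[vV][eE]|'[lL][lL]|[^\r\nA-Za-z0-9]?[A-Za-z]+|[0-9]{1,3}| ?[^\sA-Za-z0-9]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+)");
  return kind == ModelKind::kLlama3 ? llama3 : gpt2;
}

// Splits text into pre-tokenization pieces using the pattern chosen
// for the model; each regex match becomes one piece.
std::vector<std::string> PreTokenize(const std::string& text, ModelKind kind) {
  std::vector<std::string> pieces;
  const std::regex& re = PatternFor(kind);
  for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
    pieces.push_back(it->str());
  }
  return pieces;
}
```

With these approximations, the two patterns differ visibly on digits: for `"Hello world 12345"` the GPT2-style pattern keeps `" 12345"` as one piece, while the Llama3-style pattern splits the number into `"123"` and `"45"`.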