Added
Addition of `new_with_tokenizer` constructor for `SentenceEmbeddingsModel`, allowing passing custom tokenizers for sentence embeddings pipelines.
Support for Tokenizers in pipelines, allowing loading `tokenizer.json` and `special_token_map.json` tokenizer files.
(BREAKING) Most model configurations can now take an optional `kind` parameter to specify the model weight precision. If not provided, this defaults to full precision on CPU, or to the precision of the serialized weights otherwise.
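The precision override is set on the pipeline configuration. A minimal sketch, assuming a config struct exposing a public `kind` field of type `Option<tch::Kind>` (the exact struct and field layout should be checked against this release's API docs):

```rust
use rust_bert::pipelines::text_generation::TextGenerationConfig;
use tch::Kind;

fn main() {
    // Request half-precision weights; `None` keeps the default behaviour
    // (full precision on CPU, serialized weight precision otherwise).
    let mut config = TextGenerationConfig::default();
    config.kind = Some(Kind::Half);
}
```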
Fixed
(BREAKING) Fixed the keyword extraction pipeline for n-gram sizes > 2. Added a new configuration option `tokenizer_forbidden_ngram_chars` to specify characters that should be excluded from n-grams (allows filtering n-grams spanning multiple sentences).
Improved MPS device compatibility by setting the `sparse_grad` flag to false for `gather` operations.
Updated ONNX runtime backend version to 1.15.x
Issue with incorrect results for QA models with a tokenizer not using segment ids
Issue with GPT-J that was incorrectly tracking the gradients for the attention bias
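The new n-gram filter is a configuration option on the keyword extraction pipeline. A hedged sketch, assuming the option lives on `KeywordExtractionConfig` and takes a slice of forbidden characters (field name per this changelog; the type and defaults are assumptions to verify against the API docs):

```rust
use rust_bert::pipelines::keywords_extraction::KeywordExtractionConfig;

fn main() {
    // Exclude n-grams containing sentence-ending punctuation, so extracted
    // n-grams cannot span multiple sentences.
    let config = KeywordExtractionConfig {
        tokenizer_forbidden_ngram_chars: &['.', '!', '?'],
        ..Default::default()
    };
    let _ = config;
}
```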
Changed
(BREAKING) Upgraded to `torch` 2.1 (via `tch` 0.14.0).
(BREAKING) Text generation traits and pipelines (including conversation, summarization and translation) now return a `Result` for improved error handling.
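Callers of the generation pipelines now need to propagate or handle the error instead of relying on a panic. A minimal sketch using the summarization pipeline (the exact error and output types are assumptions based on the crate's usual `RustBertError`/`Vec<String>` conventions):

```rust
use rust_bert::pipelines::summarization::SummarizationModel;
use rust_bert::RustBertError;

fn summarize(input: &[&str]) -> Result<Vec<String>, RustBertError> {
    let model = SummarizationModel::new(Default::default())?;
    // `summarize` now returns a `Result` rather than panicking on failure,
    // so errors can be bubbled up with `?`.
    let output = model.summarize(input)?;
    Ok(output)
}
```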