huggingface / swift-transformers

Swift Package to implement a transformers-like API in Swift
Apache License 2.0

Tokenizers: additional Normalizers, PreTokenizers, PostProcessors #4

Open pcuenca opened 1 year ago

pcuenca commented 1 year ago

So far I've ported the components I needed to support the models I tested, but there are many more in transformers and tokenizers. For example:

In addition to checking the source code of the tokenizers library, I recommend taking a look at the JavaScript implementation by @xenova in transformers.js – it's a single file and easy to follow!

jkrukowski commented 1 year ago

Added WhitespacePreTokenizer here https://github.com/huggingface/swift-transformers/pull/20

011235813 commented 9 months ago

Adding BertPreTokenizer to support BERT-type encoders would be great. Loading such a tokenizer currently crashes with:

Tokenizers/PreTokenizer.swift:61: Fatal error: Unsupported PreTokenizer type: BertPreTokenizer

pcuenca commented 9 months ago

Thanks @011235813, could you share the tokenizer you are trying to load?

ptsochantaris commented 3 months ago

@pcuenca For example, this call:

let tokenizer = try await AutoTokenizer.from(pretrained: "sentence-transformers/all-MiniLM-L6-v2")

fails with the following error:

Tokenizers/PreTokenizer.swift:66: Fatal error: Unsupported PreTokenizer type: BertPreTokenizer

jkrukowski commented 1 month ago

Should be fixed in https://github.com/huggingface/swift-transformers/pull/137
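
For readers wondering what the missing component does: in the tokenizers library, the BERT pre-tokenizer splits text on whitespace and treats every punctuation character as its own piece. Below is a minimal Swift sketch of that behaviour. It is not the code from the pull request above, and `isBertPunctuation` and `bertPreTokenizeSketch` are hypothetical names that do not exist in swift-transformers. It also assumes that "punctuation" means the ASCII punctuation ranges plus Unicode punctuation, and it ignores normalization and offset tracking.

```swift
// Hypothetical helper (not part of swift-transformers).
// Assumption: a character counts as punctuation if it falls in the ASCII
// punctuation ranges or is classified as punctuation by Unicode.
func isBertPunctuation(_ character: Character) -> Bool {
    if let ascii = character.asciiValue {
        switch ascii {
        case 33...47, 58...64, 91...96, 123...126:
            return true
        default:
            return false
        }
    }
    return character.isPunctuation
}

// Hypothetical sketch of BERT-style pre-tokenization:
// whitespace separates pieces and is dropped; each punctuation
// character becomes a piece of its own.
func bertPreTokenizeSketch(_ text: String) -> [String] {
    var pieces: [String] = []
    var current = ""

    for character in text {
        if character.isWhitespace {
            // Whitespace closes the piece being built and is discarded.
            if !current.isEmpty {
                pieces.append(current)
                current = ""
            }
        } else if isBertPunctuation(character) {
            // Punctuation closes the piece being built, then stands alone.
            if !current.isEmpty {
                pieces.append(current)
                current = ""
            }
            pieces.append(String(character))
        } else {
            current.append(character)
        }
    }

    // Flush whatever is left after the last character.
    if !current.isEmpty {
        pieces.append(current)
    }
    return pieces
}
```

With this sketch, `bertPreTokenizeSketch("Hello, world!")` returns `["Hello", ",", "world", "!"]`. A real implementation inside the package would additionally need to conform to the library's own pre-tokenizer interface, which is not shown here.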