Open pcuenca opened 1 year ago
Added WhitespacePreTokenizer here: https://github.com/huggingface/swift-transformers/pull/20
Adding BertPreTokenizer to support BERT-type encoders would be great.
Tokenizers/PreTokenizer.swift:61: Fatal error: Unsupported PreTokenizer type: BertPreTokenizer
Thanks @011235813, could you share the tokenizer you are trying to load?
@pcuenca
let tokenizer = try await AutoTokenizer.from(pretrained: "sentence-transformers/all-MiniLM-L6-v2")
For example, the line above gives an error:
Tokenizers/PreTokenizer.swift:66: Fatal error: Unsupported PreTokenizer type: BertPreTokenizer
This should be fixed in https://github.com/huggingface/swift-transformers/pull/137
So far I've ported the components I needed to support the models I tested, but there are many more in transformers and tokenizers. For example:
In addition to checking the source code of the tokenizers library, I recommend taking a look at the JavaScript implementation by @xenova in transformers.js. It's a single file and easy to follow!
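For anyone picking this up, here is a rough sketch of the behavior a BertPreTokenizer port would need: split on whitespace and emit every punctuation character as its own piece. The function name bertPreTokenize and the helper isBertPunctuation are hypothetical and are not part of swift-transformers; the punctuation rule (ASCII punctuation ranges plus Unicode punctuation) is my reading of the tokenizers library and should be checked against its source. Offsets and normalization are left out.

```swift
// Hypothetical helper, not the swift-transformers API.
// Assumption: BERT treats all ASCII non-alphanumeric printable characters
// as punctuation (including symbols such as "$" or "^"), in addition to
// characters in the Unicode punctuation categories.
func isBertPunctuation(_ ch: Character) -> Bool {
    if let a = ch.asciiValue {
        if (33...47).contains(a) || (58...64).contains(a)
            || (91...96).contains(a) || (123...126).contains(a) {
            return true
        }
    }
    return ch.isPunctuation
}

// Splits text on whitespace and isolates each punctuation character.
func bertPreTokenize(_ text: String) -> [String] {
    var tokens: [String] = []
    var current = ""
    for ch in text {
        if ch.isWhitespace {
            // Whitespace ends the current word and is dropped.
            if !current.isEmpty {
                tokens.append(current)
                current = ""
            }
        } else if isBertPunctuation(ch) {
            // Punctuation ends the current word and becomes its own token.
            if !current.isEmpty {
                tokens.append(current)
                current = ""
            }
            tokens.append(String(ch))
        } else {
            current.append(ch)
        }
    }
    if !current.isEmpty {
        tokens.append(current)
    }
    return tokens
}

print(bertPreTokenize("Hello, world!"))
// prints ["Hello", ",", "world", "!"]
```

A real port would also have to plug into the pre-tokenizer dispatch in Tokenizers/PreTokenizer.swift, which I have not sketched here.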