Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
Add saving and loading corpus/stopwords to `Tokenizer` and add integration to HF Hub via `bm25s.hf.TokenizerHF` (save/load) #59
Closed
xhluca closed 2 months ago
- `Tokenizer.save_vocab` and `Tokenizer.load_vocab` methods to save/load the vocabulary to a JSON file called `vocab.tokenizer.json` by default
- `Tokenizer.save_stopwords` and `Tokenizer.load_stopwords` methods to save/load stopwords to a JSON file called `stopwords.tokenizer.json` by default
- `TokenizerHF` class to allow saving/loading from the Hugging Face Hub
  - `load_vocab_from_hub`, `save_vocab_to_hub`, `load_stopwords_from_hub`, `save_stopwords_to_hub`
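
The save/load round trip described above can be illustrated with a small standard-library sketch. This is not the code from this PR: `SketchTokenizer`, its `fit` method, the `round_trip` helper, and the JSON layout are all hypothetical stand-ins chosen for illustration. Only the method names and the default file names (`vocab.tokenizer.json`, `stopwords.tokenizer.json`) are taken from the description; the real signatures in `bm25s` may differ.

```python
import json
import tempfile
from pathlib import Path


class SketchTokenizer:
    """Hypothetical stand-in for illustration; NOT the bm25s Tokenizer."""

    def __init__(self, stopwords=()):
        self.stopwords = list(stopwords)
        self.word_to_id = {}

    def fit(self, texts):
        # Assign an integer ID to each new non-stopword, in order of appearance.
        for text in texts:
            for word in text.lower().split():
                if word not in self.stopwords and word not in self.word_to_id:
                    self.word_to_id[word] = len(self.word_to_id)

    def save_vocab(self, save_dir, vocab_name="vocab.tokenizer.json"):
        # Assumed layout: a single JSON object holding the word-to-ID mapping.
        path = Path(save_dir)
        path.mkdir(parents=True, exist_ok=True)
        payload = json.dumps({"word_to_id": self.word_to_id})
        (path / vocab_name).write_text(payload, encoding="utf-8")

    def load_vocab(self, save_dir, vocab_name="vocab.tokenizer.json"):
        raw = (Path(save_dir) / vocab_name).read_text(encoding="utf-8")
        self.word_to_id = json.loads(raw)["word_to_id"]

    def save_stopwords(self, save_dir, stopwords_name="stopwords.tokenizer.json"):
        # Assumed layout: a plain JSON list of stopwords.
        path = Path(save_dir)
        path.mkdir(parents=True, exist_ok=True)
        payload = json.dumps(self.stopwords)
        (path / stopwords_name).write_text(payload, encoding="utf-8")

    def load_stopwords(self, save_dir, stopwords_name="stopwords.tokenizer.json"):
        raw = (Path(save_dir) / stopwords_name).read_text(encoding="utf-8")
        self.stopwords = json.loads(raw)


def round_trip(texts, stopwords):
    """Fit a tokenizer, save it to a temp directory, and restore a fresh copy."""
    original = SketchTokenizer(stopwords=stopwords)
    original.fit(texts)
    with tempfile.TemporaryDirectory() as save_dir:
        original.save_vocab(save_dir)
        original.save_stopwords(save_dir)
        restored = SketchTokenizer()
        restored.load_stopwords(save_dir)
        restored.load_vocab(save_dir)
    return restored
```

Keeping the vocabulary and the stopwords in separate files means either one can be reloaded or replaced independently. The `*_from_hub` and `*_to_hub` methods listed above would presumably wrap the same two files with an upload or download step, but that part is not sketched here.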