xhluca/bm25s
Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
https://bm25s.github.io
MIT License
915 stars, 38 forks
issues
Newest
Added support for saving and loading non ASCII chars in corpus and vocab
#86
IssacXid
opened
3 days ago
6
How to disable tqdm
#84
fortyfourforty
closed
6 days ago
0
Update README.md
#83
xhluca
closed
1 week ago
0
Add BM25_pt (my library) to the acknowledgement
#80
jxmorris12
closed
3 weeks ago
3
Fails to save & load dictionaries & corpora with extended character set
#79
LevMuchnik
opened
3 weeks ago
7
Fix regression with max call
#78
xhluca
closed
3 weeks ago
0
Create nltk_stemmer.py
#77
aflip
closed
3 weeks ago
2
Is it possible to incrementally add corpus to a retriever?
#74
ANYMS-A
closed
4 weeks ago
0
Fix issue where adding "" makes word and stem dicts out of sync
#73
xhluca
closed
3 weeks ago
1
Fix crash tokenizing with empty word_to_id
#72
mgraczyk
closed
1 month ago
0
Pseudo relevance feedback implementation
#70
roynirmal
closed
1 month ago
0
Get sparse embedding functionality
#68
lspataroG
closed
1 month ago
0
Make empty strings an acceptable token
#67
xhluca
closed
1 month ago
4
Add features to improve memory usage
#63
xhluca
closed
1 month ago
0
How can I obtain the doc ID?
#62
Ask-sola
closed
1 month ago
0
IndexError when customized corpus has only one element
#61
hoangnv735
closed
1 month ago
2
Index out of bounds errors in 0.2.0 and 0.2.1
#60
hantusk
closed
1 month ago
9
Add saving and loading corpus/stopwords to `Tokenizer` and add integration to HF Hub via `bm25s.hf.TokenizerHF` (save/load)
#59
xhluca
closed
2 months ago
0
Improve docs and names for bm25s.tokenization.Tokenizer
#56
xhluca
closed
2 months ago
0
ImportError: cannot import name 'Tokenizer' from 'bm25s.tokenization' (/usr/local/lib/python3.10/dist-packages/bm25s/tokenization.py)
#55
hanxu49
closed
2 months ago
3
Add tests for BM25.retrieve in different scenario (tokenized, ids/vocab tuple, object with ids and vocab attributes, ids, strings)
#52
xhluca
opened
2 months ago
0
Improve tokenizer
#51
xhluca
closed
2 months ago
0
Add weight mask that are applied to scores during retrieval
#50
xhluca
closed
2 months ago
2
Replace ujson with orjson, add load and close for `jsonlcorpus`
#49
xhluca
closed
2 months ago
0
Refactor retrieval to make it faster to run in numba mode
#47
xhluca
closed
2 months ago
4
Refactor tests to be run in different jobs
#45
xhluca
closed
3 months ago
0
Add type hint for `texts` argument in `tokenize` function, use `time.monotonic` instead of `time.time`
#44
dantetemplar
closed
3 months ago
1
Use `time.monotonic` instead of `time.time`
#43
dantetemplar
closed
3 months ago
1
Maybe use `time.monotonic` instead of `time.time`?
#42
dantetemplar
closed
3 months ago
1
Add numba integration to allow for faster scoring and retrieval
#41
xhluca
closed
3 months ago
0
[feature request] Implement BMX algorithm
#40
logan-markewich
opened
3 months ago
6
Consider orjson as faster and more robust alternative to ujson
#39
xhluca
closed
2 months ago
1
Thread safe search
#37
okhat
closed
3 months ago
3
[Feature request] Document metadata and filtering
#35
dl423
closed
2 months ago
3
How to apply bm25s to languages such as Chinese?
#34
AlanLu0808
closed
3 months ago
2
Add stopwords for 10 new languages
#33
bm777
closed
3 months ago
14
Other language than english for the stopwords list
#32
bm777
closed
3 months ago
2
On-the-fly stemming
#31
xhluca
closed
2 months ago
1
Bug fix and add link
#28
xhluca
closed
4 months ago
0
🚨Before submitting an issue, read this 🚨
#27
xhluca
closed
4 months ago
0
Update dev-0.1 branch
#24
xhluca
closed
4 months ago
0
Update branch
#23
xhluca
closed
4 months ago
0
Can the index be updated incrementally?
#22
bojone
closed
4 months ago
0
Can you query without a tokenization step?
#21
snewcomer
closed
4 months ago
0
how to dynamic add/delete documents
#19
luoyangen
closed
4 months ago
0
[Feature Request] Support attaching metadata to the corpus
#18
logan-markewich
closed
4 months ago
3
Not Working for langchain Documents
#16
pradhandebasish2046
closed
4 months ago
0
Minor bug: `show_progress` not propagated in `BM25.index`
#15
ValeKnappich
closed
5 months ago
1
Pre-computed TF-IDF
#9
celsofranssa
closed
5 months ago
0
Capability Inquiry: Retrieving Specific JSON Records Based on Text
#8
RakshitKhajuria
closed
5 months ago
4