xhluca / bm25s

Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy

https://bm25s.github.io

MIT License

915 stars, 38 forks

issues

Added support for saving and loading non ASCII chars in corpus and vocab

#86 IssacXid opened 3 days ago
6
How to disable tqdm

#84 fortyfourforty closed 6 days ago
0
Update README.md

#83 xhluca closed 1 week ago
0
Add BM25_pt (my library) to the acknowledgement

#80 jxmorris12 closed 3 weeks ago
3
Fails to save & load dictionaries & corpora with extended character set

#79 LevMuchnik opened 3 weeks ago
7
Fix regression with max call

#78 xhluca closed 3 weeks ago
0
Create nltk_stemmer.py

#77 aflip closed 3 weeks ago
2
Is it possible to incrementally add corpus to a retriever?

#74 ANYMS-A closed 4 weeks ago
0
Fix issue where adding "" makes word and stem dicts out of sync

#73 xhluca closed 3 weeks ago
1
Fix crash tokenizing with empty word_to_id

#72 mgraczyk closed 1 month ago
0
Pseudo relevance feedback implementation

#70 roynirmal closed 1 month ago
0
Get sparse embedding functionality

#68 lspataroG closed 1 month ago
0
Make empty strings an acceptable token

#67 xhluca closed 1 month ago
4
Add features to improve memory usage

#63 xhluca closed 1 month ago
0
How can I obtain the doc ID?

#62 Ask-sola closed 1 month ago
0
IndexError when customized corpus has only one element

#61 hoangnv735 closed 1 month ago
2
Index out of bounds errors in 0.2.0 and 0.2.1

#60 hantusk closed 1 month ago
9
Add saving and loading corpus/stopwords to `Tokenizer` and add integration to HF Hub via `bm25s.hf.TokenizerHF` (save/load)

#59 xhluca closed 2 months ago
0
Improve docs and names for bm25s.tokenization.Tokenizer

#56 xhluca closed 2 months ago
0
ImportError: cannot import name 'Tokenizer' from 'bm25s.tokenization' (/usr/local/lib/python3.10/dist-packages/bm25s/tokenization.py)

#55 hanxu49 closed 2 months ago
3
Add tests for BM25.retrieve in different scenario (tokenized, ids/vocab tuple, object with ids and vocab attributes, ids, strings)

#52 xhluca opened 2 months ago
0
Improve tokenizer

#51 xhluca closed 2 months ago
0
Add weight mask that are applied to scores during retrieval

#50 xhluca closed 2 months ago
2
Replace ujson with orjson, add load and close for `jsonlcorpus`

#49 xhluca closed 2 months ago
0
Refactor retrieval to make it faster to run in numba mode

#47 xhluca closed 2 months ago
4
Refactor tests to be run in different jobs

#45 xhluca closed 3 months ago
0
Add type hint for `texts` argument in `tokenize` function, use `time.monotonic` instead of `time.time`

#44 dantetemplar closed 3 months ago
1
Use `time.monotonic` instead of `time.time`

#43 dantetemplar closed 3 months ago
1
Maybe use `time.monotonic` instead of `time.time`?

#42 dantetemplar closed 3 months ago
1
Add numba integration to allow for faster scoring and retrieval

#41 xhluca closed 3 months ago
0
[feature request] Implement BMX algorithm

#40 logan-markewich opened 3 months ago
6
Consider orjson as faster and more robust alternative to ujson

#39 xhluca closed 2 months ago
1
Thread safe search

#37 okhat closed 3 months ago
3
[Feature request] Document metadata and filtering

#35 dl423 closed 2 months ago
3
How to apply bm25s to languages such as Chinese?

#34 AlanLu0808 closed 3 months ago
2
Add stopwords for 10 new languages

#33 bm777 closed 3 months ago
14
Other language than english for the stopwords list

#32 bm777 closed 3 months ago
2
On-the-fly stemming

#31 xhluca closed 2 months ago
1
Bug fix and add link

#28 xhluca closed 4 months ago
0
🚨Before submitting an issue, read this 🚨

#27 xhluca closed 4 months ago
0
Update dev-0.1 branch

#24 xhluca closed 4 months ago
0
Update branch

#23 xhluca closed 4 months ago
0
Can the index be updated incrementally?

#22 bojone closed 4 months ago
0
Can you query without a tokenization step?

#21 snewcomer closed 4 months ago
0
how to dynamic add/delete documents

#19 luoyangen closed 4 months ago
0
[Feature Request] Support attaching metadata to the corpus

#18 logan-markewich closed 4 months ago
3
Not Working for langchain Documents

#16 pradhandebasish2046 closed 4 months ago
0
Minor bug: `show_progress` not propagated in `BM25.index`

#15 ValeKnappich closed 5 months ago
1
Pre-computed TF-IDF

#9 celsofranssa closed 5 months ago
0
Capability Inquiry: Retrieving Specific JSON Records Based on Text

#8 RakshitKhajuria closed 5 months ago
4