mattilyra/LSH
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
MIT License
278 stars, 78 forks
issues
Head query deduplication & efficient extraction
#26
arindombora
opened
1 week ago
0
Added functions for deduplication & count-based k sampling for extracting duplicate queries.
#25
arindombora21
closed
1 week ago
0
Python 3.10 compatibility
#24
LukeLIN-web
opened
6 months ago
1
shingles error in Introduction.ipynb
#23
bugggggggg
closed
1 year ago
0
error: command 'cc' failed with exit status 1
#22
FaHeidari
opened
3 years ago
1
ModuleNotFoundError: No module named 'LSH.lsh.cMinhash'
#21
zhangxiaohuM
opened
3 years ago
1
A few questions about the `len` argument in the function `MurmurHash3_x86_32`.
#20
peinan
opened
4 years ago
0
How to make minhash scalable
#19
SankBad
opened
4 years ago
0
Jaccard should be performed on sets, but appears to be given numpy arrays
#18
thesamuel
opened
4 years ago
3
Unable to install in Python3
#17
prabalbansal
opened
4 years ago
6
Unable to install on Windows
#16
S0mbre
opened
5 years ago
1
ModuleNotFoundError: No module named 'lsh.cMinhash'
#15
nithinivi
opened
5 years ago
5
Unable to install on Mojave (10.14.2) or Ubuntu
#14
EggsBenedict
closed
5 years ago
5
storing cache
#13
desainishit
closed
4 years ago
1
Unable to install
#12
nishitd
closed
5 years ago
1
Is it possible to extend this project for document file formats like PDF, DOCX
#11
abhinavbom
closed
6 years ago
1
Check buckets of LSH MinHash
#10
hsiaoma
closed
6 years ago
0
Is it possible to extend LSH to detect near duplicate images?
#9
vnnw
closed
5 years ago
1
Create PyPI package
#8
mattilyra
opened
6 years ago
2
Add support for SimHash
#7
mattilyra
opened
6 years ago
0
Fix functools ImportError on python 2.7
#6
stultus
closed
6 years ago
0
LSH is installable through pip
#5
hbrylkowski
closed
7 years ago
1
allow other backends for storing duplicate documents
#4
mattilyra
opened
7 years ago
2
parallel deduplication
#3
mattilyra
opened
7 years ago
3
Tests and ease of use
#2
mbatchkarov
closed
7 years ago
3
Need help installing this
#1
clayfink
closed
7 years ago
2