princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License
3.36k stars, 507 forks

Bug? After build_index(sentences, faiss_fast=True), search returns repeated results #134

Closed. techperfect closed this issue 2 years ago

techperfect commented 2 years ago

from simcse import SimCSE
model = SimCSE("princeton-nlp/sup-simcse-bert-base-uncased")

sentences = ['chelsea boots men', 'chelsea boots women', 'black chelsea boots', 'chunky chelsea boots', 'doc chelsea boots', 'blue chelsea boots', 'green chelsea boots', 'waterproof chelsea boots men', 'chunky sole chelsea boots', 'chunky leather chelsea boots', 'brogue chelsea boots', 'wide chelsea boots', 'mens black suede chelsea boots', 'chelsea work boots mens', 'aquatherm chelsea boots', 'designer chelsea boots', 'square toe chelsea boots', 'taupe chelsea boots', 'laguna chelsea boot', 'comfortable chelsea boots', 'block heel chelsea boots', 'chelsea lug boots', 'james chelsea boots', 'most comfortable chelsea boots', 'dealer boots womens', 'black platform chelsea boots', 'high profile chelsea boots', 'grey suede chelsea boots', 'tan suede chelsea boots', 'cream chelsea boots womens', 'pointed chelsea boots', 'nude chelsea boots']

model.build_index(sentences, faiss_fast=True)
results = model.search('Black Classic Chelsea Boots', threshold=0.3, top_k=20)
print(results)

[('designer chelsea boots', 0.30279988), ('chunky leather chelsea boots', 0.32448673), ('mens black suede chelsea boots', 0.3338831), ('taupe chelsea boots', 0.35558155), ('comfortable chelsea boots', 0.3685085), ('chunky chelsea boots', 0.37122363), ('chunky sole chelsea boots', 0.379904), ('square toe chelsea boots', 0.39477926), ('nude chelsea boots', 3.4028235e+38), ('nude chelsea boots', 3.4028235e+38), ('nude chelsea boots', 3.4028235e+38), ('nude chelsea boots', 3.4028235e+38), ('nude chelsea boots', 3.4028235e+38), ('nude chelsea boots', 3.4028235e+38), ('nude chelsea boots', 3.4028235e+38), ('nude chelsea boots', 3.4028235e+38), ('nude chelsea boots', 3.4028235e+38), ('nude chelsea boots', 3.4028235e+38)]
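
One possible reading of the repeated entries, not confirmed anywhere in this thread: 3.4028235e+38 is the largest finite float32 value, and Faiss pads its result arrays with label -1 when a search finds fewer than k neighbours. In Python, index -1 selects the last element of a list, and the last sentence here is 'nude chelsea boots'. The sketch below reproduces that effect with plain NumPy. The `labels` and `distances` arrays are made-up stand-ins for a search result, and `filter_padded` is a hypothetical helper, not part of SimCSE.

```python
import numpy as np

sentences = ["black chelsea boots", "designer chelsea boots", "nude chelsea boots"]

# Hypothetical search output for top_k = 4 when only two neighbours were found:
# the unfilled slots carry label -1 and the float32 maximum as a placeholder score.
FLOAT32_MAX = np.finfo(np.float32).max
labels = np.array([1, 0, -1, -1])
distances = np.array([0.30, 0.33, FLOAT32_MAX, FLOAT32_MAX], dtype=np.float32)

# Naive lookup: label -1 silently wraps around to the last sentence.
naive = [(sentences[i], float(d)) for i, d in zip(labels, distances)]

def filter_padded(labels, distances, sentences):
    """Hypothetical helper: drop slots whose label is negative (no neighbour found)."""
    return [(sentences[i], float(d)) for i, d in zip(labels, distances) if i >= 0]

clean = filter_padded(labels, distances, sentences)
print(naive)
print(clean)
```

If this reading applies, the fix on the caller's side would be to skip negative labels before mapping them back to sentences.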

gaotianyu1350 commented 2 years ago

Hi,

Could you check your environment? This is the result I get, and it looks normal:

[('black chelsea boots', 0.9320579171180725), ('black platform chelsea boots', 0.8767697811126709), ('designer chelsea boots', 0.8486000895500183), ('chunky leather chelsea boots', 0.8377567529678345), ('mens black suede chelsea boots', 0.8330584764480591), ('taupe chelsea boots', 0.8222088813781738), ('comfortable chelsea boots', 0.8157457113265991), ('chunky chelsea boots', 0.8143882751464844), ('chunky sole chelsea boots', 0.8100481033325195), ('square toe chelsea boots', 0.8026103973388672), ('brogue chelsea boots', 0.7950376868247986), ('most comfortable chelsea boots', 0.7939415574073792), ('block heel chelsea boots', 0.7921258211135864), ('wide chelsea boots', 0.7695946097373962), ('chelsea boots women', 0.7507659792900085), ('blue chelsea boots', 0.7472003102302551), ('cream chelsea boots womens', 0.7404086589813232), ('chelsea lug boots', 0.7345914840698242), ('tan suede chelsea boots', 0.7332489490509033), ('high profile chelsea boots', 0.7316401600837708)]
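
Comparing the two outputs suggests that the first run may be reporting squared L2 distances rather than cosine similarities. This is an observation from the numbers only, not something stated in the thread. For unit-length vectors a and b, ||a - b||^2 = 2 - 2(a·b), so each cosine score s in the second output would correspond to a distance of 2 - 2s. The sketch below checks that relation on the eight sentences that appear with finite scores in both outputs. The helper name `squared_l2_from_cosine` is made up for illustration.

```python
# Scores copied from the two outputs in this thread:
# sentence -> (cosine similarity from the reply, score from the original report)
pairs = {
    "designer chelsea boots":         (0.8486000895500183, 0.30279988),
    "chunky leather chelsea boots":   (0.8377567529678345, 0.32448673),
    "mens black suede chelsea boots": (0.8330584764480591, 0.3338831),
    "taupe chelsea boots":            (0.8222088813781738, 0.35558155),
    "comfortable chelsea boots":      (0.8157457113265991, 0.3685085),
    "chunky chelsea boots":           (0.8143882751464844, 0.37122363),
    "chunky sole chelsea boots":      (0.8100481033325195, 0.379904),
    "square toe chelsea boots":       (0.8026103973388672, 0.39477926),
}

def squared_l2_from_cosine(cos_sim):
    """Squared Euclidean distance between two unit vectors with the given cosine."""
    return 2.0 - 2.0 * cos_sim

for sentence, (cos_sim, reported) in pairs.items():
    predicted = squared_l2_from_cosine(cos_sim)
    print(f"{sentence:32s} predicted={predicted:.6f} reported={reported:.6f}")
```

The predicted and reported values agree to within float32 rounding. Under the same relation, 'black chelsea boots' (cosine about 0.932) and 'black platform chelsea boots' (cosine about 0.877) would map to roughly 0.136 and 0.246. Both fall below the threshold of 0.3, which would be consistent with their absence from the first output if the threshold is applied as a lower bound on the score.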