uhh-lt / dats

Discourse Analysis Tool Suite
Apache License 2.0

image similarity search returns same image multiple times #443

Closed bigabig closed 1 week ago

bigabig commented 2 weeks ago

[attached image]

bigabig commented 2 weeks ago

I created a new project and just crawled this website: https://www.tagesschau.de/inland/innenpolitik/spd-miersch-generalsekretaer-100.html

bigabig commented 1 week ago

Calling the endpoint /search/simsearch/images with this request body:

{
  "proj_id": 1,
  "query": "",
  "top_k": 100,
  "threshold": 0,
  "filter": {
    "id": "string",
    "items": [],
    "logic_operator": "or"
  }
}

returns

[
  {
    "sdoc_id": 2,
    "score": 0.6143090426921844
  },
  {
    "sdoc_id": 3,
    "score": 0.6074398159980774
  },
  {
    "sdoc_id": 3,
    "score": 0.601840615272522
  },
  {
    "sdoc_id": 4,
    "score": 0.6002316176891327
  },
  {
    "sdoc_id": 4,
    "score": 0.5986002087593079
  },
  {
    "sdoc_id": 5,
    "score": 0.5930981040000916
  }
]

The same image is incorrectly returned multiple times: sdoc_id 3 and sdoc_id 4 each appear twice, with slightly different scores.
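To illustrate the expected behaviour, here is a minimal sketch of collapsing the response to one entry per sdoc_id, keeping the highest score. This is not the fix that was applied in DATS (the thread does not show it); `SimSearchHit` and `dedupe_hits` are hypothetical names, and only the `sdoc_id` and `score` fields are taken from the response above.

```python
from typing import TypedDict


class SimSearchHit(TypedDict):
    # Hypothetical type mirroring the fields seen in the response above.
    sdoc_id: int
    score: float


def dedupe_hits(hits: list[SimSearchHit]) -> list[SimSearchHit]:
    """Keep only the best-scoring hit per sdoc_id, sorted by score (descending)."""
    best: dict[int, SimSearchHit] = {}
    for hit in hits:
        current = best.get(hit["sdoc_id"])
        if current is None or hit["score"] > current["score"]:
            best[hit["sdoc_id"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)


# The response from this issue, copied verbatim.
response: list[SimSearchHit] = [
    {"sdoc_id": 2, "score": 0.6143090426921844},
    {"sdoc_id": 3, "score": 0.6074398159980774},
    {"sdoc_id": 3, "score": 0.601840615272522},
    {"sdoc_id": 4, "score": 0.6002316176891327},
    {"sdoc_id": 4, "score": 0.5986002087593079},
    {"sdoc_id": 5, "score": 0.5930981040000916},
]

print([hit["sdoc_id"] for hit in dedupe_hits(response)])  # [2, 3, 4, 5]
```

Deduplicating after the query only hides the symptom, and with top_k applied before deduplication the caller could get fewer than top_k distinct images back. A real fix would more likely belong wherever the duplicate entries are created or queried.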