weaviate / contextionary

Weaviate's own language vectorizer, which allows for semantic context-based searches in Weaviate
https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-contextionary
BSD 3-Clause "New" or "Revised" License

Synonym Feature Discovery #13

Closed: etiennedi closed this issue 5 years ago

etiennedi commented 5 years ago

Discovery for the best way to store and retrieve synonyms.

Questions:

etiennedi commented 5 years ago
  • one key per synonym vs. one large JSON map?

One key per synonym is fine, but one request per lookup is far too slow (1,000 lookups take around 1 s). We need to watch for changes (watches also work on key prefixes) and keep an in-memory copy for lookups at all times.
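A minimal sketch of the in-memory approach, assuming each synonym is stored under its own key beneath a hypothetical `synonyms/` prefix and that the store delivers "put" and "delete" change events for that prefix. The class, method names, and event shape are illustrative only, not an existing Weaviate or key-value store API. Once the cache is loaded, each lookup is a local dictionary read instead of a network request.

```python
from __future__ import annotations

import threading

# Hypothetical key prefix under which each synonym is stored as its own key.
SYNONYM_PREFIX = "synonyms/"


class SynonymCache:
    """In-memory copy of all synonym keys, kept in sync by prefix-watch events."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._entries: dict[str, str] = {}

    def load(self, snapshot: dict[str, str]) -> None:
        """Replace the cache with an initial snapshot, keeping only synonym keys."""
        with self._lock:
            self._entries = {
                key[len(SYNONYM_PREFIX):]: value
                for key, value in snapshot.items()
                if key.startswith(SYNONYM_PREFIX)
            }

    def apply_event(self, kind: str, key: str, value: str | None = None) -> None:
        """Apply one change event ("put" or "delete") delivered by the watch."""
        if not key.startswith(SYNONYM_PREFIX):
            return  # not a synonym key; ignore it
        word = key[len(SYNONYM_PREFIX):]
        with self._lock:
            if kind == "put":
                self._entries[word] = value
            elif kind == "delete":
                self._entries.pop(word, None)
            else:
                raise ValueError(f"unknown event kind: {kind!r}")

    def lookup(self, word: str) -> str | None:
        """Return the stored value for a synonym, or None, without a network call."""
        with self._lock:
            return self._entries.get(word)
```

The initial `load` covers the state at startup; the watch then keeps the copy current, so a burst of lookups during vectorization never touches the store.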

  • how to store the vector position efficiently?

Simple JSON is fine. We need to store both the position and the original concept (with its weight), so that we can recalculate the position in the future.
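One possible JSON value for a single synonym key, sketched under the assumption that the fields are named `concept`, `weight`, and `position` (hypothetical names; the discussion does not fix a schema). Because the original concept and weight are kept next to the precomputed position, the position can be rebuilt later. The recalculation formula is not specified in the discussion, so `recalculate` takes it as a caller-supplied function rather than inventing one.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class SynonymEntry:
    """Value stored under one synonym key (field names are hypothetical)."""

    concept: str           # original concept the synonym points to, e.g. "car"
    weight: float          # weight attached to that concept
    position: List[float]  # precomputed vector position

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> SynonymEntry:
        data = json.loads(raw)
        return cls(
            concept=data["concept"],
            weight=float(data["weight"]),
            position=[float(x) for x in data["position"]],
        )


def recalculate(
    entry: SynonymEntry,
    compute_position: Callable[[str, float], List[float]],
) -> SynonymEntry:
    """Return a new entry whose position is rebuilt from the stored concept and weight."""
    return SynonymEntry(
        entry.concept, entry.weight, compute_position(entry.concept, entry.weight)
    )
```

Keeping the inputs alongside the derived value means a future change to the vector space only requires re-running `recalculate` over the stored entries, not re-collecting the synonyms.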

  • at what point should the synonym check happen when vectorizing a corpus?

Immediately before the word is checked against the contextionary. A wrapper around the existing contextionary type could help with this.
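A sketch of the wrapper idea, assuming "checking" means the contextionary's per-word vector lookup. The `Contextionary` protocol and its `vector_for_word` method are hypothetical stand-ins for the existing type, not its real interface. The wrapper consults the synonym map first and only falls through to the wrapped contextionary when the word is not a known synonym, so callers that vectorize a corpus need no changes.

```python
from __future__ import annotations

from typing import Dict, List, Optional, Protocol


class Contextionary(Protocol):
    """Hypothetical minimal interface of the existing contextionary type."""

    def vector_for_word(self, word: str) -> Optional[List[float]]:
        ...


class SynonymAwareContextionary:
    """Wraps a Contextionary and checks synonyms just before each word lookup."""

    def __init__(self, inner: Contextionary, synonyms: Dict[str, List[float]]) -> None:
        self._inner = inner
        self._synonyms = synonyms  # synonym word -> stored vector position

    def vector_for_word(self, word: str) -> Optional[List[float]]:
        # Synonym check happens immediately before the regular lookup.
        position = self._synonyms.get(word)
        if position is not None:
            return position
        return self._inner.vector_for_word(word)
```

Since the wrapper exposes the same method as the type it wraps, it can be swapped in wherever the plain contextionary is used today.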

Closing as the discovery is complete.