Open hsm207 opened 3 weeks ago
@hsm207 - Awesome to see you working on this. If you have any questions, don't hesitate to reach out.
After doing several other implementations, here's a checklist of the key things you'll come across along the way:
Dockerfile / Docker compose configuration
Install Spark connector (inside the aips-notebooks Dockerfile)
Collection management: creation/deletion/healthcheck
Collection schemas:
Primitive field types: text, string, keyword, boolean, integer, double
location coordinate field
dense vector field: dimensions (512, 768), vector encoding/quantization (1 bit, 32 bits), and dot_product similarity
tokenizers/filters: comma delimited, lower case, whitespace/punctuation, NGram, delimited payload
Query functionality:
sorting, filtering, limit, query fields, return fields
multi-field search
AND/OR/NOT operators
minimum phrase matching
query time boosting
index time boosting
vector search
reranking by query
highlighting
debug/explain
spell check/autocomplete
Some other things, like hybrid search (reciprocal rank fusion), are already implemented generically at the Collection level, but you can override them in the WeaviateCollection to push them down into the engine, since Weaviate has native support for that built in.
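For reference, here's a minimal sketch of what a generic reciprocal rank fusion step typically looks like. This is not the code from the repo: the function name `reciprocal_rank_fusion`, the constant `k=60` (a commonly used default), and the sample document ids are all assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Hypothetical helper, not the repo's implementation. Each document
    scores 1 / (k + rank) per list it appears in (rank starts at 1),
    and the per-list scores are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up result lists from a lexical query and a vector query.
lexical_results = ["doc_a", "doc_b", "doc_c"]
vector_results = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([lexical_results, vector_results])
print([doc_id for doc_id, _ in fused])
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Overriding this in the WeaviateCollection would mean the engine does the fusion itself, so only the already-fused results come back instead of two separate result lists.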
As mentioned in the /engines/README.md, the LTR implementation is required, but can be done outside the engine. Happy to chat with you on this if you need a generic implementation. The SparseLexicalSemanticSearch implementation is likewise required, but it's just crafting some very specific Weaviate query syntax for a handful of specific query patterns (popularity boosting, geo radius filtering, etc.). I wouldn't worry about the EntityExtractor or the SemanticKnowledgeGraph, as most engines don't have these built in and you can just treat them as external library calls.
At any rate, hope that's helpful. Let us know if you have any questions we can assist with!
Signed-off-by: hsm207 <hsm207@users.noreply.github.com>