hazelcast / hz-docs

Source content for the Hazelcast Platform documentation
Docs: Feedback for Vector Collection #1235

Open naadhira opened 2 months ago

naadhira commented 2 months ago

Hi, I have some feedback about this page

Using a vector collection within a pipeline is not included in the documentation. We need to add the following examples:

1) Using a pipeline to create/update a VectorCollection. This is in the vector search tutorial, but we need an example of it on this page as well.

2) Using a pipeline for the similarity search. In this example, the client can be any HZ client. The search string is ingested into the pipeline, which does the embedding, the subsequent similarity search, any LLM interactions, and returns the results to the client. This opens up vector search to any HZ client - all the ML/AI work is done within the cluster.

Happy to do the editing/review once the code is in place...

k-jamroz commented 2 months ago

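Jet bindings for vector collection are documented under Jet:

- https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/vector-collection-connector
- https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/legacy-file-connector#fvecs-and-ivecs
- https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/file-connector#fvecs-and-ivecs

See the original PR: https://github.com/hazelcast/hz-docs/pull/1125

There are no links from the data structure description, but we also do not have such links to the Jet docs for IMap or other data structures.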

k-jamroz commented 2 months ago

Using a pipeline for the similarity search.

This feels more like a tutorial. A basic search invocation from a pipeline is shown in https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/vector-collection-connector#searching-in-vector-collection

In this example, the client can be any HZ client. The search string is ingested into the pipeline,

I do not know of any easy way to send input from an HZ client directly to a Jet pipeline. Observables work the other way around: the client can get data produced by the pipeline. But for input you need to use Kafka, an IMap journal, etc. There is no ITopic source. In my examples I used sockets, but they are a bit problematic and require permissions in the cloud.

At least that is the situation if you think about a streaming pipeline. For a batch pipeline this can be organized differently, but you would have to submit the job many times (e.g. once for each query), which is doable but IMO inconvenient and inefficient.
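
To make the streaming round trip concrete, here is a minimal plain-Java model of it. It is only a sketch and deliberately uses no Hazelcast API: the `requestJournal` queue stands in for a journaled request IMap that a pipeline would read from, the `pending` map stands in for whatever channel carries results back (for example an Observable), and `search` is a hypothetical placeholder for the embedding and similarity search steps. All names are assumptions made for illustration.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueryRoundTrip {
    // Stand-in for the journal of a request map: the "pipeline" consumes entries from it.
    static final BlockingQueue<Map.Entry<String, String>> requestJournal =
            new LinkedBlockingQueue<>();

    // Stand-in for the result channel: pending replies keyed by request id.
    static final ConcurrentMap<String, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();

    // Hypothetical placeholder for embedding + similarity search (+ any LLM call).
    public static String search(String query) {
        return "results for: " + query;
    }

    // Stand-in for a long-running streaming job: started once, serves many queries.
    public static Thread startPipeline() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Map.Entry<String, String> request = requestJournal.take();
                    CompletableFuture<String> reply = pending.remove(request.getKey());
                    if (reply != null) {
                        reply.complete(search(request.getValue()));
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
        return worker;
    }

    // Client side: tag the query with a request id, publish it, wait for the matching reply.
    public static String query(String text) {
        String requestId = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(requestId, reply);                      // register before publishing
        try {
            requestJournal.put(Map.entry(requestId, text)); // "write to the request map"
            return reply.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            pending.remove(requestId);
            throw new IllegalStateException("query failed: " + text, e);
        }
    }

    public static void main(String[] args) {
        startPipeline();
        System.out.println(query("distributed caching"));
    }
}
```

The point of the model is the request id: because input and output travel over separate one-way channels, each client has to correlate results with its own queries. The batch alternative would correspond to calling `startPipeline()` once per query instead of once overall.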

which does the embedding, the subsequent similarity search, any LLM interactions, and returns the results to the client. This opens up vector search to any HZ client - all the ML/AI work is done within the cluster.

We had examples of embedding creation in Jet pipelines as part of demos. I have not yet checked whether they ended up in published tutorials.

naadhira commented 2 months ago

Then we should add cross-references, and the restructuring I suggested in Slack. There are use cases for both methods of ingestion, and that should be discussed under the data structure itself. The way it's set up now, you don't even know that using Jet is an option, except in the tutorial.

On Fri, Aug 2, 2024, 7:11 AM Krzysztof Jamróz @.***> wrote:

Jet bindings for vector collection are documented under Jet:

- https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/vector-collection-connector
- https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/legacy-file-connector#fvecs-and-ivecs
- https://docs.hazelcast.com/hazelcast/6.0-snapshot/integrate/file-connector#fvecs-and-ivecs

See the original PR: #1125 https://github.com/hazelcast/hz-docs/pull/1125

There are no links from data structure description, but we do not have such links to Jet docs also for IMap or other data structure.

k-jamroz commented 2 months ago

I agree that it is currently not easy to discover that you can use vector collections in Jet.