jina-ai / serve

☁️ Build multimodal AI applications with cloud-native stack
https://jina.ai/serve
Apache License 2.0
21.13k stars 2.22k forks source link

clarify function signatures in indexers #1439

Closed cristianmtr closed 3 years ago

cristianmtr commented 3 years ago

Describe the feature

Some indexers have confusing function signatures. Specifically the usage of keys, ids is wrong. This can create confusion for newcomers, or when reasoning about the code.

Example

Your proposal

One big PR to change the signature across all the indexers.

JoanFM commented 3 years ago

I would even go one step further and remove the signatures from BaseIndexer and have them declared at KVIndexer and VectorIndexer since they are really different and behave and interact really differently.

I see very little in common between KV and VectorIndexers for them to share a common BaseClass.

For now they could share only the abstracion of query and write handlers but without sharing common interfaces for add query update or delete

hanxiao commented 3 years ago

have them declared at KVIndexer and VectorIndexer since they are really different and behave and interact really differently.

How are they behave differently? They are both indexers, they both use special but similar wire format for storage. They both rely on memmap for reading. They share the same abstraction.

I would vote for changing arg name keys to vectors, but make a pause and think through before pushing these two classes away. Having repeated separated code is always easier than having a common abstraction.

JoanFM commented 3 years ago

have them declared at KVIndexer and VectorIndexer since they are really different and behave and interact really differently.

How are they behave differently? They are both indexers, they both use special but similar wire format for storage. They both rely on memmap for reading. They share the same abstraction.

One is queried by keys and the other by vectors. They share the fact that are persisted in memory and some common patterns that belong in BaseIndexer. But in the interface they offer they seem different to me

hanxiao commented 3 years ago

But in the interface they offer they seem different to me

that's why they are from different BaseClass, notice the inheritance relation at the moment: image

JoanFM commented 3 years ago

But in the interface they offer they seem different to me

that's why they are from different BaseClass, notice the inheritance relation at the moment: image

Yes, my point is a very minor thing to not declare add or query at BaseIndexer but at BaseKVIndexer and BaseVectorIndexer.

I would also separate the crafter and executor class. But anyhow none of this is important at all

jina-bot commented 3 years ago

This issue is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 4 days

NouiliKh commented 3 years ago

Hey @cristianmtr, I am trying to work on this issue but I don't know if I missing something since this would be my first contribution. Under indexers, I found only two function signatures that need to be changed: https://github.com/jina-ai/jina/blob/35139e96ff828bc2dfdee92f8ac88a0c0b09e726/jina/executors/indexers/vector.py#L302 https://github.com/jina-ai/jina/blob/35139e96ff828bc2dfdee92f8ac88a0c0b09e726/jina/executors/indexers/vector.py#L116 Are these the only two that need to be changed (+ changing affected classes of course)? Meanwhile, I found only one "query_by_id" as a function name. Is this the one that needs to be changed? https://github.com/jina-ai/jina/blob/35139e96ff828bc2dfdee92f8ac88a0c0b09e726/jina/executors/indexers/vector.py#L190

jina-bot commented 3 years ago

This issue is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 4 days

maximilianwerk commented 3 years ago

Dear @NouiliKh first of all, I am really sorry for the late response. Your post somehow got missed at the beginning of the year.

Regarding your question: This change also must be done consistently in the base classes ('jina/executor/indexers/init.py::BaseKVIndexer' and 'jina/executor/indexers/init.py::BaseVectorIndexer') and all the inheriting classes in the jina-hub/indexers. Furthermore, all related tests need to be refactored.

Yes, the query_by_id is the correct function, that needs to be refactored.

If you want to proceed with this refactoring, joining our slack might be beneficial to go into detail on technical questions. https://jina-ai.slack.com/

NouiliKh commented 3 years ago

@maximilianwerk Thanks for the reply. This clarifies the task a little more for me. I am already on slack and I will be reaching out as soon as I face an issue addressing this task. I will try as well to start working on it as soon as possible. Thanks again.

jina-bot commented 3 years ago

This issue is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 4 days

nan-wang commented 3 years ago

@cristianmtr I think this is solved by #1829.

cristianmtr commented 3 years ago

Closed by https://github.com/jina-ai/jina/pull/1829/