Closed cristianmtr closed 3 years ago
I would even go one step further and remove the signatures from BaseIndexer and have them declared at KVIndexer and VectorIndexer since they are really different and behave and interact really differently.
I see very little in common between KV and VectorIndexers for them to share a common BaseClass.
For now they could share only the abstracion of query and write handlers but without sharing common interfaces for add query update or delete
have them declared at KVIndexer and VectorIndexer since they are really different and behave and interact really differently.
How are they behave differently? They are both indexers, they both use special but similar wire format for storage. They both rely on memmap for reading. They share the same abstraction.
I would vote for changing arg name keys
to vectors
, but make a pause and think through before pushing these two classes away. Having repeated separated code is always easier than having a common abstraction.
have them declared at KVIndexer and VectorIndexer since they are really different and behave and interact really differently.
How are they behave differently? They are both indexers, they both use special but similar wire format for storage. They both rely on memmap for reading. They share the same abstraction.
One is queried by keys and the other by vectors. They share the fact that are persisted in memory and some common patterns that belong in BaseIndexer. But in the interface they offer they seem different to me
But in the interface they offer they seem different to me
that's why they are from different BaseClass, notice the inheritance relation at the moment:
But in the interface they offer they seem different to me
that's why they are from different BaseClass, notice the inheritance relation at the moment:
Yes, my point is a very minor thing to not declare add or query at BaseIndexer but at BaseKVIndexer and BaseVectorIndexer.
I would also separate the crafter and executor class. But anyhow none of this is important at all
This issue is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 4 days
Hey @cristianmtr, I am trying to work on this issue but I don't know if I missing something since this would be my first contribution. Under indexers, I found only two function signatures that need to be changed: https://github.com/jina-ai/jina/blob/35139e96ff828bc2dfdee92f8ac88a0c0b09e726/jina/executors/indexers/vector.py#L302 https://github.com/jina-ai/jina/blob/35139e96ff828bc2dfdee92f8ac88a0c0b09e726/jina/executors/indexers/vector.py#L116 Are these the only two that need to be changed (+ changing affected classes of course)? Meanwhile, I found only one "query_by_id" as a function name. Is this the one that needs to be changed? https://github.com/jina-ai/jina/blob/35139e96ff828bc2dfdee92f8ac88a0c0b09e726/jina/executors/indexers/vector.py#L190
This issue is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 4 days
Dear @NouiliKh first of all, I am really sorry for the late response. Your post somehow got missed at the beginning of the year.
Regarding your question: This change also must be done consistently in the base classes ('jina/executor/indexers/init.py::BaseKVIndexer' and 'jina/executor/indexers/init.py::BaseVectorIndexer') and all the inheriting classes in the jina-hub/indexers. Furthermore, all related tests need to be refactored.
Yes, the query_by_id
is the correct function, that needs to be refactored.
If you want to proceed with this refactoring, joining our slack might be beneficial to go into detail on technical questions. https://jina-ai.slack.com/
@maximilianwerk Thanks for the reply. This clarifies the task a little more for me. I am already on slack and I will be reaching out as soon as I face an issue addressing this task. I will try as well to start working on it as soon as possible. Thanks again.
This issue is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 4 days
@cristianmtr I think this is solved by #1829.
Describe the feature
Some indexers have confusing function signatures. Specifically the usage of
keys
,ids
is wrong. This can create confusion for newcomers, or when reasoning about the code.Example
keys
to refer to what is actuallyquery_vectors
: https://github.com/jina-ai/jina/blob/2585798ac0ed33dd406a8f8a0a457d72bb2b7af0/jina/executors/indexers/vector.py#L302 (This would affect all classes derived from it and from which it subclasses)query_by_id
should bequery_by_key
, in order to maintain internal consistency with usage ofkeys
.Your proposal
One big PR to change the signature across all the indexers.