vwoloszyn closed this issue 3 years ago.
@vwoloszyn this is the expected behaviour:
```python
from jina import Document

d = Document()
d.text = 'this is the content of the document'  # main content for the text modality
d.tags['description'] = 'description of the document'  # metadata goes into tags
d.tags['author'] = 'this is a name'
d.tags['publish_date'] = '2021.07.01'
d.tags['...'] = ...
```
By default, if you're working with the text modality, put the content in `d.text`; for metadata and any other information, use `d.tags[key] = value`.
Thank you very much, @bwanglzu. Is there any way to search by tag, e.g., `'hemingway' in d.tags['author']`?
Hi @vwoloszyn, if I understand your intent correctly, this is definitely doable; Jina is flexible here. Let me write a simple executor for you:
```python
import operator
from typing import Dict

from jina import Executor, DocumentArray, requests


class TagsFilter(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # map the operator names accepted at search time to Python comparisons
        self.ops = {
            'gt': operator.gt,
            'lt': operator.lt,
            'eq': operator.eq,
            'ge': operator.ge,
            'le': operator.le,
            'ne': operator.ne,
        }

    @requests(on='/search')  # happens at search time, at the end of the flow
    def filter(self, docs: DocumentArray, parameters: Dict, **kwargs):
        conditions = parameters.get('conditions')  # you pass conditions as parameters at search time
        if not conditions:  # if the user did not pass any condition, the executor does nothing
            return docs
        for cond in conditions:  # each condition filters the matches left by the previous one (AND)
            ops_instance = self.ops.get(cond['operator'])
            if ops_instance is None:  # fail loudly instead of calling None
                raise ValueError(f"unsupported operator: {cond['operator']}")
            for doc in docs:
                filtered_matches = []
                for match in doc.matches:  # loop through all the matches of the query document
                    # in this case, `attribute` is author, `operator` is eq and `value` is hemingway
                    if ops_instance(match.tags.get(cond['attribute']), cond['value']):
                        filtered_matches.append(match)
                doc.matches = filtered_matches  # replace the matches of the current query document with the filtered matches


# add your condition to the `conditions` list inside `parameters`
parameters = {'conditions': [{'attribute': 'author', 'operator': 'eq', 'value': 'hemingway'}]}
# `c` is a client (or Flow) connected to a Flow that contains TagsFilter
c.post('/search', inputs=..., parameters=parameters)
```
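The executor needs Jina at runtime, so here is a dependency-free sketch of just its filtering loop, with plain dicts standing in for `match.tags`. It adds a hypothetical `'contains'` operator (not part of the executor above) built on `operator.contains`, which evaluates `value in tag`, so it covers the original `'hemingway' in d.tags['author']` question, whereas `eq` needs an exact match. The names `OPS` and `filter_matches` are illustrative only. Because each condition filters the matches left over by the previous one, several conditions combine as a logical AND. Unlike the executor, this sketch skips matches that lack the attribute, so `gt`/`lt` never compare `None` with a string.

```python
import operator
from typing import Any, Dict, List

# Same table as TagsFilter.ops plus a hypothetical 'contains' entry:
# operator.contains(a, b) evaluates `b in a`, i.e. "condition value in tag value".
OPS = {
    'gt': operator.gt,
    'lt': operator.lt,
    'eq': operator.eq,
    'ge': operator.ge,
    'le': operator.le,
    'ne': operator.ne,
    'contains': operator.contains,
}


def filter_matches(matches: List[Dict[str, Any]],
                   conditions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Apply each condition to the surviving matches in turn (logical AND).

    Each match is a plain dict standing in for `match.tags`.
    An unknown operator name raises KeyError.
    """
    for cond in conditions:
        op = OPS[cond['operator']]
        kept = []
        for tags in matches:
            value = tags.get(cond['attribute'])
            # skip matches without the attribute instead of comparing None
            if value is not None and op(value, cond['value']):
                kept.append(tags)
        matches = kept  # the next condition only sees what survived this one
    return matches


matches = [
    {'author': 'ernest hemingway', 'publish_date': '1952.09.01'},
    {'author': 'hemingway', 'publish_date': '1926.10.22'},
    {'author': 'this is a name', 'publish_date': '2021.07.01'},
]

exact = filter_matches(matches, [{'attribute': 'author', 'operator': 'eq', 'value': 'hemingway'}])
print([m['author'] for m in exact])  # ['hemingway']

substring = filter_matches(matches, [{'attribute': 'author', 'operator': 'contains', 'value': 'hemingway'}])
print([m['author'] for m in substring])  # ['ernest hemingway', 'hemingway']

both = filter_matches(matches, [
    {'attribute': 'author', 'operator': 'contains', 'value': 'hemingway'},
    {'attribute': 'publish_date', 'operator': 'ge', 'value': '1950.01.01'},
])
print([m['author'] for m in both])  # ['ernest hemingway']
```

`contains` is case-sensitive, and `ge`/`le` on `publish_date` only behave like date comparisons because a zero-padded `YYYY.MM.DD` string sorts in the same order as the date it represents.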
Hopefully it helps.
Many thanks!!!
Question: how can I extend Document to handle different fields, e.g., title or description? For example:
```python
d = Document()
d.text = "blabla"
d.title = "blabla"
d.description = "blabla"
c.post('/index', (d))
```