jina-ai / serve

☁️ Build multimodal AI applications with cloud-native stack
https://jina.ai/serve
Apache License 2.0

New fields into Document #2855

Closed: vwoloszyn closed this issue 3 years ago

vwoloszyn commented 3 years ago

Question:

How can I extend Document to handle additional fields, such as title or description?

For example:

d = Document()
d.text = "blabla"
d.title = "blabla"
d.description = "blabla"
c.post('/index', d)

bwanglzu commented 3 years ago

@vwoloszyn this is the expected usage:

from jina import Document

d = Document()
d.text = 'this is the content of the document'  # the main (text) content
d.tags['description'] = 'description of the document'  # metadata goes into tags
d.tags['author'] = 'this is a name'
d.tags['publish_date'] = '2021.07.01'
d.tags['...'] = ...

By default, if you are working with the text modality, put the content in d.text; for metadata and any other information, use d.tags[key] = value.

vwoloszyn commented 3 years ago

Thank you very much, @bwanglzu. Is there any way to search by tag, e.g., 'hemingway' in d.tags['author']?

bwanglzu commented 3 years ago

Hi @vwoloszyn, if I understand your intent correctly:

  1. You want to do semantic search over the content of the documents.
  2. You want to apply an exact-match (hard) filter on certain tags.
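A minimal, framework-free sketch of these two steps, assuming toy 2-D embeddings and plain dicts as stand-ins for Jina Documents. The helper search_then_filter and the dict layout are hypothetical illustrations, not Jina API. It ranks by cosine similarity, keeps the top_k matches, and only then applies an exact tag filter:

```python
import math
from typing import Dict, List

# Hypothetical toy index: plain dicts standing in for indexed Documents.
INDEX = [
    {'id': 'a', 'embedding': [1.0, 0.0], 'tags': {'author': 'hemingway'}},
    {'id': 'b', 'embedding': [0.9, 0.1], 'tags': {'author': 'orwell'}},
    {'id': 'c', 'embedding': [0.0, 1.0], 'tags': {'author': 'hemingway'}},
]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_then_filter(query: List[float], index: List[Dict], top_k: int,
                       tag: str, value: str) -> List[str]:
    # Step 1: semantic search, rank all items by similarity and keep the top_k
    ranked = sorted(index, key=lambda item: cosine(query, item['embedding']), reverse=True)
    matches = ranked[:top_k]
    # Step 2: hard filter, keep only matches whose tag equals the required value
    return [m['id'] for m in matches if m['tags'].get(tag) == value]


print(search_then_filter([1.0, 0.0], INDEX, top_k=2, tag='author', value='hemingway'))
```

Because the filter runs after top-k matching, it can return fewer than top_k results: with top_k=2 only 'a' survives, even though 'c' also has author 'hemingway', since 'c' was never among the top 2 matches.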

This is definitely doable, and Jina is flexible here. Let me write a simple executor for you:

import operator
from typing import Dict

from jina import Executor, DocumentArray, requests


class TagsFilter(Executor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # map the operator names used in the request parameters to comparison functions
        self.ops = {
            'gt': operator.gt,
            'lt': operator.lt,
            'eq': operator.eq,
            'ge': operator.ge,
            'le': operator.le,
            'ne': operator.ne,
        }

    @requests(on='/search')  # happens at search time, at the end of the flow
    def filter(self, docs: DocumentArray, parameters: Dict, **kwargs):
        conditions = parameters.get('conditions')  # you pass conditions as parameters at search time
        if not conditions:  # if the user did not pass any condition, the executor does nothing
            return docs
        for cond in conditions:
            ops_instance = self.ops.get(cond['operator'])
            if ops_instance is None:  # fail clearly on an unknown operator name
                raise ValueError(f"unsupported operator: {cond['operator']}")
            for doc in docs:
                filtered_matches = []
                for match in doc.matches:  # loop through all the matches of the query document
                    tag_value = match.tags.get(cond['attribute'])
                    # in this case, `attribute` is author, `operator` is eq, `value` is hemingway;
                    # matches without the attribute are dropped instead of being compared with None
                    if tag_value is not None and ops_instance(tag_value, cond['value']):
                        filtered_matches.append(match)
                doc.matches = filtered_matches  # replace the matches of the query document with the filtered ones
        return docs

parameters = {'conditions': [{'attribute': 'author', 'operator': 'eq', 'value': 'hemingway'}]}  # you add your condition to the conditions inside parameters

c.post('/search', inputs=..., parameters=parameters)
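The condition handling can also be tried without running a Flow. This is a minimal sketch, assuming matches are plain dicts with a 'tags' dict; apply_conditions is a hypothetical helper, not part of Jina. It reuses the executor's operator table and adds one hypothetical 'contains' entry (operator.contains, which the executor above does not have) to cover the 'hemingway' in d.tags['author'] case:

```python
import operator
from typing import Any, Dict, List

# Same operator names as TagsFilter, plus a hypothetical 'contains' entry:
# operator.contains(a, b) evaluates `b in a`.
OPS = {
    'gt': operator.gt, 'lt': operator.lt, 'eq': operator.eq,
    'ge': operator.ge, 'le': operator.le, 'ne': operator.ne,
    'contains': operator.contains,
}


def apply_conditions(matches: List[Dict[str, Any]],
                     conditions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    for cond in conditions:
        op = OPS[cond['operator']]  # an unknown operator name raises KeyError
        kept = []
        for match in matches:
            tag_value = match['tags'].get(cond['attribute'])
            # skip matches that lack the tag instead of comparing against None
            if tag_value is not None and op(tag_value, cond['value']):
                kept.append(match)
        matches = kept  # each condition narrows the previous result (logical AND)
    return matches


matches = [
    {'id': 1, 'tags': {'author': 'ernest hemingway', 'year': 1952}},
    {'id': 2, 'tags': {'author': 'george orwell', 'year': 1949}},
    {'id': 3, 'tags': {'year': 1960}},
]
conditions = [{'attribute': 'author', 'operator': 'contains', 'value': 'hemingway'}]
print([m['id'] for m in apply_conditions(matches, conditions)])
```

Because each condition filters the output of the previous one, passing several conditions in parameters['conditions'] behaves like a logical AND; an OR would need a different loop structure.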

Hopefully it helps.

vwoloszyn commented 3 years ago

Many thanks!!!