phaistos-networks / Trinity

Trinity IR Infrastructure
Apache License 2.0

How to implement Lucene-like field search, e.g. fieldA = ABC AND fieldB = BCD #13

Open yashhema opened 6 years ago

yashhema commented 6 years ago

Hello, say my data consists of text and also has some attributes (like Info1, Info2, etc.). I should be able to search on any word present in the text or on any of the attributes. Can you suggest how this can be implemented?

markpapadakis commented 6 years ago

Hello,

You can just index those attributes like any other terms, for example by prefixing each term with an appropriate prefix: info1:dog, info2:black. You can then search for ( info1:dog OR info1:cat ) animals, or fielda:ABC AND fieldb:BCD.
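A minimal sketch of the prefixing idea, assuming a toy in-memory inverted index rather than Trinity's actual API. The names PrefixedIndex, addWord, addAttribute, term, intersectPostings and unionPostings are all hypothetical. Attribute values are stored as ordinary terms such as info1:dog, so boolean queries over fields reduce to intersecting and merging sorted posting lists:

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy inverted index (not Trinity's API): attribute values are
// stored as ordinary terms whose text is "field:value".
using DocId = uint32_t;
using Postings = std::vector<DocId>;  // kept sorted, no duplicates

class PrefixedIndex {
 public:
  // Index a plain text word (no prefix).
  void addWord(DocId doc, const std::string& word) { add(word, doc); }

  // Index an attribute by prefixing its value with the field name.
  void addAttribute(DocId doc, const std::string& field, const std::string& value) {
    add(field + ":" + value, doc);
  }

  // Posting list for one term; empty if the term was never indexed.
  Postings term(const std::string& t) const {
    auto it = postings_.find(t);
    return it == postings_.end() ? Postings{} : it->second;
  }

 private:
  // Insert doc into the term's posting list, keeping it sorted and unique.
  void add(const std::string& t, DocId doc) {
    Postings& p = postings_[t];
    auto pos = std::lower_bound(p.begin(), p.end(), doc);
    if (pos == p.end() || *pos != doc) p.insert(pos, doc);
  }
  std::map<std::string, Postings> postings_;
};

// AND: documents present in both posting lists.
Postings intersectPostings(const Postings& a, const Postings& b) {
  Postings out;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
  return out;
}

// OR: documents present in either posting list.
Postings unionPostings(const Postings& a, const Postings& b) {
  Postings out;
  std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
  return out;
}
```

With this sketch, ( info1:dog OR info1:cat ) animals becomes intersectPostings(unionPostings(idx.term("info1:dog"), idx.term("info1:cat")), idx.term("animals")). The scheme assumes the tokenizer never emits a plain word containing the ':' separator; otherwise a word from the text could collide with an attribute term.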

Does this answer your question?

yashhema commented 6 years ago

I am interested in implementing a document-like interface (like the one in Lucene), so that we can have fields of different types such as int and long. That way we can store dates and run range queries on dates and numbers. I saw the comments in Index.h about a placeholder for supporting multiple fields. Can you provide some pointers on how I should go about implementing that?

markpapadakis commented 6 years ago

This is currently not supported in Trinity, although it shouldn't be hard to extend it to do so. Lucene initially supported range queries (e.g. numeric ranges) by partitioning the value space and mapping each partition to a token. Later it switched to a more generic and more efficient scheme, where it encodes a per-segment representation of the value space and queries that instead. You should be able to do the same, or use whatever other approach suits you. We simply didn't need that feature, and for cases where we need something similar, we keep an in-memory per-document store for quick lookups, filtering and so on.
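The partition-to-token approach can be sketched as follows. This is a simplified, hypothetical illustration in the spirit of Lucene's original multi-precision numeric range encoding, not Trinity code. The token format field@shift:bucket, the 4-bit step and the names tokensForValue, tokensForRange and RangeIndex are assumptions. Each 32-bit value (for example a date stored as YYYYMMDD) is indexed as one token per precision level, and a range query becomes an OR over a small set of tokens that exactly tile the range:

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using DocId = uint32_t;
constexpr unsigned kStep = 4;  // assumed: bits dropped per precision level

// Assumed token format: "<field>@<shift>:<value >> shift>".
std::string rangeToken(const std::string& field, unsigned shift, uint64_t bucket) {
  return field + "@" + std::to_string(shift) + ":" + std::to_string(bucket);
}

// Tokens indexed for one value: one per shift 0, 4, ..., 28.
std::vector<std::string> tokensForValue(const std::string& field, uint32_t v) {
  std::vector<std::string> out;
  for (unsigned shift = 0; shift < 32; shift += kStep)
    out.push_back(rangeToken(field, shift, uint64_t{v} >> shift));
  return out;
}

// Tokens whose buckets exactly cover [lo, hi] (inclusive), using coarse
// buckets in the middle and fine buckets only at the unaligned edges.
std::vector<std::string> tokensForRange(const std::string& field, uint32_t lo, uint32_t hi) {
  std::vector<std::string> out;
  if (lo > hi) return out;
  const uint64_t g = uint64_t{1} << kStep;  // child buckets per parent bucket
  uint64_t a = lo, b = hi;                  // inclusive bucket range at this shift
  for (unsigned shift = 0;; shift += kStep) {
    const uint64_t na = (a + g - 1) / g;  // first parent fully inside [a, b]
    const uint64_t nb = (b + 1) / g;      // one past the last such parent
    if (shift + kStep >= 32 || na >= nb) {
      // Coarsest level reached, or no whole parent fits: emit every bucket.
      for (uint64_t x = a; x <= b; ++x) out.push_back(rangeToken(field, shift, x));
      return out;
    }
    // Emit the unaligned left and right edges, then continue one level up.
    for (uint64_t x = a; x < na * g; ++x) out.push_back(rangeToken(field, shift, x));
    for (uint64_t x = nb * g; x <= b; ++x) out.push_back(rangeToken(field, shift, x));
    a = na;
    b = nb - 1;
  }
}

// Toy inverted index over range tokens: token -> documents containing it.
class RangeIndex {
 public:
  void add(DocId doc, const std::string& field, uint32_t value) {
    for (const auto& t : tokensForValue(field, value)) postings_[t].insert(doc);
  }
  // Documents whose value for `field` lies in [lo, hi]: an OR over range tokens.
  std::set<DocId> range(const std::string& field, uint32_t lo, uint32_t hi) const {
    std::set<DocId> out;
    for (const auto& t : tokensForRange(field, lo, hi)) {
      auto it = postings_.find(t);
      if (it != postings_.end()) out.insert(it->second.begin(), it->second.end());
    }
    return out;
  }

 private:
  std::map<std::string, std::set<DocId>> postings_;
};
```

With kStep = 4, every value produces 8 indexed tokens, and a range query expands to at most 30 tokens per precision level. A smaller step means fewer query tokens but more indexed tokens per value. The per-segment representation Lucene later moved to, and the per-document store mentioned above for filtering, are different designs and are not shown here.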