usc-isi-i2 / dig-etl-engine

Download DIG to run on your laptop or server.
http://usc-isi-i2.github.io/dig/
MIT License

Mapping file should by default NOT index #161

Closed: szeke closed this issue 6 years ago

szeke commented 6 years ago

We are getting many CDR docs with complex JSON inside them, and these are causing errors when indexing into ES. The issue is that ES infers a mapping for these unknown JSON structures, and the inferred mappings are sometimes inconsistent across documents.
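To make the failure mode concrete, here is a toy model of dynamic mapping. It is a sketch, not Elasticsearch's actual inference algorithm: the first type seen for a field path fixes its mapping, and any later document with a different type at the same path is reported as a conflict. The helper names (`leaf_type`, `field_types`, `find_conflicts`) and the coarse type names are illustrative assumptions.

```python
from typing import Any, Dict, Iterator, List, Tuple


def leaf_type(value: Any) -> str:
    """Coarse type for a scalar JSON value (a simplification of ES inference)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "text"


def field_types(path: str, value: Any) -> Iterator[Tuple[str, str]]:
    """Yield (dotted path, type) pairs for every field in a JSON value."""
    if value is None:
        return  # nulls do not create a mapping
    if isinstance(value, list):
        for item in value:  # arrays take the type of their elements
            yield from field_types(path, item)
    elif isinstance(value, dict):
        if path:  # skip the document root itself
            yield path, "object"
        for key, child in value.items():
            yield from field_types(f"{path}.{key}" if path else key, child)
    else:
        yield path, leaf_type(value)


def find_conflicts(docs: List[Dict[str, Any]]) -> List[str]:
    """Report fields whose inferred type differs from the first type seen."""
    mapping: Dict[str, str] = {}
    conflicts: List[str] = []
    for i, doc in enumerate(docs):
        for path, typ in field_types("", doc):
            seen = mapping.setdefault(path, typ)  # first document wins
            if seen != typ:
                conflicts.append(f"doc {i}: {path} is {typ}, mapped as {seen}")
    return conflicts


if __name__ == "__main__":
    docs = [{"raw": {"price": 10}}, {"raw": {"price": "ten dollars"}}]
    print(find_conflicts(docs))  # prints ['doc 1: raw.price is text, mapped as number']
```

Real Elasticsearch coerces some values by default (for example, numeric strings into numeric fields), so this toy reports more conflicts than ES would; structural clashes such as an object in one doc and a string in another fail either way.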

One solution would be to index nothing by default and have Pinpoint specify what to index.
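A minimal sketch of the "index nothing by default" idea, assuming Pinpoint hands over a flat dict of dotted field paths with Elasticsearch types (the `build_mapping` helper and that input format are hypothetical). Setting `"dynamic": false` at the root tells Elasticsearch not to add mappings for unseen fields: they stay in `_source` but are not indexed or searchable, and inner objects inherit the setting. The body uses the typeless `mappings` layout of Elasticsearch 7 and later; older versions nest it under a mapping type name.

```python
import json
from typing import Any, Dict


def build_mapping(indexed_fields: Dict[str, str]) -> Dict[str, Any]:
    """Build a mapping body that indexes only the listed dotted field paths.

    Hypothetical helper: the input maps paths such as
    "knowledge_graph.phone.key" to Elasticsearch field types.
    """
    properties: Dict[str, Any] = {}
    for path, es_type in indexed_fields.items():
        *parents, leaf = path.split(".")
        node = properties
        for name in parents:
            # each intermediate segment becomes an object with its own properties
            node = node.setdefault(name, {"properties": {}})["properties"]
        node[leaf] = {"type": es_type}
    # dynamic: False keeps unknown fields in _source without indexing them;
    # inner objects inherit this setting unless they override it
    return {"mappings": {"dynamic": False, "properties": properties}}


if __name__ == "__main__":
    body = build_mapping({
        "knowledge_graph.phone.key": "keyword",
        "knowledge_graph.phone.value": "text",
        "tld": "keyword",
    })
    print(json.dumps(body, indent=2))
```

Using `"dynamic": "strict"` instead would reject any document containing unknown fields, which is the opposite of what this issue wants: the complex JSON should still be stored, just not indexed.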

jasonslepicka commented 6 years ago

Pinpoint can disable dynamic indexing at the root of the CDR doc, but I don't know which other fields are being searched on outside of the indexed object that Pinpoint generates from the knowledge graph. My understanding is that the DIG UI and myDIG issue Elasticsearch queries as well. Are they querying the same index? I don't want to break the functionality of other services. Can we get a spec that defines what should be searchable? @szeke @ThomasSchellenbergNextCentury @GreatYYX
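One way to answer "what else is being searched" before switching the root to `"dynamic": false` is to collect the field paths that the DIG UI and myDIG actually query and check them against the proposed mapping. The sketch assumes those paths are gathered by hand; the helper names and the sample mapping and fields are hypothetical.

```python
from typing import Any, Dict, List, Set


def indexed_paths(properties: Dict[str, Any], prefix: str = "") -> Set[str]:
    """Collect dotted paths of fields that have an explicit type in a mapping."""
    paths: Set[str] = set()
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:  # object field: recurse into its children
            paths |= indexed_paths(spec["properties"], path + ".")
        elif "type" in spec:
            paths.add(path)
    return paths


def unindexed_query_fields(mapping: Dict[str, Any], query_fields: List[str]) -> List[str]:
    """Return query fields that a root-level dynamic:false mapping would not index."""
    covered = indexed_paths(mapping["mappings"]["properties"])
    return [field for field in query_fields if field not in covered]


if __name__ == "__main__":
    # Hypothetical mapping and query fields, for illustration only.
    mapping = {"mappings": {"dynamic": False, "properties": {
        "tld": {"type": "keyword"},
        "knowledge_graph": {"properties": {
            "phone": {"properties": {"key": {"type": "keyword"}}}}},
    }}}
    print(unindexed_query_fields(mapping, ["tld", "knowledge_graph.phone.key", "raw_content"]))
    # prints ['raw_content']
```

Any field this reports would silently stop matching queries once dynamic indexing is off, so it would need to be added to the explicit mapping first.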

szeke commented 6 years ago

All the .key and .value fields in knowledge_graph should be indexed, as should the key CDR fields, such as TLD and timestamp, and the debugging fields. @GreatYYX, please provide the exact set of top-level CDR fields to index.
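A sketch of how this field list could be assembled from sample documents, assuming each `knowledge_graph` field holds a list of extraction objects that carry `key` and/or `value` entries. The top-level names `tld` and `timestamp` are placeholders, since the exact set of top-level CDR fields is still an open question here, and the debugging fields are left out for the same reason. The `fields_to_index` helper is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def fields_to_index(docs: Iterable[Dict[str, Any]], top_level: List[str]) -> List[str]:
    """Collect top-level CDR fields plus every knowledge_graph .key/.value path."""
    paths = set(top_level)
    for doc in docs:
        for field, extractions in doc.get("knowledge_graph", {}).items():
            # assumed layout: each knowledge_graph field is a list of dicts
            for extraction in extractions:
                for attr in ("key", "value"):
                    if attr in extraction:
                        paths.add(f"knowledge_graph.{field}.{attr}")
    return sorted(paths)


if __name__ == "__main__":
    # Hypothetical CDR document; raw_content is deliberately not listed.
    doc = {
        "tld": "example.com",
        "raw_content": "<html>...</html>",
        "knowledge_graph": {
            "phone": [{"key": "5551234", "value": "555-1234", "provenance": []}],
            "title": [{"value": "Hello"}],
        },
    }
    print(fields_to_index([doc], ["tld", "timestamp"]))
```

Fields such as `raw_content` and `provenance` never appear in the output, so they would stay in `_source` but remain unindexed under a `"dynamic": false` mapping.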

saggu commented 6 years ago

This was implemented. We need a new mapping file; I will create a new issue.