snowplow / snowplow-elasticsearch-loader

Writes Snowplow enriched events from Kinesis to Elasticsearch
http://snowplowanalytics.com/

Add ability to filter fields #4

Open BenFradet opened 7 years ago

BenFradet commented 7 years ago

from snowplow/snowplow#3195:

At the moment the index contains many fields that are never used, a side effect of mapping the atomic definition to Elasticsearch. To reduce the size of the index, it would be nice to be able to control which fields get stored in it.

BenFradet commented 7 years ago

@jbeemster Did you mean having a filter that would only keep e.g. network_userid and domain_userid?
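
To make the whitelist reading of this question concrete, here is a minimal Python sketch. It is only an illustration and not the loader's code: it assumes the enriched event has already been turned into a flat dict, and `keep_fields` and the sample values are hypothetical.

```python
from typing import Any, Dict, Iterable


def keep_fields(event: Dict[str, Any], allowed: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of the event that contains only the allowed keys."""
    allowed_set = set(allowed)
    return {key: value for key, value in event.items() if key in allowed_set}


# Hypothetical enriched event, already converted to a JSON-like dict.
event = {
    "network_userid": "abc",
    "domain_userid": "def",
    "page_url": None,
}

# Keep only the two user ID fields mentioned above.
filtered = keep_fields(event, ["network_userid", "domain_userid"])
```

With a whitelist, any field that is not listed is dropped, so new fields never reach the index unless someone opts them in.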

jbeemster commented 7 years ago

@BenFradet we were thinking of more of a blacklist: after the enriched event has been processed, you could just drop the listed keys from the final JSON that gets sent to Elasticsearch.

At the moment, because we essentially translate it to an atomic.events definition, there are a lot of fields that are always null. If these were known ahead of time, you could just remove the keys and never have them added to the mapping in the Elasticsearch index.
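
The blacklist idea could be sketched roughly as follows. This is an assumption-laden illustration in Python, not how the loader works: `drop_fields`, the `drop_nulls` flag and the `BLACKLIST` contents are all hypothetical, and the event is assumed to be a flat dict just before it is serialized for Elasticsearch.

```python
import json
from typing import Any, Dict, FrozenSet

# Hypothetical blacklist: fields known ahead of time to be always null
# for a given pipeline.
BLACKLIST: FrozenSet[str] = frozenset({"mkt_medium", "ti_sku"})


def drop_fields(
    event: Dict[str, Any],
    blacklist: FrozenSet[str],
    drop_nulls: bool = False,
) -> Dict[str, Any]:
    """Return a copy of the event without the blacklisted keys.

    When drop_nulls is True, keys whose value is None are removed as well,
    so they never show up in the document sent to the index.
    """
    return {
        key: value
        for key, value in event.items()
        if key not in blacklist and not (drop_nulls and value is None)
    }


# Hypothetical processed event.
event = {
    "app_id": "shop",
    "event_id": "e1",
    "mkt_medium": None,
    "ti_sku": None,
    "refr_term": None,
}

blacklisted = drop_fields(event, BLACKLIST)
pruned = drop_fields(event, BLACKLIST, drop_nulls=True)

# The final JSON that would be sent on.
payload = json.dumps(pruned, sort_keys=True)
```

Because the keys are removed before serialization, Elasticsearch never sees them and so has no reason to add them to the index mapping. The optional null pruning is a separate design choice: it avoids maintaining a list, but it decides per event rather than ahead of time.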

alexanderdean commented 7 years ago

De-scheduled - not clear how/when we would do this...