Lucene Indexer for Delimited Files
(Lucene 7.0.1 docs)
- index delimited files without predefining a schema
- the schema is inferred by splitting the header on the delimiter
- full-text search across multiple folders, files, and columns
- each column in the source data is a Field in Lucene
- each line in the source data is a Document in Lucene
- each header in the source data is an Index
- each tenant's Lucene index is persisted in a separate FSDirectory
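The mapping above can be sketched in plain Java without the Lucene dependency. `DelimitedRowMapper` and its methods are hypothetical names and are not part of this project or of Lucene. `inferSchema` splits the header into column names. `toRow` turns one data line into an ordered column-to-value map, which stands in for a Lucene `Document`. `tenantIndexPath` assumes one index directory per tenant under a common root; that layout is an assumption. In the actual indexer, each map entry would presumably become a Lucene field such as `TextField`, and `FSDirectory.open` would point at the tenant's path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: infers a schema from a header line and maps data lines onto it. */
final class DelimitedRowMapper {

    private DelimitedRowMapper() {}

    /** Splits the header line into column names; each name would become a Field name. */
    static List<String> inferSchema(String headerLine, String delimiter) {
        List<String> columns = new ArrayList<>();
        // Pattern.quote treats the delimiter literally (e.g. "|" is not regex alternation).
        for (String name : headerLine.split(Pattern.quote(delimiter), -1)) {
            columns.add(name.trim());
        }
        return columns;
    }

    /**
     * Maps one data line onto the schema; the result stands in for one Document.
     * Assumption: missing trailing values become "" and surplus values are ignored.
     */
    static Map<String, String> toRow(List<String> schema, String line, String delimiter) {
        String[] values = line.split(Pattern.quote(delimiter), -1); // -1 keeps empty columns
        Map<String, String> row = new LinkedHashMap<>();              // preserves column order
        for (int i = 0; i < schema.size(); i++) {
            row.put(schema.get(i), i < values.length ? values[i] : "");
        }
        return row;
    }

    /** Assumed layout: one index directory per tenant, i.e. indexRoot/tenantId. */
    static Path tenantIndexPath(Path indexRoot, String tenantId) {
        return indexRoot.resolve(tenantId);
    }

    public static void main(String[] args) {
        List<String> schema = inferSchema("id|name|city", "|");
        System.out.println(schema);                              // [id, name, city]
        System.out.println(toRow(schema, "7|Ada|London", "|"));  // {id=7, name=Ada, city=London}
        System.out.println(tenantIndexPath(Paths.get("indexes"), "tenantA"));
    }
}
```

Quoting the delimiter matters for characters such as `|`, which `String.split` would otherwise interpret as a regular expression. The `-1` limit keeps trailing empty columns, so a line's values stay aligned with the header.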
High-level logic diagram
Search results include:
- name of matched file
- name (date stamp) of parent folder
- matched column name
- matched line number
- names and values of other columns on matched line
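The following minimal sketch makes the result fields concrete by showing a hit record and one way to assemble it. `SearchHit` and `RowSearch` are hypothetical names. The case-insensitive substring scan is a linear stand-in for a Lucene query and does not reflect how Lucene matches terms. In the real indexer, these values would presumably be read from stored fields of the matched `Document`. Line numbers assume the header is line 1, and the folder and file names in the usage are made-up examples.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical shape of one search result, mirroring the fields listed above. */
final class SearchHit {
    final String fileName;                  // name of the matched file
    final String folderName;                // date-stamped parent folder
    final String column;                    // matched column name
    final int lineNumber;                   // line number in the source file (header = 1)
    final Map<String, String> otherColumns; // remaining columns on the matched line

    SearchHit(String fileName, String folderName, String column, int lineNumber,
              Map<String, String> otherColumns) {
        this.fileName = fileName;
        this.folderName = folderName;
        this.column = column;
        this.lineNumber = lineNumber;
        this.otherColumns = otherColumns;
    }

    @Override
    public String toString() {
        return folderName + "/" + fileName + ":" + lineNumber + " [" + column + "] " + otherColumns;
    }
}

final class RowSearch {

    private RowSearch() {}

    /** Linear, case-insensitive substring scan over one file's rows (not a Lucene query). */
    static List<SearchHit> search(String folderName, String fileName,
                                  List<Map<String, String>> rows, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<SearchHit> hits = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            Map<String, String> row = rows.get(i);
            for (Map.Entry<String, String> cell : row.entrySet()) {
                if (cell.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                    Map<String, String> others = new LinkedHashMap<>(row);
                    others.remove(cell.getKey()); // keep only the non-matching columns
                    // rows.get(0) is assumed to be the first data line, i.e. line 2.
                    hits.add(new SearchHit(fileName, folderName, cell.getKey(), i + 2, others));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "7");
        row.put("name", "Ada");
        row.put("city", "London");
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row);
        for (SearchHit hit : search("2017-10-01", "people.csv", rows, "london")) {
            System.out.println(hit); // 2017-10-01/people.csv:2 [city] {id=7, name=Ada}
        }
    }
}
```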
Overview of Per-Tenant Indexing
Data is organized physically (i.e. in /esldata/) in the following hierarchy:
- Tenant has 1 or more:
  - Dated folder has 1 or more:
    - Delimited data file has 1: