Thanks for the suggestion. Indeed, I haven't had such a use case, but it looks interesting. If I understand correctly, a single file's records should end up in different indices, based on a certain property of the record.
I'm not sure yet whether this is a genuine fast indexing use case or more a preprocessing thing (split input file on correct boundaries, then index). Let me think about it.
@miku Thanks for the response. As you know, logstash-input-file was not designed for reading large, complete files (see https://github.com/logstash-plugins/logstash-input-file/issues/78), so in many cases esbulk can be an alternative, including for me.
> whether this is a genuine fast indexing use case
I agree that bulk processing's first goal is fast indexing.
> split input file on correct boundaries, then index
I have also considered doing some preprocessing to split the input into one file per date, but since I would have to do that every day, it is a bit of a burden.
Cheers, Jihun
Just a quick update: I implemented a first version of dynamic date support - here's a short screencast.
For a given file like this:
```
$ cat fixtures/dynamic-1.ldj
{"time":"2016-05-01", "name": "a"}
{"time":"2016-05-02", "name": "b"}
{"time":"2016-05-03", "name": "c"}
```
One can use the golang-style date spec to set a date field and a date field layout:
```
$ esbulk -verbose -index test-{2006-01-02} -date-field time \
    -date-field-layout 2006-01-02 fixtures/dynamic-1.ldj
```
The result would be three indices, test-2016-05-01, test-2016-05-02, and test-2016-05-03, with one document each.
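To make the mechanics concrete, here is a minimal Go sketch of the idea (not the actual esbulk implementation): parse the -date-field value with the -date-field-layout, then substitute the formatted date into the placeholder in the index name. The indexName helper is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// indexName parses the raw date field value with the given layout and
// substitutes the result into the {layout} placeholder of the pattern.
func indexName(pattern, layout, value string) (string, error) {
	t, err := time.Parse(layout, value)
	if err != nil {
		return "", err
	}
	return strings.Replace(pattern, "{"+layout+"}", t.Format(layout), 1), nil
}

func main() {
	name, err := indexName("test-{2006-01-02}", "2006-01-02", "2016-05-01")
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // test-2016-05-01
}
```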
Another example:
```
$ cat fixtures/dynamic-2.ldj
{"time":"2016-05-30T10:00:00.000+0900", "name": "a"}
{"time":"2016-05-30T00:00:00.000+0900", "name": "b"}
```

```
$ esbulk -verbose -index test-{2006-01-02} -date-field time \
    -date-field-layout 2006-01-02T15:04:05Z0700 fixtures/dynamic-2.ldj
```
The result would be two indices, test-2016-05-29 and test-2016-05-30, due to the conversion to UTC: 2016-05-30T00:00:00+0900 is 2016-05-29T15:00:00 in UTC.
Just a few points that make this feature kind of difficult, at least with the current overall implementation: a timestamp like

```
2016-05-30T10:00:00.000+0900
```

will be parsed as a date with a timezone offset. As I understand it, it would be better for Kibana to have these dates converted to UTC. Maybe there is a need for another option, like -convert-to-utc or something like that. Here's another screencast, showing UTC conversion.
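For illustration, here is a minimal Go sketch of that conversion (the assumed behavior of a flag like -convert-to-utc, not the actual esbulk code):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	layout := "2006-01-02T15:04:05Z0700"

	// Second record from fixtures/dynamic-2.ldj. Note that time.Parse
	// tolerates fractional seconds in the value even though the layout
	// does not declare them.
	t, err := time.Parse(layout, "2016-05-30T00:00:00.000+0900")
	if err != nil {
		panic(err)
	}

	fmt.Println(t.Format("2006-01-02"))       // 2016-05-30 (original offset kept)
	fmt.Println(t.UTC().Format("2006-01-02")) // 2016-05-29 (converted to UTC)
}
```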
The code for all this is in https://github.com/miku/esbulk/tree/issue-1; feel free to check it out and test it. I am still a bit hesitant to include this, but if you think it would be useful, I will certainly consider it.
I'm afraid I cannot implement this at the moment. It would add yet another two flags and I cannot think of an easy way to support this for now.
@miku thank you for the feedback!
For the sake of completeness: there is a processor type that can route documents based on a date:
> The purpose of this processor is to point documents to the right time based index based on a date or timestamp field in a document by using the date math index name support.
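A sketch of such a pipeline, with illustrative names (the pipeline id and the time field are assumptions, not taken from this thread):

```
PUT _ingest/pipeline/route-by-date
{
  "processors": [
    {
      "date_index_name": {
        "field": "time",
        "index_name_prefix": "logstash-",
        "date_rounding": "d"
      }
    }
  ]
}
```

Documents indexed through such a pipeline are routed to a daily index derived from their time field.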
Hello.
Thanks for developing such a cool utility. I have moved from Logstash to esbulk. By the way, there is a small piece of functionality I am missing in this utility.
We usually have log files which contain a date field, and we create indices with the Logstash index pattern (e.g. logstash-2016.05.30). But in some (or many) cases the dates in a single file can be spread over several days, particularly when a local-date-based rolling strategy is enforced.
For example, event_20160530.json may contain lines like these (the values below are illustrative, modeled on the fixtures above):
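```
{"time":"2016-05-30T00:00:00.000+0900", "msg": "log 1"}
{"time":"2016-05-30T10:00:00.000+0900", "msg": "log 2"}
```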
However, Elasticsearch and Kibana force conversion to UTC, so log 1 would have to go to logstash-2016.05.29 and log 2 to logstash-2016.05.30.
I know it is not a simple problem, but could you please consider a feature something like this?