uken / fluent-plugin-elasticsearch

Apache License 2.0

fluent-plugin-elasticsearch, geoip, and typecast are slow when used together #18

Closed zbyufei closed 10 years ago

zbyufei commented 10 years ago

Hi uken: data shows up very slowly in elasticsearch-head when I use fluent-plugin-elasticsearch together with geoip and typecast. My config file is:

<source>
  type tail
  path /opt/realtimesearch/elasticsearch-0.90.5/s3log/access/clickstream-access.log
  pos_file /opt/realtimesearch/elasticsearch-0.90.5/s3log/access/clickstream-access.log.pos
  tag geoip.access
  format /^(?<remote_intranet>([0-9a-z\.,\s\%20]*,(\%20|\s)*)*)(?<remote_internet>[0-9a-z\. ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)" (?<uid_set>[^ ]*) (?<uid_got>[^ ]*) (?<customer_id_cookie>[^ ]*) (?<ga_cookie>[^ ]*) (?<request_time>[^ ]*) (?<cookie_TestType>[^ ]*) (?<ABTest>[^ ]*)$/
  time_format %d/%b/%Y:%H:%M:%S %z
</source>

<match geoip.access>
    type geoip
    geoip_lookup_key         remote_internet
    enable_key_city          geoip_city
    enable_key_latitude      geoip_lat
    enable_key_longitude     geoip_lon
    enable_key_country_code  geoip_country
    enable_key_region        geoip_region
    remove_tag_prefix    geoip.access
    add_tag_prefix       typecast.access
    flush_interval       1s
</match>

<match typecast.access>
  type typecast
  item_types code:integer,size:integer,request_time:float
  prefix es
</match>

<match es.typecast.access>
  type elasticsearch
  host 127.0.0.1
  port 9200
  logstash_prefix logstash
  logstash_format true
  type_name fluentd
  index_name fluentd
  flush_interval 5s
</match>

If I want to use a buffer plugin, how do I do that? Could you give me an example based on my config file? I don't know whether the buffer plugin settings go before or after <match es.typecast.access>. Thanks!

zbyufei commented 10 years ago

My new config with the buffer plugin looks like this:

<match es.typecast.access>
  type elasticsearch
  host 127.0.0.1
  port 9200
  logstash_prefix logstash
  logstash_format true
  type_name fluentd
  index_name fluentd
  flush_interval 5s

  buffer_type memory
  flush_interval 60
  retry_limit 17
  retry_wait 1.0
  num_threads 2
</match>

I am a beginner with fluentd and elasticsearch. I want to cat a log file (>10 GB) into clickstream-access.log, but it is very slow.

pitr commented 10 years ago

Not sure what the geoip and typecast plugins do, but we have 'flush_interval 30s' in our configs for the elasticsearch plugin. Try decreasing that value. Also, try playing with buffer_chunk_limit and the related buffer settings (see http://docs.fluentd.org/articles/buffer-plugin-overview).

I guess you've already seen #3?
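
As a rough sketch of the buffer suggestions above, assuming the same Fluentd config syntax used elsewhere in this thread: buffer parameters go inside the same <match> block as the elasticsearch output, not in a separate block before or after it. The buffer_chunk_limit and buffer_queue_limit values below are illustrative assumptions, not tuned recommendations. flush_interval is set only once here, whereas the config posted earlier sets it twice (5s and 60), which leaves it unclear which value is intended.

<match es.typecast.access>
  type elasticsearch
  host 127.0.0.1
  port 9200
  logstash_prefix logstash
  logstash_format true
  type_name fluentd
  index_name fluentd

  # Buffer settings belong to this output, so they live in this block.
  buffer_type memory
  # Illustrative sizes (assumptions); tune against the actual log volume.
  buffer_chunk_limit 8m
  buffer_queue_limit 64
  # A single flush_interval; adjust it and compare indexing throughput.
  flush_interval 5s
  retry_limit 17
  retry_wait 1.0
  num_threads 2
</match>

Changing one value at a time (for example flush_interval first, then buffer_chunk_limit) makes it easier to tell which setting affects throughput.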

zbyufei commented 10 years ago

Yes, I have read #3. Thanks. I'll try removing geoip and typecast to see how it performs.

zbyufei commented 10 years ago

I've done the test and found that the problem is Elasticsearch indexing performance. I have been reading through the Elasticsearch issues and the Elasticsearch Google Groups to find ways to improve it. Here is a relevant performance test report:

http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/ :

refresh_interval: 1s   – 2.0K docs/s
refresh_interval: 5s   – 2.5K docs/s
refresh_interval: 30s  – 3.4K docs/s

Thanks. I am closing this issue.