tags are written multiple times with "true" and with actual string value if fields and tags have the same name

reinhard-brandstaedter commented 8 years ago

Logstash version 2.3, InfluxDB 0.13 on 64-bit Ubuntu Linux. I'm trying to write dynamic tag keys with values using "use_event_fields_for_data_points".

When I output to a file from logstash I get this, which looks ok:

{"tags":["businesstransaction","application","y_page_controller_name"],"application":"B2C","businesstransaction":"Page Types","responsetime":34.17774963378906,"time":1467618289090,"y_page_controller_name":"MiniCart"}

However, in InfluxDB the only tag value I see for my tags is "true". On the wire I see write requests like the following, which clearly show that the tags are written with "true" and that fields with the same names are additionally written with the actual string values.

Page\ Types,businesstransaction=true,application=true,y_page_controller_name=true application="B2C",businesstransaction="Page Types",responsetime=0.8068558502197266E2,y_page_controller_name="CategoryPage" 1467617779642

In influxdb I'm seeing:

name: Page Types
----------------
key         value
y_page_controller_name  true

So it seems tags (in logstash) and fields with the same name conflict in the output to influxdb and cause confusion.
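To make the on-wire line above easier to read, here is a minimal sketch of how one line of InfluxDB line protocol is laid out (measurement and tag set, then field set, then timestamp). The helpers `escape`, `format_field` and `line_protocol` are hypothetical names for illustration and are not the plugin's own code; number formatting also differs from what Logstash sends.

```ruby
# Hypothetical helpers for illustration; not part of logstash-output-influxdb.

# Backslash-escape every character matched by pattern.
def escape(value, pattern)
  value.to_s.gsub(pattern) { |ch| "\\#{ch}" }
end

# Strings are double-quoted, integers get an "i" suffix, everything else is written as-is.
def format_field(value)
  case value
  when String  then '"' + escape(value, /["\\]/) + '"'
  when Integer then "#{value}i"
  else value.to_s
  end
end

# Layout: measurement,tag=value,... field=value,... timestamp
def line_protocol(measurement, tags, fields, timestamp)
  tag_set   = tags.map   { |k, v| "#{escape(k, /[ ,=]/)}=#{escape(v, /[ ,=]/)}" }
  field_set = fields.map { |k, v| "#{escape(k, /[ ,=]/)}=#{format_field(v)}" }
  head = [escape(measurement, /[ ,]/), *tag_set].join(",")
  "#{head} #{field_set.join(',')} #{timestamp}"
end

# The same key appears once in the tag set (as "true") and once in the field set.
tags   = { "application" => "true", "y_page_controller_name" => "true" }
fields = { "application" => "B2C", "y_page_controller_name" => "MiniCart", "responsetime" => 34.5 }
line = line_protocol("Page Types", tags, fields, 1467618289090)
puts line
```

Everything before the first unescaped space is the measurement plus tag set, so a key such as application can legally show up on both sides of that space with different values.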

The logstash config:

influxdb {
  host => "localhost"
  db => "dynatrace"
  codec => "json"
  allow_time_override => true
  measurement => "%{[businesstransaction]}"
  #send_as_tags => [tags]
  use_event_fields_for_data_points => true
  #data_points => {
  #  "application" => "%{[application]}"
  #  "businesstransaction" => "%{[businesstransaction]}"
  #  "splitting" => "%{[splitting]}"
  #  "responsetime" => "%{[responsetime]}"
  #  "%{[result_measure_0]}" => "%{[result_measure_0_val]}"
  #  "%{[result_measure_1]}" => "%{[result_measure_1_val]}"
  #  "%{[result_measure_2]}" => "%{[result_measure_2_val]}"
  #  "time" => "%{[businessTransactions][occurrences][startTime]}"
  #}
  #send_as_tags => ["businesstransaction","application","splitting"]
}

reinhard-brandstaedter commented 8 years ago

Oh I just found this in the output code:

  # Extract tags from a hash of fields. 
  # Returns a tuple containing a hash of tags (as configured by send_as_tags) 
  # and a hash of fields that exclude the tags. If fields contains a key 
  # "tags" with an array, they will be moved to the tags hash (and each will be
  # given a value of true)
  # 
  # Example: 
  #   # Given send_as_tags: ["bar"]
  #   original_fields = {"foo" => 1, "bar" => 2, "tags" => ["tag"]}
  #   tags, fields = extract_tags(original_fields)
  #   # tags: {"bar" => 2, "tag" => "true"} and fields: {"foo" => 1}
  def extract_tags(fields)
    remainder = fields.dup

    tags = if remainder.has_key?("tags") && remainder["tags"].respond_to?(:inject)
      remainder.delete("tags").inject({}) { |tags, tag| tags[tag] = "true"; tags }
    else
      {}
    end

    @send_as_tags.each { |key| (tags[key] = remainder.delete(key)) if remainder.has_key?(key) }

    tags.delete_if { |key,value| value.nil? || value == "" }
    remainder.delete_if { |key,value| value.nil? || value == "" }

    [tags, remainder]
  end

Seems this behaviour is on purpose. I've changed it in my local installation to:

remainder.delete("tags").inject({}) { |tags, tag| tags[tag] = fields[tag]; tags }

which now lets me use the tags listed in the original message.tags and populate them with the value of the same-named message field. It would also make sense to remove any field that is listed in the tags from the message fields, guarded by a condition. This would allow users to use dynamically created tags in Logstash (tags whose names are not known in advance) and populate them with values in InfluxDB.
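A minimal sketch of what the one-line change does on its own, run outside the plugin on a hash shaped like the sample event. The "slow" tag is a hypothetical addition to show what happens to a tag that has no same-named field.

```ruby
# Event shaped like the file output above; "slow" is a hypothetical tag
# without a matching field.
event = {
  "tags" => ["application", "y_page_controller_name", "slow"],
  "application" => "B2C",
  "y_page_controller_name" => "MiniCart",
  "responsetime" => 34.17774963378906
}

# The changed line: each tag takes the value of the same-named field.
tags = event["tags"].inject({}) { |acc, tag| acc[tag] = event[tag]; acc }

# Same cleanup as in extract_tags: nil and empty values are dropped,
# so a tag without a matching field disappears entirely.
tags.delete_if { |_key, value| value.nil? || value == "" }

# The fields are untouched by the change, so tagged keys are still duplicated.
fields = event.reject { |key, _value| key == "tags" }
duplicated = fields.keys & tags.keys

puts tags.inspect
puts duplicated.inspect
```

Two things stand out: plain tags with no matching field are no longer written at all, and the tagged keys are still sent as fields too.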

If I knew some Ruby, I'd add something like this (rough sketch):

if fields.has_key?("tagname") && tags.has_key?("tagname")
  tags["tagname"] = fields["tagname"]
  fields.delete("tagname")
end

Would that be a useful change?

reinhard-brandstaedter commented 8 years ago

Voila... a bit of Ruby code:

  # Extract tags from a hash of fields. 
  # Returns a tuple containing a hash of tags (as configured by send_as_tags) 
  # and a hash of fields that exclude the tags. If fields contains a key
  # "tags" with an array, they will be moved to the tags hash (each will be
  # given the value of the same-named field if present, otherwise "true")
  # 
  # Example: 
  #   # Given send_as_tags: ["bar"]
  #   original_fields = {"foo" => 1, "bar" => 2, "tags" => ["tag"]}
  #   tags, fields = extract_tags(original_fields)
  #   # tags: {"bar" => 2, "tag" => "true"} and fields: {"foo" => 1}
  def extract_tags(fields)
    remainder = fields.dup

    tags = if remainder.has_key?("tags") && remainder["tags"].respond_to?(:inject)
      remainder.delete("tags").inject({}) { |tags, tag| tags[tag] = if remainder.has_key?(tag) then fields[tag] else "true" end; tags }
    else
      {}
    end

    @send_as_tags.each { |key| (tags[key] = remainder.delete(key)) if remainder.has_key?(key) }

    tags.delete_if { |key,value| value.nil? || value == "" }
    remainder.delete_if { |key,value| value.nil? || value == "" }
    remainder.delete_if { |key,value| tags.has_key?(key) }

    [tags, remainder]
  end
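For trying the patched logic outside of Logstash, here is a standalone sketch applied to the sample event from the first comment. It assumes `send_as_tags` can be passed as a plain argument (the plugin reads it from `@send_as_tags`), and it only covers tag extraction, so the "time" key stays in the fields here.

```ruby
# Standalone sketch of the patched logic; send_as_tags is an explicit
# argument here instead of the plugin's @send_as_tags setting.
def extract_tags(fields, send_as_tags = [])
  remainder = fields.dup

  tags = {}
  if remainder.has_key?("tags") && remainder["tags"].respond_to?(:inject)
    remainder.delete("tags").each do |tag|
      # Use the same-named field value when there is one, else "true".
      tags[tag] = remainder.has_key?(tag) ? remainder[tag] : "true"
    end
  end

  send_as_tags.each { |key| tags[key] = remainder.delete(key) if remainder.has_key?(key) }

  tags.delete_if { |_key, value| value.nil? || value == "" }
  remainder.delete_if { |_key, value| value.nil? || value == "" }
  # Drop fields that were promoted to tags so they are not written twice.
  remainder.delete_if { |key, _value| tags.has_key?(key) }

  [tags, remainder]
end

event = {
  "tags" => ["businesstransaction", "application", "y_page_controller_name"],
  "application" => "B2C",
  "businesstransaction" => "Page Types",
  "responsetime" => 34.17774963378906,
  "time" => 1467618289090,
  "y_page_controller_name" => "MiniCart"
}

tags, fields = extract_tags(event)
puts tags.inspect
puts fields.inspect

# A plain tag without a same-named field keeps the old behaviour.
plain_tags, plain_fields = extract_tags({ "foo" => 1, "tags" => ["flag"] })
```

With this version the three tagged keys carry their string values and no longer appear among the fields, while a plain tag such as "flag" is still written as "true".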

offlinehacker commented 8 years ago

Thanks! You should make a pull request :)

reinhard-brandstaedter commented 8 years ago

I'm new to GitHub and it seems I can't create a pull request: it tells me I have no permission to send one...

sashaaKr commented 4 years ago

I wonder if it will be merged one day.

logstash-plugins / logstash-output-influxdb

tags are written multiple times with "true" and with actual string value if fields and tags have the same name #42