logstash-plugins / logstash-codec-protobuf

Codec plugin for parsing Protobuf messages
Apache License 2.0
26 stars 16 forks source link

protobuf file not working... maybe wrong class_name ? #7

Closed thechile closed 7 years ago

thechile commented 7 years ago

Hi

I have the protobuf file shown at the end of this issue... but i'm a bit confused what class_name should be ?

I have tried

codec => protobuf {
        class_name   => "PBDNSMessage"
        include_path => ['/etc/dnsdist/dnsmessage.pb.rb']
      }

Which works OK in as much as i get some events logged but it seems to only work for DNSQuestion.. not DNSResponse. I also receive the following error when running logstash with --debug which i believe is when logstash receives a DNSResponse event.. but not 100% sure.

01:07:40.554 [Ruby-0-Thread-31: /usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-4.1.0/lib/logstash/inputs/tcp.rb:150] ERROR logstash.inputs.tcp - An error occurred. Closing connection 
{:client=>"127.0.0.1:43502", :exception=>#<NoMethodError: undefined method `[]' for nil:NilClass>, 
:backtrace=>["/usr/share/logstash/vendor/bundle/jruby/1.9/gems/ruby-protocol-buffers-1.6.1/lib/protocol_buffers/runtime/field.rb:149:in `class'", 
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/ruby-protocol-buffers-1.6.1/lib/protocol_buffers/runtime/message.rb:237:in `initialize'", 
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/ruby-protocol-buffers-1.6.1/lib/protocol_buffers/runtime/message.rb:311:in `parse'", 
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/ruby-protocol-buffers-1.6.1/lib/protocol_buffers/runtime/field.rb:641:in `deserialize'", 
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/ruby-protocol-buffers-1.6.1/lib/protocol_buffers/runtime/decoder.rb:60:in `decode'", 
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/ruby-protocol-buffers-1.6.1/lib/protocol_buffers/runtime/message.rb:305:in `parse'", 
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/ruby-protocol-buffers-1.6.1/lib/protocol_buffers/runtime/message.rb:311:in `parse'", 
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/ruby-protocol-buffers-1.6.1/lib/protocol_buffers/runtime/field.rb:641:in `deserialize'", 
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/ruby-protocol-buffers-1.6.1/lib/protocol_buffers/runtime/decoder.rb:60:in `decode'", 
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/ruby-protocol-buffers-1.6.1/lib/protocol_buffers/runtime/message.rb:305:in `parse'", 
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/ruby-protocol-buffers-1.6.1/lib/protocol_buffers/runtime/message.rb:311:in `parse'", 
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-protobuf-1.0.0/lib/logstash/codecs/protobuf.rb:81:in `decode'", 
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-4.1.0/lib/logstash/inputs/tcp.rb:182:in `handle_socket'", 
"/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-4.1.0/lib/logstash/inputs/tcp.rb:153:in `server_connection_thread'"]}

Here is the full protobuf file contents i'm using.

#!/usr/bin/env ruby
# Generated by the protocol buffer compiler. DO NOT EDIT!

require 'protocol_buffers'

# forward declarations
class PBDNSMessage < ::ProtocolBuffers::Message; end

class PBDNSMessage < ::ProtocolBuffers::Message
  # forward declarations
  class DNSQuestion < ::ProtocolBuffers::Message; end
  class DNSResponse < ::ProtocolBuffers::Message; end

  # enums
  module Type
    include ::ProtocolBuffers::Enum

    set_fully_qualified_name "PBDNSMessage.Type"

    DNSQueryType = 1
    DNSResponseType = 2
    DNSOutgoingQueryType = 3
    DNSIncomingResponseType = 4
  end

  module SocketFamily
    include ::ProtocolBuffers::Enum

    set_fully_qualified_name "PBDNSMessage.SocketFamily"

    INET = 1
    INET6 = 2
  end

  module SocketProtocol
    include ::ProtocolBuffers::Enum

    set_fully_qualified_name "PBDNSMessage.SocketProtocol"

    UDP = 1
    TCP = 2
  end

  set_fully_qualified_name "PBDNSMessage"

  # nested messages
  class DNSQuestion < ::ProtocolBuffers::Message
    set_fully_qualified_name "PBDNSMessage.DNSQuestion"

    optional :string, :qName, 1
    optional :uint32, :qType, 2
    optional :uint32, :qClass, 3
  end

  class DNSResponse < ::ProtocolBuffers::Message
    # forward declarations
    class DNSRR < ::ProtocolBuffers::Message; end

    set_fully_qualified_name "PBDNSMessage.DNSResponse"

    # nested messages
    class DNSRR < ::ProtocolBuffers::Message
      set_fully_qualified_name "PBDNSMessage.DNSResponse.DNSRR"

      optional :string, :name, 1
      optional :uint32, :type, 2
      optional :uint32, :class, 3
      optional :uint32, :ttl, 4
      optional :bytes, :rdata, 5
    end

    optional :uint32, :rcode, 1
    repeated ::PBDNSMessage::DNSResponse::DNSRR, :rrs, 2
    optional :string, :appliedPolicy, 3
    repeated :string, :tags, 4
    optional :uint32, :queryTimeSec, 5
    optional :uint32, :queryTimeUsec, 6
  end

  required ::PBDNSMessage::Type, :type, 1
  optional :bytes, :messageId, 2
  optional :bytes, :serverIdentity, 3
  optional ::PBDNSMessage::SocketFamily, :socketFamily, 4
  optional ::PBDNSMessage::SocketProtocol, :socketProtocol, 5
  optional :bytes, :from, 6
  optional :bytes, :to, 7
  optional :uint64, :inBytes, 8
  optional :uint32, :timeSec, 9
  optional :uint32, :timeUsec, 10
  optional :uint32, :id, 11
  optional ::PBDNSMessage::DNSQuestion, :question, 12
  optional ::PBDNSMessage::DNSResponse, :response, 13
  optional :bytes, :originalRequestorSubnet, 14
  optional :string, :requestorId, 15
  optional :bytes, :initialRequestId, 16
end

Can you help please confirm what the class_name should be and if that could be causing the enclosed error ?

Many thanks.

IngaFeick commented 7 years ago

Hi! The class name looks good. If you can decode the log events containing the dnd questions then you don't have a config issue. As to why the decoding of the responses fails: I cannot tell for sure without looking at the data that's going over the wire. Can you log the outgoing data on the producer side or maybe run a short tcpdump so that we can have a look at the binary data? If you can write the data in a human readable format somewhere on the producer side before it gets encoded that would also be helpful, especially for confirming your theory that something is wrong with the DNSResponse objects (which sounds plausible to me). Another thing you could do for starters is to make sure that you are really using the exact same version of the protobuf definition on the producer side. You probably do, but that would be a typical explanation for "weird" behaviour like this.

thechile commented 7 years ago

thanks

Dumping it on the producer side is not so easy without changing code and recompiling i'm afraid.

Both use version 2 as defined by syntax = "proto2"; in the .proto file if that is what you mean by checking versions ? ( the source protobuf file is here https://github.com/PowerDNS/pdns/blob/master/pdns/dnsmessage.proto )

The source of the protobuf is a dns server and I can repro the problem with pretty much every domain lookup.. only google.com seems to not create a error and prints the expected response as shown below. Although i think the rrs field should contain a lot more data about the responses resolved records and not be empty array.

{:type=>2,
 :messageId=>"\xF3(y&\xED\xFEO\x8C\x9B\xF9\xBE\x7F?R\xB6\xB5",
 :socketFamily=>1,
 :socketProtocol=>1,
 :from=>"\n\xE1\f\x13",
 :to=>"\n\x84\x10\x0F",
 :inBytes=>89,
 :timeSec=>1496303736,
 :timeUsec=>217292,
 :id=>53862,
 :question=>{:qName=>"google.com.", :qType=>6, :qClass=>1},
 :response=>
  {:rcode=>0,
   :rrs=>[],
   :tags=>[],
   :queryTimeSec=>1496303736,
   :queryTimeUsec=>182779}}

I tried finding a dissector for wireshark that would work with my tcpdumps but they didn't work for me and looking at the protobuf binary data doesn't really help me spot differences.

So still not sure if problem is with dns server or codec.

They provide a Python logger to pretty print the emitted protobuf events here https://github.com/PowerDNS/pdns/blob/master/contrib/ProtobufLogger.py. But it don't think that can be used to tshoot this issue ?

thechile commented 7 years ago

my bad... turns out there is a way to show the event before it's serialised.

Here is what a failing event looks like

type: DNSResponseType
messageId: "\230RKy^\031K\374\2070arQf\177&"
socketFamily: INET
socketProtocol: UDP
from: "\n\204\020\017"
to: "\n\204\020\017"
inBytes: 100
timeSec: 1496321913
timeUsec: 203469
id: 7447
question {
  qName: "bbc.com."
  qType: 1
  qClass: 1
}
response {
  rcode: 0
  rrs {
    name: "bbc.com."
    type: 1
    class: 1
    ttl: 300
    rdata: "\324:\366O"
  }
  rrs {
    name: "bbc.com."
    type: 1
    class: 1
    ttl: 300
    rdata: "\324:\364\027"
  }
  rrs {
    name: "bbc.com."
    type: 1
    class: 1
    ttl: 300
    rdata: "\324:\366N"
  }
  rrs {
    name: "bbc.com."
    type: 1
    class: 1
    ttl: 300
    rdata: "\324:\364\026"
  }
  queryTimeSec: 1496321913
  queryTimeUsec: 148037
}

And here is one that works

type: DNSResponseType
messageId: "\306\264\300j\372\237E\337\220\210@\300\364%r\313"
socketFamily: INET
socketProtocol: UDP
from: "\n\341\014\023"
to: "\n\204\020\017"
inBytes: 89
timeSec: 1496321916
timeUsec: 146785
id: 6528
question {
  qName: "google.com."
  qType: 6
  qClass: 1
}
response {
  rcode: 0
  queryTimeSec: 1496321916
  queryTimeUsec: 114695
}

When the rrs field is populated with data it seems to be causing the codec to error.

Any idea if this is related to the .rb or the logstash plugin ?

thanks.

thechile commented 7 years ago

I put together a simple ruby TCP server that accepted the protobuf message and decodes it, and i get the same behavior where query works but response fails with the same error

embedded/lib/ruby/gems/2.3.0/gems/ruby-protocol-buffers-1.6.1/lib/protocol_buffers/runtime/field.rb:141:in `class': undefined method `[]' for nil:NilClass (NoMethodError)

Not sure if this points to the ruby-protocol-buffers gem being the problem ?

.. or maybe the output from ruby-protoc is wrong.

thechile commented 7 years ago

hi.. turns out this is problem with protocol-buffers library. The field named 'class' blows up because it's interpreted as object type Class. Will close this issue since not codec fault. thanks.