logstash-plugins / logstash-codec-protobuf

Codec plugin for parsing Protobuf messages
Apache License 2.0

Nested message does not work with multiple schema files or same file #40

Closed rahulkumar-pawar closed 5 years ago

rahulkumar-pawar commented 5 years ago

We are currently using protobuf and have written schemas like the following.

Common log schema

syntax = "proto3";
package structurelogs;
message CommonLog{
    string eventType = 1;
    string eventTime = 2;
}

Device log schema

syntax = "proto3";
package structurelogs;
message DeviceLog{
    string deviceType = 2;
    string serialNumber = 3;
    string action = 4;
    string sourceIP = 5;
    string destinationIP = 6;
}

Version-specific schema

syntax = "proto3";
package structurelogs;
import "CommonLog.proto";
import "DeviceLog.proto";
message VersionSpecificDeviceLog{
    CommonLog commonLog = 1;
    DeviceLog deviceLog = 2;
    // several other properties
}

In Logstash we are using:

output {
    file {
        codec => protobuf {
            protobuf_version => 3
            class_name => "Structurelogs::VersionSpecificDeviceLog"
            include_path => ['/var/protobuf/CommonLog.pb.rb', '/var/protobuf/DeviceLog.pb.rb', '/var/protobuf/VersionSpecificDeviceLog.pb.rb']
        }
        path => "/var/parsedlogs/parsed.log"
    }
}

After starting the Logstash service, I always get an error like error=>#<NoMethodError: undefined method 'eventTime=', sometimes naming a different field that belongs to either CommonLog or DeviceLog.

I used ruby-protoc to compile the .proto files on Ubuntu.

rahulkumar-pawar commented 5 years ago

I tried putting multiple messages in the same file and now get an error like error=>#<TypeError: can't assign Hash to ProtocolBuffers::Field::MessageField>
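
The TypeError text says that a plain Hash was assigned to a field whose type is another message. A minimal sketch of the general idea, using plain Ruby Structs as stand-ins for generated message classes: nested hashes are converted into message objects before they are assigned. The Struct definitions, the NESTED_TYPES table, the build_message helper, and the sample values are all hypothetical illustrations, not the plugin's or any protobuf library's API.

```ruby
# Stand-ins for generated message classes (hypothetical, not real protobuf classes).
CommonLog = Struct.new(:eventType, :eventTime, keyword_init: true)
DeviceLog = Struct.new(:deviceType, :serialNumber, keyword_init: true)
VersionSpecificDeviceLog = Struct.new(:commonLog, :deviceLog, keyword_init: true)

# Hypothetical lookup table: which fields of which class hold a nested message.
NESTED_TYPES = {
  VersionSpecificDeviceLog => { commonLog: CommonLog, deviceLog: DeviceLog }
}.freeze

# Recursively turn a nested Hash into message objects instead of assigning
# the inner Hash directly to a message-typed field.
def build_message(klass, data)
  nested = NESTED_TYPES.fetch(klass, {})
  attrs = data.each_with_object({}) do |(key, value), out|
    key = key.to_sym
    child = nested[key]
    out[key] = (child && value.is_a?(Hash)) ? build_message(child, value) : value
  end
  klass.new(**attrs)
end

# Sample event with made-up values, shaped like the schemas in this issue.
event = {
  "commonLog" => { "eventType" => "login", "eventTime" => "2019-04-19T07:33:27" },
  "deviceLog" => { "deviceType" => "router", "serialNumber" => "SN-1" }
}

msg = build_message(VersionSpecificDeviceLog, event)
puts msg.commonLog.class
```

With real generated classes the same recursion would need the field-to-type mapping from the message descriptors rather than a hand-written table.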

IngaFeick commented 5 years ago

Hi, can you please give me the exact version of your protobuf compiler, and your command for compilation? Also the repo/github project that you downloaded the compiler from? Thx

rahulkumar-pawar commented 5 years ago

@IngaFeick

I tried with ruby-protoc and now with the official Google compiler (https://github.com/protocolbuffers/protobuf/releases), version 3.7.1. I had missed adding protobuf_version to the Logstash configuration, so I added version 3 to the conf (the code in the question is updated). The command used to compile the .proto is:
protoc --ruby_out=/usr/share/logstash/config VersionSpecificDeviceLog.proto

I pulled every property into the same file, so there is no nesting and no reference to another file or message.

After adding this, I got the error Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"NoMethodError", :message=>"undefined method `add_file' for #<Google::Protobuf::Builder:0x3995e975>"

Following your comment on other issues recommending protoc version 3.5.0, I tried that version, but now I get this error:

Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"NoMethodError", :message=>"undefined method `msgclass' for nil:NilClass",

Following your comment on other issues about using the exact qualified name, I tried class_name => "structurelogs::VersionSpecificDeviceLog" and class_name => "Structurelogs::VersionSpecificDeviceLog", but the error was the same.

I thought it might be a versioning issue, so I tried Logstash 6.7.1 with gems jRuby 2.3.0, and Logstash 7 with 2.5.0 and codec-protobuf-1.1.0, but no luck.

Thanks in advance.

rahulkumar-pawar commented 5 years ago

Digging further into the code, it is breaking at the line @pb_builder = Google::Protobuf::DescriptorPool.generated_pool.lookup(class_name).msgclass

I tried printing generated_pool and metainfo_messageclasses and got the following output:

[2019-04-19T07:33:27,024][WARN ][logstash.codecs.protobuf ] Message Classes {"structurelogs.VersionSpecificDeviceLog"=>{}}

[2019-04-19T07:33:27,119][WARN ][logstash.codecs.protobuf ] generated_pool --- !ruby/object:Google::Protobuf::DescriptorPool {}

It seems the message classes contain my class entry, but generated_pool does not contain anything. Am I missing something, or why is it not initialized?
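
The "undefined method msgclass for nil:NilClass" error is consistent with the lookup returning nil when the requested name is not registered in the pool. A minimal sketch of a guard around that call follows. FakePool and FakeDescriptor are hypothetical stand-ins so the example runs without the google-protobuf gem, and resolve_message_class is an illustrative helper, not part of the plugin.

```ruby
# Hypothetical stand-ins mimicking the shape of pool.lookup(name).msgclass.
FakeDescriptor = Struct.new(:msgclass)

class FakePool
  def initialize(entries)
    @entries = entries
  end

  # Returns nil for unknown names; calling msgclass on that nil is what
  # produces the NoMethodError.
  def lookup(name)
    @entries[name]
  end
end

# Illustrative helper: fail with a readable message instead of calling
# msgclass on nil.
def resolve_message_class(pool, class_name)
  descriptor = pool.lookup(class_name)
  if descriptor.nil?
    raise ArgumentError,
          "#{class_name.inspect} is not registered in the descriptor pool; " \
          "check the package name and that the generated file was loaded"
  end
  descriptor.msgclass
end

registered_class = Class.new
pool = FakePool.new("VersionSpecificDeviceLog" => FakeDescriptor.new(registered_class))

found = resolve_message_class(pool, "VersionSpecificDeviceLog")

begin
  resolve_message_class(pool, "Structurelogs::VersionSpecificDeviceLog")
rescue ArgumentError => e
  puts e.message
end
```

A guard like this only makes the failure easier to read; it does not explain why the pool is empty.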

rahulkumar-pawar commented 5 years ago

Well, I was able to resolve the error by removing the package name from the .proto and providing only the message name in the class_name property of the Logstash conf. I compiled the .proto with protoc version 3.5.0. The configuration that works for me is:

kafka {
    bootstrap_servers => "kafka_IP:Port"
    topic_id => "OUTPUT_TOPIC_NAME"
    key_serializer_class => "org.apache.kafka.common.serialization.ByteArraySerializer"
    value_serializer_class => "org.apache.kafka.common.serialization.ByteArraySerializer"
    codec => protobuf {
        class_name => "VersionSpecificDeviceLog"
        include_path => ['/path/to/VersionSpecificDeviceLog_pb.rb']
        protobuf_version => 3
    }
}

This works with nested messages.

Another point: although I was able to resolve the issue, performance is 1/4 of the JSON codec. Looking into it now.

Keeping it open for a closing comment from @IngaFeick.

rahulkumar-pawar commented 5 years ago

I could see that the regex used to convert '@field_name' to 'field_name' takes time to execute. I ran the same load on a single core, and it processes 1K extra messages if we remove the regex.

Code:

datahash = datahash.inject({}){|x,(k,v)| x[k.gsub(/@/,'').to_sym] = (should_convert_to_string?(v) ? v.to_s : v); x}
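
To make the cost comparison concrete, here is a minimal sketch that times the gsub-based key cleanup against a regex-free variant using String#delete, which also removes every '@' character. The should_convert_to_string? stub, the sample hash, and the iteration count are assumptions for illustration; the plugin's real helper may behave differently, and timings will vary by machine and Ruby implementation.

```ruby
require "benchmark"

# Hypothetical stand-in for the plugin's helper of the same name.
def should_convert_to_string?(value)
  value.is_a?(Time) || value.is_a?(Symbol)
end

# Same transformation as the line quoted above, wrapped in a method.
def with_regex(datahash)
  datahash.inject({}) { |x, (k, v)| x[k.gsub(/@/, '').to_sym] = (should_convert_to_string?(v) ? v.to_s : v); x }
end

# Regex-free variant: String#delete strips all '@' characters.
def without_regex(datahash)
  datahash.each_with_object({}) do |(k, v), x|
    x[k.delete('@').to_sym] = should_convert_to_string?(v) ? v.to_s : v
  end
end

# Made-up sample event.
sample = { "@timestamp" => Time.at(0), "@version" => "1", "eventType" => "login" }
iterations = 100_000

Benchmark.bm(14) do |bm|
  bm.report("gsub(/@/, '')") { iterations.times { with_regex(sample) } }
  bm.report("delete('@')")   { iterations.times { without_regex(sample) } }
end
```

Both methods should return the same hash for string keys, so the variant can be swapped in without changing the output if the measurement favors it.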

Closing the defect, as I got proto 3 working with nested messages and identified the performance factor.