logstash-plugins / logstash-input-google_cloud_storage

Apache License 2.0

Fix/4 #7

Closed josephlewis42 closed 2 years ago

josephlewis42 commented 4 years ago

Fixes #4 by adding a new field, file_prefix, that does server-side filtering.

This can be tested with the pre-release: https://storage.googleapis.com/logstash-prereleases/logstash-input-google_cloud_storage-0.12.0-java.gem
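
To illustrate what server-side filtering changes, here is a minimal sketch in plain Ruby. It does not use the GCS client or the plugin's code: `BLOB_NAMES`, `list_blobs`, and `processable` are hypothetical stand-ins, and the regex matching is a simplified approximation of a `file_matches`-style check. The point is only that a prefix shrinks the listing before any client-side pattern is applied.

```ruby
# Hypothetical in-memory stand-in for a bucket listing; not the GCS client API.
BLOB_NAMES = [
  'transcriptions/2019/a.json',
  'transcriptions/2019/b.txt',
  'recordings/2019/a.wav',
  'recordings/2019/b.wav'
].freeze

# Simulates a listing call that takes a prefix, the way a storage service
# would apply it before returning results. An empty prefix lists everything.
def list_blobs(names, prefix: '')
  names.select { |name| name.start_with?(prefix) }
end

# Client-side filtering: applies a regex to whatever the listing returned.
def processable(names, file_matches:)
  pattern = Regexp.new(file_matches)
  names.select { |name| pattern.match?(name) }
end

without_prefix = list_blobs(BLOB_NAMES)
with_prefix = list_blobs(BLOB_NAMES, prefix: 'transcriptions/')

puts "listed without prefix: #{without_prefix.size}"
puts "listed with prefix:    #{with_prefix.size}"
puts processable(with_prefix, file_matches: 'transcriptions/.*json').inspect
```

In this sketch both paths end with the same processable names, but the prefixed listing returns fewer entries to filter, which is where the traffic saving on a large bucket would come from.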

tmegow commented 4 years ago

I rebuilt using the custom version of the GCS plugin (via the gem file). I did not change the pipeline config values, and I'm seeing these errors during execution:

Error: no method 'list' for arguments (org.jruby.RubyString,org.jruby.java.proxies.ArrayJavaProxy) on Java::ComGoogleCloudStorage::StorageImpl
  available overloads:
    (com.google.cloud.storage.Storage.BucketListOption[])
    (java.lang.String,com.google.cloud.storage.Storage.BlobListOption[])
  Exception: NameError
  Stack: /usr/share/logstash/vendor/local_gems/7a400dae/gcs/lib/logstash/inputs/cloud_storage/client.rb:33:in `list_blobs'
/usr/share/logstash/vendor/local_gems/7a400dae/gcs/lib/logstash/inputs/google_cloud_storage.rb:86:in `list_processable_blobs'
/usr/share/logstash/vendor/local_gems/7a400dae/gcs/lib/logstash/inputs/google_cloud_storage.rb:69:in `list_download_process'
/usr/share/logstash/vendor/local_gems/7a400dae/gcs/lib/logstash/inputs/google_cloud_storage.rb:62:in `block in run'
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/stud-0.0.23/lib/stud/interval.rb:20:in `interval'
/usr/share/logstash/vendor/local_gems/7a400dae/gcs/lib/logstash/inputs/google_cloud_storage.rb:61:in `run'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:309:in `inputworker'
/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:302:in `block in start_input'
[DEBUG] 2019-10-10 18:23:08.304 [[transcripts]<google_cloud_storage] googlecloudstorage - Closing {:plugin=>"LogStash::Inputs::GoogleCloudStorage"}
[DEBUG] 2019-10-10 18:23:08.305 [[transcripts]<google_cloud_storage] pluginmetadata - Removing metadata for plugin 8cedb02bc64a2e82c736b877d216f90bf7e7c12c7c9c676f06b7ac97919018b6
[INFO ] 2019-10-10 18:23:08.305 [[transcripts]<google_cloud_storage] googlecloudstorage - Fetching blobs from shsp-sales-dialer-dev
[ERROR] 2019-10-10 18:23:08.307 [[transcripts]<google_cloud_storage] javapipeline - A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:transcripts
  Plugin: <LogStash::Inputs::GoogleCloudStorage bucket_id=>"shsp-sales-dialer-dev", json_key_file=>"/sales-dialer/creds/gcp_service_account.json", codec=><LogStash::Codecs::JSON id=>"json_da07edc0-42c8-45d9-9107-8b647abf4f5e", enable_metric=>true, charset=>"UTF-8">, metadata_key=>"x-goog-meta-logstash-transcripts", interval=>60, id=>"8cedb02bc64a2e82c736b877d216f90bf7e7c12c7c9c676f06b7ac97919018b6", file_matches=>"transcriptions/.*json", enable_metric=>true, file_exclude=>"^$", delete=>false, unpack_gzip=>true, temp_directory=>"/tmp/ls-in-gcs">

josephlewis42 commented 4 years ago

@tmegow, just to make sure, did you pick up the second change I made as part of the commit? https://github.com/logstash-plugins/logstash-input-google_cloud_storage/pull/7/files#diff-c74e95df46ddcc4954cdb6235bbd0793R24-R32

tmegow commented 4 years ago
[ERROR] 2019-10-10 20:42:04.393 [Converge PipelineAction::Create<blocks>] registry - Problems loading a plugin with {:type=>"input", :name=>"google_cloud_storage", :path=>"logstash/inputs/google_cloud_storage", :error_message=>"\n\n\tyou might need to reinstall the gem which depends on the missing jar or in case there is Jars.lock then resolve the jars with `lock_jars` command\n\nno such file to load -- com/google/cloud/google-cloud-storage/1.62.0/google-cloud-storage-1.62.0 (LoadError)", :error_class=>RuntimeError, :error_backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/jar_dependencies.rb:356:in `do_require'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/jar_dependencies.rb:265:in `block in require_jar'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/jar_dependencies.rb:307:in `require_jar_with_block'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/jar_dependencies.rb:264:in `require_jar'", "/usr/share/logstash/lib/bootstrap/patches/jar_dependencies.rb:6:in `require_jar'", "/usr/share/logstash/vendor/local_gems/2a2dab14/gcs/lib/logstash-input-google_cloud_storage_jars.rb:4:in `<main>'", "org/jruby/RubyKernel.java:987:in `require'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/polyglot-0.3.5/lib/polyglot.rb:65:in `require'", "/usr/share/logstash/vendor/local_gems/2a2dab14/gcs/lib/logstash/inputs/cloud_storage/client.rb:1:in `<main>'", "org/jruby/RubyKernel.java:987:in `require'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/polyglot-0.3.5/lib/polyglot.rb:65:in `require'", "/usr/share/logstash/vendor/local_gems/2a2dab14/gcs/lib/logstash/inputs/cloud_storage/client.rb:5:in `<main>'", "org/jruby/RubyKernel.java:987:in `require'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/polyglot-0.3.5/lib/polyglot.rb:65:in `require'", "/usr/share/logstash/logstash-core/lib/logstash/plugins/registry.rb:191:in `legacy_lookup'", "/usr/share/logstash/logstash-core/lib/logstash/plugins/registry.rb:166:in `block in 
lookup'", "org/jruby/ext/thread/Mutex.java:165:in `synchronize'", "/usr/share/logstash/logstash-core/lib/logstash/plugins/registry.rb:162:in `lookup'", "/usr/share/logstash/logstash-core/lib/logstash/plugins/registry.rb:216:in `lookup_pipeline_plugin'", "/usr/share/logstash/logstash-core/lib/logstash/plugin.rb:143:in `lookup'", "org/logstash/plugins/PluginFactoryExt.java:203:in `plugin'", "org/logstash/plugins/PluginFactoryExt.java:120:in `buildInput'", "org/logstash/execution/JavaBasePipelineExt.java:50:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:24:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:36:in `execute'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:325:in `block in converge_state'"]}
[DEBUG] 2019-10-10 20:42:04.397 [Converge PipelineAction::Create<blocks>] registry - Problems loading the plugin with {:type=>"input", :name=>"google_cloud_storage"}
[ERROR] 2019-10-10 20:42:04.489 [Converge PipelineAction::Create<blocks>] agent - Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:blocks, :exception=>"LogStash::PluginLoadingError", :message=>"Couldn't find any input plugin named 'google_cloud_storage'. Are you sure this is correct? Trying to load the google_cloud_storage input plugin resulted in this error: Problems loading the requested plugin named google_cloud_storage of type input. Error: RuntimeError \n\n\tyou might need to reinstall the gem which depends on the missing jar or in case there is Jars.lock then resolve the jars with `lock_jars` command\n\nno such file to load -- com/google/cloud/google-cloud-storage/1.62.0/google-cloud-storage-1.62.0 (LoadError)", :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/plugins/registry.rb:221:in `lookup_pipeline_plugin'", "/usr/share/logstash/logstash-core/lib/logstash/plugin.rb:143:in `lookup'", "org/logstash/plugins/PluginFactoryExt.java:203:in `plugin'", "org/logstash/plugins/PluginFactoryExt.java:120:in `buildInput'", "org/logstash/execution/JavaBasePipelineExt.java:50:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:24:in `initialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:36:in `execute'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:325:in `block in converge_state'"]}

Output when building the gem file:

$ gem build logstash-input-google_cloud_storage.gemspec
WARNING:  open-ended dependency on logstash-codec-plain (>= 0) is not recommended
  if logstash-codec-plain is semantically versioned, use:
    add_runtime_dependency 'logstash-codec-plain', '~> 0'
WARNING:  open-ended dependency on stud (>= 0.0.22) is not recommended
  if stud is semantically versioned, use:
    add_runtime_dependency 'stud', '~> 0.0', '>= 0.0.22'
WARNING:  open-ended dependency on mimemagic (>= 0.3.3) is not recommended
  if mimemagic is semantically versioned, use:
    add_runtime_dependency 'mimemagic', '~> 0.3', '>= 0.3.3'
WARNING:  open-ended dependency on logstash-devutils (>= 0.0.16, development) is not recommended
  if logstash-devutils is semantically versioned, use:
    add_development_dependency 'logstash-devutils', '~> 0.0', '>= 0.0.16'
WARNING:  See http://guides.rubygems.org/specification-reference/ for help
  Successfully built RubyGem
  Name: logstash-input-google_cloud_storage
  Version: 0.12.0
  File: logstash-input-google_cloud_storage-0.12.0-java.gem

Output when installing plugin via the gem file:

Installing logstash-input-google_cloud_storage
Installation successful

@josephlewis42 I rebuilt the plugin gem, making sure to include both commits in this PR. Now I'm getting the error above. Do I have a disconnect between google_cloud_storage and logstash-input-google_cloud_storage?
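
The LoadError in the log above appears to be about a jar that could not be found on the load path, so one way to narrow it down is to check whether that jar was packaged alongside the plugin. The following is a rough diagnostic sketch, not part of the plugin: `jar_relative_path` and `find_jar` are hypothetical helpers, and the directories in `roots` are assumptions about where jars might be vendored in a checkout or an installed gem.

```ruby
# Builds the relative path that a Maven-style coordinate maps to, matching
# the shape of the path in the LoadError message (plus the .jar extension).
def jar_relative_path(group, artifact, version)
  File.join(group.tr('.', '/'), artifact, version, "#{artifact}-#{version}.jar")
end

# Returns the first full path under the given roots where the jar exists,
# or nil if none of the roots contains it.
def find_jar(roots, group, artifact, version)
  relative = jar_relative_path(group, artifact, version)
  roots.map { |root| File.join(root, relative) }.find { |path| File.file?(path) }
end

# Assumed locations; adjust to wherever your checkout or install keeps jars.
roots = ['vendor/jar-dependencies', 'lib']
found = find_jar(roots, 'com.google.cloud', 'google-cloud-storage', '1.62.0')
puts(found ? "jar found at #{found}" : 'jar not found under any assumed root')
```

If the jar is missing from the directory the gem was built from, that would suggest the jar dependencies were not fetched before `gem build` ran, rather than a mismatch between the two plugin names.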

tmegow commented 4 years ago

@josephlewis42 Were you able to see this working in your testing? I am excited about the possible reduction in our ingress traffic. Do the build commands I used in my test attempt need amending?

generatives commented 3 years ago

I have also run into trouble trying to use this branch, with the same error as @tmegow. Is there a chance this PR will be fixed and merged?

generatives commented 3 years ago

@jsvd @kares Tagging you because you seem to be working more actively on Logstash plugins. Is there any possibility of this branch being merged? The change is very valuable for large GCS buckets; in my case the plugin is basically unusable without server-side filtering.

aksakalmustafa commented 2 years ago

@josephlewis42 This is a great PR. I've implemented similar functionality locally and am using it. Is there any plan to merge this branch? Otherwise, I'll create a similar PR. Thanks!

josephlewis42 commented 2 years ago

@aksakalmustafa go ahead! I'm no longer actively working on this repository and I bet the PR is stale. I'll close this one so yours can take center stage.