logstash-plugins / logstash-input-rss

RSS input for Logstash
Apache License 2.0

TooMuchTag Errors on children of the parent tag's children #37

Open kyle-rf opened 2 weeks ago

kyle-rf commented 2 weeks ago

Logstash information:

Please include the following information:

  1. Logstash version (e.g. bin/logstash --version): 8.13.4 (also tested with docker tag 8.15.0)
  2. Logstash installation source (e.g. built from source, with a package manager: DEB/RPM, expanded from tar or zip archive, docker): docker image
  3. How is Logstash being run (e.g. as a service/service manager: systemd, upstart, etc. Via command line, docker/kubernetes): docker entrypoint runs logstash from the command line
  4. How was the Logstash Plugin installed: Dockerfile RUN directive: RUN logstash-plugin install logstash-input-rss

JVM (e.g. java -version):

cd /jdk/bin/
./java --version
openjdk 17.0.11 2024-04-16
OpenJDK Runtime Environment Temurin-17.0.11+9 (build 17.0.11+9)
OpenJDK 64-Bit Server VM Temurin-17.0.11+9 (build 17.0.11+9, mixed mode, sharing)

On 8.15.0:

openjdk 21.0.4 2024-07-16 LTS
OpenJDK Runtime Environment Temurin-21.0.4+7 (build 21.0.4+7-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.4+7 (build 21.0.4+7-LTS, mixed mode, sharing)

In both cases the plugin version is logstash-input-rss (3.0.6).

Description of the problem including expected versus actual behavior:

The plugin is generating TooMuchTag errors on a specific RSS feed.

Uknown error while parsing the feed {:url=>"https://cloudblog.withgoogle.com/topics/threat-intelligence/rss/", :exception=>#<RSS::TooMuchTagError: tag <title> is too much in tag <item>>}

I found that the error message comes from the Ruby rss gem: https://github.com/ruby/rss/.

Inspecting the RSS feed directly shows that the <title> tag appears only once as a direct child of each <item> tag. However, each item also has an <og> child tag that contains its own <title> child. If an item is structured like this:

<item>
  <title>A title</title>
  <og>
    <title>A title</title>
  </og>
</item>

then the expected behavior is that no error is raised: the second <title> tag is not a direct child of <item>; it is a child of <og>.
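
To make the direct-child versus descendant distinction concrete, here is a minimal sketch using REXML on the sample item above. It does not show how the rss gem counts tags internally; the helper names `direct_child_count` and `descendant_count` are hypothetical and exist only for this illustration.

```ruby
require 'rexml/document'

# Sample item mirroring the structure described in this report.
ITEM_XML = <<~XML
  <item>
    <title>A title</title>
    <og>
      <title>A title</title>
    </og>
  </item>
XML

# Hypothetical helper: count elements named `name` that are direct
# children of `element`.
def direct_child_count(element, name)
  element.get_elements(name).size
end

# Hypothetical helper: count elements named `name` at any depth
# below `element`.
def descendant_count(element, name)
  REXML::XPath.match(element, ".//#{name}").size
end

item = REXML::Document.new(ITEM_XML).root
puts "direct <title> children of <item>: #{direct_child_count(item, 'title')}"
puts "<title> descendants of <item>: #{descendant_count(item, 'title')}"
```

A validator that only looks at direct children sees a single <title> in this item, which is the count I would expect the "too much" check to use.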

Steps to reproduce:

Run logstash with an rss input:

rss {
  id => "googleintelligence"
  url => "https://cloudblog.withgoogle.com/topics/threat-intelligence/rss/"
  interval => 300
}

Please include a minimal but complete recreation of the problem, including (e.g.) pipeline definition(s), settings, locale, etc. The easier you make it for us to reproduce, the more likely that somebody will take the time to look at it.

  1. logstash conf:
    input {
      rss {
        id => "googleintelligence"
        url => "https://cloudblog.withgoogle.com/topics/threat-intelligence/rss/"
        interval => 300
      }
    }
    output {
      stdout {}
    }
  2. run logstash with the config from step 1

Provide logs (if relevant):

Uknown error while parsing the feed {:url=>"https://cloudblog.withgoogle.com/topics/threat-intelligence/rss/", :exception=>#<RSS::TooMuchTagError: tag <title> is too much in tag <item>>}

caineblood commented 2 weeks ago

The above comment asking you to download a file is malware to steal your account; do not under any circumstances download or run it. The post needs to be removed. If you have attempted to run it please have your system cleaned and your account secured immediately.

kyle-rf commented 1 week ago

To be clear, the comment @caineblood is referring to has been removed.