theotherp / nzbhydra2

Usenet meta search
Other
1.26k stars 75 forks source link

/torznab endpoint produces invalid xml #458

Closed bdurrow closed 4 years ago

bdurrow commented 4 years ago

When I hit the following URL:

/torznab/api?t=tvsearch&cat=5000,5030,5040&extended=1&apikey=<<REDACTED>>&offset=0&limit=1

I get the following namespaces:

<rss xmlns:atom="http://www.w3.org/2005/Atom"  xmlns:newznab="http://www.newznab.com/DTD/2010/feeds/attributes/" version="2.0">

But the attributes nodes later in the rss look like this:

<torznab:attr name="category" value="100205"/>

I believe this is because the namespace is incorrect, when I hit jackett directly I get the following name spaces:

<rss version="1.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:torznab="http://torznab.com/schemas/2015/feed">

To be totally clear I get:

xmlns:newznab="http://www.newznab.com/DTD/2010/feeds/attributes/"

When I expect:

xmlns:torznab="http://torznab.com/schemas/2015/feed"

I found this bug when I was trying to add nzbhydra2 has a torznab indexer to sonarr. Sonarr logs the following:

Unable to connect to indexer: 'torznab' is an undeclared prefix. Line 18, position 14.

Exception
System.Xml.XmlException: 'torznab' is an undeclared prefix. Line 18, position 14.
  at System.Xml.XmlTextReaderImpl.Throw (System.Exception e) [0x00027] in <0800a875b7f24612a045bda8d032424a>:0 
  at System.Xml.XmlTextReaderImpl.Throw (System.String res, System.String arg, System.Int32 lineNo, System.Int32 linePos) [0x00016] in <0800a875b7f24612a045bda8d032424a>:0 
  at System.Xml.XmlTextReaderImpl.LookupNamespace (System.Xml.XmlTextReaderImpl+NodeData node) [0x0002f] in <0800a875b7f24612a045bda8d032424a>:0 
  at System.Xml.XmlTextReaderImpl.ElementNamespaceLookup () [0x00029] in <0800a875b7f24612a045bda8d032424a>:0 
  at System.Xml.XmlTextReaderImpl.ParseAttributes () [0x0047a] in <0800a875b7f24612a045bda8d032424a>:0 
  at System.Xml.XmlTextReaderImpl.ParseElement () [0x001df] in <0800a875b7f24612a045bda8d032424a>:0 
  at System.Xml.XmlTextReaderImpl.ParseElementContent () [0x0016f] in <0800a875b7f24612a045bda8d032424a>:0 
  at System.Xml.XmlTextReaderImpl.Read () [0x00085] in <0800a875b7f24612a045bda8d032424a>:0 
  at System.Xml.Linq.XContainer.ReadContentFrom (System.Xml.XmlReader r) [0x00201] in <ae71ebbe4a4f4f91a1eb3f2e6bba100e>:0 
  at System.Xml.Linq.XContainer.ReadContentFrom (System.Xml.XmlReader r, System.Xml.Linq.LoadOptions o) [0x00005] in <ae71ebbe4a4f4f91a1eb3f2e6bba100e>:0 
  at System.Xml.Linq.XDocument.Load (System.Xml.XmlReader reader, System.Xml.Linq.LoadOptions options) [0x00080] in <ae71ebbe4a4f4f91a1eb3f2e6bba100e>:0 
  at System.Xml.Linq.XDocument.Load (System.Xml.XmlReader reader) [0x00000] in <ae71ebbe4a4f4f91a1eb3f2e6bba100e>:0 
  at NzbDrone.Core.Indexers.RssParser.LoadXmlDocument (NzbDrone.Core.Indexers.IndexerResponse indexerResponse) [0x00090] in <808631c82f974aa9ade775cc9d46232a>:0 
  at NzbDrone.Core.Indexers.Torznab.TorznabRssParser.PreProcess (NzbDrone.Core.Indexers.IndexerResponse indexerResponse) [0x00000] in <808631c82f974aa9ade775cc9d46232a>:0 
  at NzbDrone.Core.Indexers.RssParser.ParseResponse (NzbDrone.Core.Indexers.IndexerResponse indexerResponse) [0x0000d] in <808631c82f974aa9ade775cc9d46232a>:0 
  at NzbDrone.Core.Indexers.HttpIndexerBase`1[TSettings].FetchPage (NzbDrone.Core.Indexers.IndexerRequest request, NzbDrone.Core.Indexers.IParseIndexerResponse parser) [0x00050] in <808631c82f974aa9ade775cc9d46232a>:0 
  at NzbDrone.Core.Indexers.HttpIndexerBase`1[TSettings].TestConnection () [0x00024] in <808631c82f974aa9ade775cc9d46232a>:0 

I run nzbhydra2 on FreeNAS in a FreeBSD jail. I installed from the FreeBSD pkg repository which installed nzbhydra2-2.6.2 then updated to 2.8.1 using the feature to do so in nzbhydra2. I would rather not post my debuginfos zip archive because I don't feel that it does enough to anonymize the results. Here is the relevant section of the nzbhydra.log:

2019-11-12 15:43:28.676  INFO --- [0.0-5076-exec-4] org.nzbhydra.config.ConfigWeb            : [Search: 348052, Host: <IP>] Received new config
2019-11-12 15:43:28.716  INFO --- [0.0-5076-exec-4] o.n.a.HydraAnonymousAuthenticationFilter : [Search: 348052, Host: <IP>] Granting basic user rights to anonymous users
2019-11-12 15:43:28.716  INFO --- [0.0-5076-exec-4] o.n.a.HydraAnonymousAuthenticationFilter : [Search: 348052, Host: <IP>] Granting stats rights to anonymous users
2019-11-12 15:43:28.716  INFO --- [0.0-5076-exec-4] o.n.a.HydraAnonymousAuthenticationFilter : [Search: 348052, Host: <IP>] Granting admin rights to anonymous users
2019-11-12 15:43:28.716  INFO --- [0.0-5076-exec-4] o.n.d.downloaders.DownloaderProvider     : [Search: 348052, Host: <IP>] Loading downloaders
<<REDACTED>>
2019-11-12 15:43:36.824  INFO --- [0.0-5076-exec-2] org.nzbhydra.api.ExternalApi             : [Search: 660216, Host: <IP>] Received external torznab API call: NewznabParameters{t=TVSEARCH, cat=[5000, 5030, 5040], offset=0, limit=1, raw=false, o=XML, attrs=[], extended=true}
2019-11-12 15:43:36.825  WARN --- [0.0-5076-exec-2] org.nzbhydra.searching.CategoryProvider  : [Search: 660216, Host: <IP>] Search supplied a general category and a subcategory: 5040,5030,5000. Will use the subcategory TV
2019-11-12 15:43:36.825  INFO --- [0.0-5076-exec-2] org.nzbhydra.api.ExternalApi             : [Search: 590861, Host: <IP>] Executing new search
<<REDACTED>>
2019-11-12 15:43:41.310  INFO --- [0.0-5076-exec-2] org.nzbhydra.searching.Searcher          : [Search: 590861, Host: <IP>] Returning results 1-1 from 195 results in cache. A total of 228 results is available from indexers of which 0 were already rejected and 33 were removed as duplicates
2019-11-12 15:43:41.311  INFO --- [0.0-5076-exec-2] org.nzbhydra.api.ExternalApi             : [Search: 590861, Host: <IP>] Search took 4485ms. Returning 1 results
2019-11-12 15:43:41.311  INFO --- [0.0-5076-exec-2] org.nzbhydra.api.ExternalApi             : [Search: 590861, Host: <IP>] Putting search result into cache
2019-11-12 15:44:42.496  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] Creating debug infos
2019-11-12 15:44:42.497  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] NZBHydra2 version: 2.8.1
2019-11-12 15:44:42.497  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] Java command line: <BASEPATH>/lib/core-2.8.1-exec.jar --nobrowser --datafolder /config restarted
2019-11-12 15:44:42.497  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] Java runtime name: OpenJDK Runtime Environment
2019-11-12 15:44:42.497  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] Java runtime version: 1.8.0_222-b10
2019-11-12 15:44:42.498  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] OS name: FreeBSD
2019-11-12 15:44:42.498  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] OS architecture: amd64
2019-11-12 15:44:42.498  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] User country: US
2019-11-12 15:44:42.498  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] File encoding: US-ASCII
2019-11-12 15:44:42.499  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] Number of rows in table SEARCH: 46
2019-11-12 15:44:42.500  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] Number of rows in table SEARCHRESULT: 1749
2019-11-12 15:44:42.501  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] Number of rows in table INDEXERSEARCH: 131
2019-11-12 15:44:42.502  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] Number of rows in table INDEXERAPIACCESS: 131
2019-11-12 15:44:42.502  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] Number of rows in table INDEXERAPIACCESS_SHORT: 187
2019-11-12 15:44:42.503  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] Number of rows in table INDEXERNZBDOWNLOAD: 0
2019-11-12 15:44:42.504  INFO --- [0.0-5076-exec-1] o.n.debuginfos.DebugInfosProvider        : [Search: 395800, Host: <IP>] Size of database folder: 3MB
theotherp commented 4 years ago

You can send me the debug infos zip to theotherp@gmx.de

What would you expect to be removed / anonymized?

bdurrow commented 4 years ago

Ideally all IP addresses, hostnames and information indicating what I am automating. I realize that this will redact much of the useful information. I'll send you the debug infos zip - I would rather not make it available as part of the indelible permanent record.

bdurrow commented 4 years ago

I'm pretty sure this test is failing: If you look carefully at the <rss... node you will see that there is an extra space where the torznab namespace definition was indicating that it was mistakenly removed. I think that all of the various tests you do for searchType would be more robust as enums instead of string equality matches. As another example, text searches like this potentially introduce vulnerabilities (attacker can introduce the string "torznab" anywhere in the url to trigger those some parts of the logic while others execute as if it is a different kind of query - exposing more edge cases).

Thank you for this incredible product. I hope you don't feel like I am bashing on it and your hard work.

theotherp commented 4 years ago

That whole piece of code is incredibly hacky, as you can see. It took me ages to get it somehow working. I'm pretty sure there are better ways than replacing strings in the constructed XML :-D

No worries, I rarely get the debugging analysis along with the bug report.

theotherp commented 4 years ago

Works fine for me. The XML should have a searchType tag at the top, what does it say?

From 127.0.0.1:5076/torznab/api?t=tvsearch&cat=5000,5030,5040&extended=1&apikey=apikey&offset=0&limit=1 I get

<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:torznab="http://torznab.com/schemas/2015/feed" version="2.0">
    <searchType>torznab</searchType>
    <channel>
        <title>NZBHydra 2</title>
        <link>https://www.github.com/theotherp/nzbhydra2</link>
        <webMaster>theotherp@gmx.de</webMaster>
        <generator>NZBHydra2</generator>
    </channel>
</rss>
bdurrow commented 4 years ago

Mine does not:

➜  ~ curl -v 'http://<<REDACTED, not localhost>>:5076/torznab/api?t=tvsearch&cat=5030,5040&extended=1&apikey=<<REDACTED>>&offset=0&limit=1'
*   Trying 192.168.55.29...
* TCP_NODELAY set
* Connected to <<REDACTED, not localhost>> (<<REDACTED, not localhost>>) port 5076 (#0)
> GET /torznab/api?t=tvsearch&cat=5030,5040&extended=1&apikey=<<REDACTED>>&offset=0&limit=1 HTTP/1.1
> Host: <<REDACTED, not localhost>>:5076
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200
< Set-Cookie: HYDRA-XSRF-TOKEN=<<REDACTED>>; Path=/
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Pragma: no-cache
< Expires: 0
< Content-Type: application/xml;charset=UTF-8
< Transfer-Encoding: chunked
< Date: Wed, 13 Nov 2019 22:20:29 GMT
<
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom"  xmlns:newznab="http://www.newznab.com/DTD/2010/feeds/attributes/" version="2.0">
    <channel>
        <title>NZBHydra 2</title>
        <link>https://www.github.com/theotherp/nzbhydra2</link>
        <webMaster>theotherp@gmx.de</webMaster>
        <generator>NZBHydra2</generator>
        <item>
            <title>Masterchef The Professionals S12E05 480p x264-mSD</title>
            <link>http://<<REDACTED, not localhost>>:5076/gettorrent/api/-2569130523405219662?apikey=<<REDACTED>></link>
            <enclosure url="http://<<REDACTED, not localhost>>:5076/gettorrent/api/-2569130523405219662?apikey=<<REDACTED>>" length="260350928" type="application/x-bittorrent"/>
            <pubDate>Wed, 13 Nov 2019 22:18:45 +0000</pubDate>
            <guid isPermaLink="false">-2569130523405219662</guid>
            <description></description>
            <comments>https://eztv.io/ep/1388698/masterchef-the-professionals-s12e05-480p-x264-msd/</comments>
            <category>TV</category>
            <size>260350928</size>
            <torznab:attr name="category" value="100001"/>
            <torznab:attr name="downloadvolumefactor" value="0"/>
            <torznab:attr name="guid" value="-2569130523405219662"/>
            <torznab:attr name="magneturl" value="magnet:?xt=urn:btih:6d2b36b556143f2f29ee3b401b5109d09ba4c30a&amp;dn=Masterchef.The.Professionals.S12E05.480p.x264-mSD[eztv].mkv[eztv]&amp;tr=udp:%2F%2Ftracker.coppersurfer.tk:80&amp;tr=udp:%2F%2Fglotorrents.pw:6969%2Fannounce&amp;tr=udp:%2F%2Ftracker.opentrackr.org:1337%2Fannounce&amp;tr=udp:%2F%2Fexodus.desync.com:6969"/>
            <torznab:attr name="minimumratio" value="1"/>
            <torznab:attr name="minimumseedtime" value="172800"/>
            <torznab:attr name="peers" value="0"/>
            <torznab:attr name="seeders" value="0"/>
            <torznab:attr name="size" value="260350928"/>
            <torznab:attr name="uploadvolumefactor" value="1"/>
            <torznab:attr name="hydraIndexerScore" value="70"/>
            <torznab:attr name="hydraIndexerHost" value="<<REDACTED, not localhost>>"/>
            <torznab:attr name="hydraIndexerName" value="EZTV"/>
        </item>
    </channel>
</rss>
* Connection #0 to host <<REDACTED, not localhost>> left intact
* Closing connection 0
theotherp commented 4 years ago

Hm, strange. Not saying it's your fault, but considering I haven't changed any of that code in ages and nobody else is reporting this issue, there must be something different with your setup. Do you have a reverse proxy inbetween?

I'll add debug logging to the related code and publish a new version later today. Perhaps that will provide some insight.

bdurrow commented 4 years ago

No proxy between them. Thanks for releasing a version to help debug this. I am working on getting the code to compile on my machine but I haven’t been able to yet. Downgrading to java 8 to see if that helps.

theotherp commented 4 years ago

2.8.3 has two lines of logging that will log what output type is used and why.

bdurrow commented 4 years ago

hmmm, Search type: null

2019-11-14 12:39:15.867 DEBUG --- [0.0-5076-exec-2] org.nzbhydra.api.ExternalApi             : [Host: <<REDACTED>>] Determined to be torznab call from URL "/torznab/api"
2019-11-14 12:39:15.867  INFO --- [0.0-5076-exec-2] org.nzbhydra.api.ExternalApi             : [Host: <<REDACTED>>] Received external torznab API call: NewznabParameters{t=TVSEARCH, cat=[5030, 5040], offset=0, limit=1, raw=false, o=XML, attrs=[], extended=true}
2019-11-14 12:39:15.868 DEBUG --- [0.0-5076-exec-2] org.nzbhydra.searching.CategoryProvider  : [Host:<<REDACTED>>] The supplied categories 5040,5030 match multiple configured categories
2019-11-14 12:39:15.868 DEBUG --- [0.0-5076-exec-2] org.nzbhydra.searching.CategoryProvider  : [Host: <<REDACTED>>] The supplied categories 5040,5030 match the configured main category TV and will be assigned to that
2019-11-14 12:39:15.869  INFO --- [0.0-5076-exec-2] org.nzbhydra.api.ExternalApi             : [Search: 600981, Host: <<REDACTED>>] Executing new search
2019-11-14 12:39:15.869 DEBUG --- [0.0-5076-exec-2] org.nzbhydra.api.ExternalApi             : [Search: 600981, Host: <<REDACTED>>] Determined to be torznab call from URL "/torznab/api"
...
2019-11-14 12:39:17.711  INFO --- [0.0-5076-exec-2] org.nzbhydra.searching.Searcher          : [Search: 600981, Host: <<REDACTED>>] Returning results 1-1 from 210 results in cache. A total of 238 results is available from indexers of which 0 were already rejected and 28 were removed as duplicates
2019-11-14 12:39:17.712  INFO --- [0.0-5076-exec-2] org.nzbhydra.api.ExternalApi             : [Search: 600981, Host: <<REDACTED>>] Search took 1844ms. Returning 1 results
2019-11-14 12:39:17.713  INFO --- [0.0-5076-exec-2] org.nzbhydra.api.ExternalApi             : [Search: 600981, Host: <<REDACTED>>] Putting search result into cache
2019-11-14 12:39:17.714 DEBUG --- [0.0-5076-exec-2] org.nzbhydra.web.WebConfiguration        : [Search: 600981, Host: <<REDACTED>>] Search type: null
theotherp commented 4 years ago

Uhm, are you using JSON? Are you caching your results? Looks like it: Putting search result into cache. Yeah, found the spot.

theotherp commented 4 years ago

That was an interesting one...

bdurrow commented 4 years ago

Thank you, that fixed it! I'm not sure it is related but I started getting logged authentication errors (I don't use authentication) after this fix went live on my system. Specifically when I try to "complete the config" for an indexer by doing a cap scan I get a toast saying "You are not allowed to visit that section." and logs look like this:

2019-11-14 23:43:02.627  WARN --- [0.0-5076-exec-5] o.n.auth.LoginAndAccessAttemptService    : [Search: 775897, Host: <<REDACTED, my workstation IP>>] 9 failed access attempts from IP/host <<REDACTED, my workstation IP>> in the last 24 hours. Will block access after 5 failed attempts

I will open a ticket if it turns out to be something other than my mistake.

theotherp commented 4 years ago

That shouldn't be related at all, I didn't change anything in that regard.