threatminer 403 forbidden causes JSON parsing error

mavensecurity commented 4 years ago

Module Name: https://github.com/lanmaster53/recon-ng-marketplace/blob/master/modules/recon/domains-hosts/threatminer.py

Bug Description: The server responds with 403 Forbidden. The response body is HTML rather than JSON, so the JSON parser throws an error.

Steps to Reproduce:

1. Load the threatminer module, set SOURCE to vwrm.com, and run it.
2. See the error below.

[*] ========================= REQUEST =========================
url:    https://api.threatminer.org/v2/domain.php?rt=5&q=vwrm.com
method: GET /v2/domain.php?rt=5&q=vwrm.com
header: User-Agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0"
header: Accept-Encoding: gzip, deflate
header: Accept: */*
header: Connection: keep-alive
[*] ========================= RESPONSE =========================
status: 403 Forbidden
header: Date: Fri, 08 Nov 2019 02:45:30 GMT
header: Content-Type: text/html; charset=UTF-8
header: Transfer-Encoding: chunked
header: Connection: keep-alive
header: Set-Cookie: __cfduid=d0953c1908dd65f36517060f8bfb3471a1573181130; expires=Sat, 07-Nov-20 02:45:30 GMT; path=/; domain=.threatminer.org; HttpOnly
header: Cache-Control: max-age=10
header: Expires: Fri, 08 Nov 2019 02:45:40 GMT
header: X-Frame-Options: SAMEORIGIN
header: Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
header: Vary: Accept-Encoding
header: Server: cloudflare
header: CF-RAY: 532436144f3ea5a0-NRT
header: Content-Encoding: gzip
body:   b'<!DOCTYPE html>\n<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->\n<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->\n<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->\n<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->\n<head>\n<title>Access denied | api.threatminer.org used Cloudflare to restrict access</title>\n<meta charset="UTF-8" />\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />\n<meta name="robots" content="noindex, nofollow" />\n<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />\n<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />\n<!--[if lt IE 9]><link rel="stylesheet" id=\'cf_styles-ie-css\' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->\n<style type="text/css">body{margin:0;padding:0}</style>\n\n\n<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script><!--<![endif]-->\n<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script><!--<![endif]-->\n\n\n\n</head>\n<body>\n  <div id="cf-wrapper">\n    <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>\n    <div id="cf-error-details" class="cf-error-details-wrapper">\n      <div class="cf-wrapper cf-header cf-error-overview">\n        <h1>\n          <span class="cf-error-type" data-translate="error">Error</span>\n          <span class="cf-error-code">1010</span>\n          <small class="heading-ray-id">Ray ID: 532436144f3ea5a0 &bull; 2019-11-08 02:45:30 UTC</small>\n        </h1>\n        <h2 class="cf-subheadline">Access denied</h2>\n      </div><!-- /.header -->\n\n      <section></section><!-- 
spacer -->\n\n      <div class="cf-section cf-wrapper">\n        <div class="cf-columns two">\n          <div class="cf-column">\n            <h2 data-translate="what_happened">What happened?</h2>\n            <p>The owner of this website (api.threatminer.org) has banned your access based on your browser\'s signature (532436144f3ea5a0-ua60).</p>\n          </div>\n\n          \n        </div>\n      </div><!-- /.section -->\n\n      <div class="cf-error-footer cf-wrapper">\n  <p>\n    <span class="cf-footer-item">Cloudflare Ray ID: <strong>532436144f3ea5a0</strong></span>\n    <span class="cf-footer-separator">&bull;</span>\n    <span class="cf-footer-item"><span>Your IP</span>: 2400:8902::f03c:91ff:feae:227c</span>\n    <span class="cf-footer-separator">&bull;</span>\n    <span class="cf-footer-item"><span>Performance &amp; security by</span> <a href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link" target="_blank">Cloudflare</a></span>\n    \n  </p>\n</div><!-- /.error-footer -->\n\n\n    </div><!-- /#cf-error-details -->\n  </div><!-- /#cf-wrapper -->\n\n  <script type="text/javascript">\n  window._cf_translation = {};\n  \n  \n</script>\n\n</body>\n</html>\n'
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/david/Tools/recon-ng/recon/core/module.py", line 299, in do_run
    self.run()
  File "/home/david/Tools/recon-ng/recon/core/module.py", line 293, in run
    self.module_run(*params)
  File "/home/david/.recon-ng/modules/recon/domains-hosts/threatminer.py", line 17, in module_run
    if resp.json().get('status_code') == '200':
  File "/home/david/Envs/recon-ng/lib/python3.6/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
------------------------------------------------------------
[!] Something broken? See https://github.com/lanmaster53/recon-ng/wiki/Troubleshooting#issue-reporting.

Expected Behavior:

The response should be 200 OK with a small JSON body, as a curl request to the same URL shows:

$ curl --include 'https://api.threatminer.org/v2/domain.php?rt=5&q=vwrm.com'
HTTP/2 200
date: Fri, 08 Nov 2019 02:53:48 GMT
content-type: application/json; charset=utf-8
content-length: 98
set-cookie: __cfduid=d371e6a01512450a84c45d066ba0b54f71573181628; expires=Sat, 07-Nov-20 02:53:48 GMT; path=/; domain=.threatminer.org; HttpOnly
access-control-allow-origin: *
cf-cache-status: DYNAMIC
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare
cf-ray: 5324423aaf9be6fc-EWR

{"status_code":"200","status_message":"Results found.","results":["www.vwrm.com","mail.vwrm.com"]}


cam-barts commented 4 years ago

Unable to reproduce (see output). For context: did you try running the module multiple times from Recon-ng? A 403 seems like a weird error to get here, as if you needed a key or something; if it were hitting a rate limit, you should get a 429. I also see you're spoofing a Windows 10 Firefox user agent. That shouldn't matter at all, but did you try the default UA, and did it fail as well?

[recon-ng][default] > modules load threatminer
[*] Analytics disabled.
[recon-ng][default][threatminer] > options set SOURCE vwrm.com
SOURCE => vwrm.com
[recon-ng][default][threatminer] > run

--------
VWRM.COM
--------
[*] ========================= REQUEST =========================
url:    https://api.threatminer.org/v2/domain.php?rt=5&q=vwrm.com
method: GET /v2/domain.php?rt=5&q=vwrm.com
header: User-Agent: Recon-ng/v5
header: Accept-Encoding: gzip, deflate
header: Accept: */*
header: Connection: keep-alive
[*] ========================= RESPONSE =========================
status: 200 OK
header: Date: Fri, 08 Nov 2019 13:57:57 GMT
header: Content-Type: application/json; charset=utf-8
header: Transfer-Encoding: chunked
header: Connection: keep-alive
header: Set-Cookie: __cfduid=dfca5f54c2d81dc8f651d22cfdf6467851573221477; expires=Sat, 07-Nov-20 13:57:57 GMT; path=/; domain=.threatminer.org; HttpOnly
header: Access-Control-Allow-Origin: *
header: CF-Cache-Status: DYNAMIC
header: Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
header: Server: cloudflare
header: CF-RAY: 53280f1a2cfd23f4-IAD
header: Content-Encoding: gzip
body:   b'{"status_code":"200","status_message":"Results found.","results":["www.vwrm.com","mail.vwrm.com"]}'
[*] DATABASE => /home/shade/.recon-ng/workspaces/default/data.db
[*] QUERY => INSERT INTO `hosts` (`host`, `module`) SELECT ?, ? WHERE NOT EXISTS(SELECT * FROM `hosts` WHERE `host`=?)
[*] VALUES => ('www.vwrm.com', 'threatminer', 'www.vwrm.com')
[*] [host] www.vwrm.com (<blank>)
[*] DATABASE => /home/shade/.recon-ng/workspaces/default/data.db
[*] QUERY => INSERT INTO `hosts` (`host`, `module`) SELECT ?, ? WHERE NOT EXISTS(SELECT * FROM `hosts` WHERE `host`=?)
[*] VALUES => ('mail.vwrm.com', 'threatminer', 'mail.vwrm.com')
[*] [host] mail.vwrm.com (<blank>)

-------
SUMMARY
-------
[*] 2 total (2 new) hosts found.
[*] DATABASE => /home/shade/.recon-ng/workspaces/default/data.db
[*] QUERY => INSERT OR REPLACE INTO dashboard (module, runs) VALUES ('recon/domains-hosts/threatminer', COALESCE((SELECT runs FROM dashboard WHERE module='recon/domains-hosts/threatminer')+1, 1))

mavensecurity commented 4 years ago

I did not run it multiple times. The USER-AGENT made no difference. I am still getting the error. curl and wget from the same system work fine, so it's not as if the threatminer endpoint has banned our IP.

Maybe at least the module can be enhanced to not react so harshly to HTTP 403.
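
One way the module could react less harshly is to check the status code and content type before decoding, and to catch the decode error. The following is a minimal sketch, not the module's actual code: `parse_threatminer_body` and its `(results, error)` return shape are hypothetical names chosen for illustration.

```python
import json


def parse_threatminer_body(status_code, content_type, text):
    """Return (results, error) without raising on non-JSON responses.

    Hypothetical helper: the name and return shape are illustrative
    and are not part of the threatminer module.
    """
    # A block page such as the 403 above is HTML, so stop before parsing.
    if status_code != 200:
        return [], f"unexpected HTTP status {status_code}"
    if "application/json" not in content_type.lower():
        return [], f"unexpected content type {content_type!r}"
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [], f"response body is not valid JSON: {exc}"
    if not isinstance(data, dict):
        return [], "JSON body is not an object"
    # The API reports its own status as a string, e.g. "200".
    if data.get("status_code") != "200":
        return [], data.get("status_message", "no results")
    return data.get("results", []), None
```

Since the traceback shows `resp` is a requests response, the module could presumably call this as `parse_threatminer_body(resp.status_code, resp.headers.get('Content-Type', ''), resp.text)` and print the error instead of crashing.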

I am inside a virtualenv, in case that makes a difference.

$ pip check
flasgger 0.9.3 has requirement jsonschema<3.0.0, but you have jsonschema 3.1.1.

Hmmm. A clue? So I fixed that:

$ pip uninstall jsonschema
$ pip install 'jsonschema<3.0.0'
$ pip check
No broken requirements found.

But I still get the same 403 error.

¯_(ツ)_/¯ I'm out of ideas. So far other modules seem to work fine.

lanmaster53 commented 4 years ago

A 403 just really doesn't make any sense in this context. The module doesn't require a key, yet the server is responding with Forbidden? That would mean they are using something else to determine authorization. User agent? IP address? You tried changing the user agent, but perhaps they banned your IP and are blanketing you with 403s. If that's the case, it's likely that the module dev never encountered that scenario and therefore never accounted for it. Just a guess.

PRs welcome for fixes.

mavensecurity commented 4 years ago

I can use curl and wget fine from that same server to that same endpoint, so it's not the IP getting banned. Changing the User-Agent has no effect. With curl and wget it's HTTP 200s all the way down. That module: 403. Dunno why. The next step might be to direct all module traffic through a MITM proxy to inspect all the things.

lanmaster53 commented 4 years ago

Just set the proxy global option in Recon-ng and inspect there. You can curl through a proxy as well.
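
For comparing what a script sends against what curl sends, the same request can also be routed through the proxy from plain Python. This is a rough sketch using only the standard library. The proxy address `http://127.0.0.1:8080` and all function names are assumptions for illustration; the URL and the `Recon-ng/v5` user agent are taken from the output earlier in this thread.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: an intercepting proxy is listening on this local address.
PROXY_URL = "http://127.0.0.1:8080"


def build_proxied_opener(proxy_url=PROXY_URL):
    """Return an opener that routes HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def build_threatminer_request(domain, user_agent="Recon-ng/v5"):
    """Build the GET request for the domain lookup seen in this thread."""
    query = urllib.parse.urlencode({"rt": 5, "q": domain})
    url = "https://api.threatminer.org/v2/domain.php?" + query
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_through_proxy(domain):
    """Send the request via the proxy and return (status, body bytes)."""
    opener = build_proxied_opener()
    try:
        with opener.open(build_threatminer_request(domain), timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the error object still carries the body.
        return exc.code, exc.read()
```

Calling `fetch_through_proxy("vwrm.com")` only works while the proxy is running, and for HTTPS the proxy's CA certificate generally has to be trusted by Python or certificate verification will fail.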

lanmaster53 commented 4 years ago

Based on another broken module I found today, I think I know what is causing this. This server is behind Cloudflare and Cloudflare is triggering on requests coming from Recon-ng. If you view the error response in verbose mode or run it through a proxy, you'll see the 403 requesting you to answer a captcha. This could be an issue moving forward for a lot of things.
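
A module could recognize this kind of response and report it instead of attempting to parse it. The sketch below is a heuristic based only on the 403 response captured earlier in this thread (`Server: cloudflare`, an HTML content type, and a `cf-error-code` element in the body). The function name is hypothetical, and Cloudflare's pages may look different in other cases.

```python
import re

# Pattern taken from the error page in this thread; treat it as an assumption.
CF_ERROR_CODE = re.compile(r'class="cf-error-code">(\d+)<')


def cloudflare_block_code(status_code, headers, body):
    """Return the Cloudflare error code if this looks like a block page.

    Returns None when the response does not look like a Cloudflare block,
    and "unknown" when it does but no error code is found in the body.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    if status_code != 403:
        return None
    if normalized.get("server", "").lower() != "cloudflare":
        return None
    if not normalized.get("content-type", "").lower().startswith("text/html"):
        return None
    match = CF_ERROR_CODE.search(body)
    return match.group(1) if match else "unknown"
```

For the response in the original report this would return `"1010"`, which the module could print alongside a hint that Cloudflare rejected the request.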

lanmaster53 / recon-ng-marketplace

threatminer 403 forbidden causes JSON parsing error #77