rapid7 / recog

Pattern recognition for hosts, services, and content

How can I match a fingerprint against HTML body? #469

Open szvvo opened 2 years ago

szvvo commented 2 years ago

As the title says: is there any way to match a fingerprint against an HTML body?

Can I use something like this?

<fingerprints matches="html_body" protocol="http" database_type="service" preference="0.90">

mkienow-r7 commented 2 years ago

@stoneface919 We do not currently have a database for matching against HTML bodies. Regular expressions are not ideal for this. We plan to explore XPath-based solutions in the future.

szvvo commented 2 years ago

I see now, thank you for your reply. Do you happen to know whether recog is used in InsightVM?

mkienow-r7 commented 2 years ago

Yes, recog is in use in InsightVM. Check out this older blog post, "Self-Isolation, Home Networking, and Open Source: Recog and Rumble", which mentions this detail along with other products that use the recog project!

jvoisin commented 8 months ago

XPath wouldn't be able to match things like this:

<script type="text/javascript">//<![CDATA[
        FbxConf = {};
        FbxConf.apiBaseUrl = '/api/latest/';
        FbxConf.uploadBaseUrl = '/api/latest/upload/';
        FbxConf.websocketBaseUrl =
            (document.location.protocol == "https:" ? "wss://" : "ws://") +
            document.location.host + '/api/latest/ws/';
        FbxConf.csrfToken = '';
        FbxConf.firmwareVersionMajor = '4';
        FbxConf.firmwareVersionMinor = '7';
        // […]
//]]></script>

It's from the landing page of a Freebox, and it leaks the firmware's major and minor version.

dabdine commented 5 months ago

XPath could be useful for filtering the document before passing it to a pattern for evaluation. Something like:

<fingerprint pattern="FbxConf\.firmwareVersionMajor = '(\d+)';\s+FbxConf\.firmwareVersionMinor = '(\d+)'" flags="REG_MULTILINE">
  <content-filter xpath="//script[@type='text/javascript']"/>
  <description>Freebox...</description>
  <example><![CDATA[
      <script type="text/javascript">
              FbxConf = {};
              FbxConf.apiBaseUrl = '/api/latest/';
              FbxConf.uploadBaseUrl = '/api/latest/upload/';
              FbxConf.websocketBaseUrl =
                  (document.location.protocol == "https:" ? "wss://" : "ws://") +
                  document.location.host + '/api/latest/ws/';
              FbxConf.csrfToken = '';
              FbxConf.firmwareVersionMajor = '4';
              FbxConf.firmwareVersionMinor = '7';
              // […]
      </script>
    ]]>
  </example>
  <param pos="1" name="ver.maj"/>
  <param pos="2" name="ver.min"/>
  <param pos="0" name="service.version" value="{ver.maj}.{ver.min}" />
</fingerprint>

Then evaluate any input HTML document against the content-filter: for each XPath match, run the pattern, perhaps bailing out after the first pattern match. This is interesting because it could avoid running the pattern over the entire document.
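
To make the filter-then-match idea concrete, here is a rough Python sketch. It is not recog code, and `content-filter` is only a proposal in this thread. The names `ScriptCollector` and `match_freebox_version` are made up for illustration, and the standard-library `html.parser` stands in for a real XPath engine by hard-coding the equivalent of `//script[@type='text/javascript']`. The pattern is adapted from the fingerprint above, with the literal dots escaped and the `;` between the two assignments accounted for.

```python
import re
from html.parser import HTMLParser

# Pattern adapted from the proposed fingerprint (assumption: dots escaped,
# and the ';' that ends the first assignment matched explicitly).
VERSION_RE = re.compile(
    r"FbxConf\.firmwareVersionMajor = '(\d+)';\s+"
    r"FbxConf\.firmwareVersionMinor = '(\d+)'"
)


class ScriptCollector(HTMLParser):
    """Hypothetical stand-in for the XPath //script[@type='text/javascript'].

    Collects the text content of every matching <script> element.
    """

    def __init__(self):
        super().__init__()
        self.scripts = []    # text of each matching script element
        self._buffer = None  # chunks of the script currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/javascript":
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.scripts.append("".join(self._buffer))
            self._buffer = None


def match_freebox_version(html):
    """Return "major.minor" from the first matching script, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for script in collector.scripts:
        m = VERSION_RE.search(script)
        if m:
            # Bail out after the first pattern match.
            return f"{m.group(1)}.{m.group(2)}"
    return None
```

The pattern only ever sees the text of the selected script elements, so the same two lines appearing elsewhere in the body would not match. A real implementation would presumably evaluate the XPath from the fingerprint rather than hard-coding the selection.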