robertdavidgraham / masscan

TCP port scanner, spews SYN packets asynchronously, scanning entire Internet in under 5 minutes.
GNU Affero General Public License v3.0

[suggestion] Normalize structure of all output formats #612

Open postmodern opened 3 years ago

postmodern commented 3 years ago

It's inconsistent that the structure of the simple list (-oL) output format is a time series of events or results, while the structure of the XML and JSON output appears to be slightly nested. I propose that the structure of all outputs be normalized to a simple list of events or results, each of which has a type, a timestamp, and additional attributes based on the type. Furthermore, it doesn't really make sense to provide "nmap similar" output if it's not fully compatible with the nmap XML DTD. Masscan should have its own XML format, which is similar in structure to the other output formats.

Current Output

-oL

#masscan
open tcp 443 93.184.216.34 1629960470
open tcp 80 93.184.216.34 1629960470
open icmp 0 93.184.216.34 1629960470
banner tcp 443 93.184.216.34 1629960472 ssl TLS/1.1 cipher:0xc013, www.example.org, www.example.org, example.com, example.edu, example.net, example.org, www.example.com, www.example.edu, www.example.net
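To make the "time series of events" idea concrete, here is a minimal Python sketch (not part of masscan) that turns -oL lines into the flat records described in the proposal. It assumes the field order visible in the sample above (type, protocol, port, IP, timestamp) and that a banner line continues with a service name followed by free-form banner text; parse_list_line and parse_list_output are hypothetical helpers.

```python
from typing import Optional


def parse_list_line(line: str) -> Optional[dict]:
    """Parse one -oL line into a flat event dict; None for comments/blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Assumed layout: type proto port ip timestamp [service banner-text...]
    # maxsplit=6 keeps the banner text (which contains spaces) in one piece.
    parts = line.split(maxsplit=6)
    if len(parts) < 5:
        raise ValueError(f"unexpected -oL line: {line!r}")
    kind, proto, port, ip, ts = parts[:5]
    event = {"type": kind, "protocol": proto, "port": int(port),
             "ip": ip, "timestamp": int(ts)}
    if kind == "banner" and len(parts) >= 6:
        # As in the proposal, the service name becomes the key, e.g. "ssl": "TLS/1.1 ..."
        event[parts[5]] = parts[6] if len(parts) == 7 else ""
    return event


def parse_list_output(text: str) -> list[dict]:
    """Parse a whole -oL file, skipping the '#masscan' header and blank lines."""
    return [e for e in map(parse_list_line, text.splitlines()) if e is not None]
```

Each parsed line is already one self-describing event, which is the shape the proposal asks the XML and JSON formats to adopt.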

-oX

<?xml version="1.0"?>
<!-- masscan v1.0 scan -->
<nmaprun scanner="masscan" start="1629960654" version="1.0-BETA"  xmloutputversion="1.03">
<scaninfo type="syn" protocol="tcp" />
<host endtime="1629960654"><address addr="93.184.216.34" addrtype="ipv4"/><ports><port protocol="icmp" portid="0"><state state="open" reason="none" reason_ttl="54"/></port></ports></host>
<host endtime="1629960654"><address addr="93.184.216.34" addrtype="ipv4"/><ports><port protocol="tcp" portid="443"><state state="open" reason="syn-ack" reason_ttl="54"/></port></ports></host>
<host endtime="1629960654"><address addr="93.184.216.34" addrtype="ipv4"/><ports><port protocol="tcp" portid="80"><state state="open" reason="syn-ack" reason_ttl="54"/></port></ports></host>
<host endtime="1629960657"><address addr="93.184.216.34" addrtype="ipv4"/><ports><port protocol="tcp" portid="80"><state state="open" reason="response" reason_ttl="54" /><service name="http.server" banner="ECS (sec/973B)"></service></port></ports></host>
...

Note how <ports> contains only one <port> child element. Why? Those could be merged into a single XML element.
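Because every <host> wraps exactly one <port>, the current -oX output can be flattened mechanically. Below is a minimal sketch using Python's xml.etree.ElementTree, assuming the element and attribute names shown above and a complete document (the sample is truncated with "..."). Treating a <port> that carries a <service> child as a "banner" event keyed by the service name is an assumption that mirrors the -oL output and the proposal; xml_to_events is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def xml_to_events(xml_text: str) -> list[dict]:
    """Flatten nmap-style -oX output into one flat dict per <port> element."""
    root = ET.fromstring(xml_text)
    events = []
    for host in root.iter("host"):
        ip = host.find("address").get("addr")
        timestamp = int(host.get("endtime"))
        for port in host.iter("port"):
            state = port.find("state")
            service = port.find("service")
            if service is not None:
                kind = "banner"  # assumption: <service> corresponds to an -oL "banner" line
            else:
                kind = state.get("state") if state is not None else "unknown"
            event = {
                "type": kind,
                "protocol": port.get("protocol"),
                "port": int(port.get("portid")),
                "ip": ip,
                "timestamp": timestamp,
            }
            if service is not None:
                # Service name becomes the key, e.g. "http.server": "ECS (sec/973B)"
                event[service.get("name")] = service.get("banner")
            events.append(event)
    return events
```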

-oJ

[
{   "ip": "93.184.216.34",   "timestamp": "1629960621", "ports": [ {"port": 80, "proto": "tcp", "status": "open", "reason": "syn-ack", "ttl": 54} ] }
,
{   "ip": "93.184.216.34",   "timestamp": "1629960621", "ports": [ {"port": 443, "proto": "tcp", "status": "open", "reason": "syn-ack", "ttl": 54} ] }
,
{   "ip": "93.184.216.34",   "timestamp": "1629960622", "ports": [ {"port": 0, "proto": "icmp", "status": "open", "reason": "none", "ttl": 54} ] }
,
{   "ip": "93.184.216.34",   "timestamp": "1629960624", "ports": [ {"port": 80, "proto": "tcp", "service": {"name": "http.server", "banner": "ECS (sec/974D)"} } ] }
,
...

Note how "ports": is an Array, but only contains a single {"port": ...} Hash. Why? Those could be merged into a single JSON Hash.
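The same flattening applies to -oJ: each record's one-element "ports" array can be hoisted into the record itself. A minimal sketch, assuming the keys shown above ("ip", "timestamp", "ports", "proto", "status", "service") and a complete JSON array (the sample is truncated with "..."). The timestamp is converted from the string masscan writes today to the integer used in the proposal; flatten_record and flatten_json_output are hypothetical helpers.

```python
import json


def flatten_record(record: dict) -> list[dict]:
    """Turn one -oJ record into flat events, one per entry in its "ports" array."""
    events = []
    for entry in record.get("ports", []):
        service = entry.get("service")
        event = {
            # assumption: an entry carrying "service" corresponds to an -oL "banner" line
            "type": "banner" if service else entry.get("status", "unknown"),
            "protocol": entry["proto"],
            "port": entry["port"],
            "ip": record["ip"],
            # -oJ writes the timestamp as a string; the proposal uses an integer
            "timestamp": int(record["timestamp"]),
        }
        if service:
            # Service name becomes the key, as in the proposed "ssl": "..." attribute
            event[service["name"]] = service["banner"]
        events.append(event)
    return events


def flatten_json_output(text: str) -> list[dict]:
    """Parse a complete -oJ array and flatten every record in it."""
    return [event for record in json.loads(text) for event in flatten_record(record)]
```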

Proposed Output

XML

<masscan ...>
  <event type="open" protocol="tcp" port="443" ip="93.184.216.34" timestamp="1629960470" />
  ...
  <event type="banner" protocol="tcp" port="443" ip="93.184.216.34" timestamp="1629960472" ssl="TLS/1.1 cipher:0xc013, www.example.org, www.example.org, example.com, example.edu, example.net, example.org, www.example.com, www.example.edu, www.example.net" />
  ...
</masscan>
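Emitting the proposed XML is then a matter of writing one empty <event> element per record. A minimal sketch with ElementTree, assuming each event is a flat dict with type, protocol, port, ip and timestamp keys plus optional per-service keys such as "ssl"; the root's attributes are elided in the proposal ("<masscan ...>"), so none are written here. events_to_xml is a hypothetical helper, and ET.indent requires Python 3.9+.

```python
import xml.etree.ElementTree as ET


def events_to_xml(events: list[dict]) -> str:
    """Serialize flat event dicts as <event/> children of a <masscan> root."""
    # The proposal elides the root's attributes, so none are set here.
    root = ET.Element("masscan")
    for event in events:
        # Each key becomes an attribute; values are stringified (443 -> "443")
        # and ElementTree escapes characters such as & and " in the values.
        ET.SubElement(root, "event", {key: str(value) for key, value in event.items()})
    ET.indent(root, space="  ")  # Python 3.9+; one <event> per line
    return ET.tostring(root, encoding="unicode")
```

One consequence of keying banners by service name: names like ssl or http.server are valid XML attribute names, but a service name containing a space or another disallowed character would need to be mapped to a fixed attribute instead.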

JSON

 [
  {"type":"open", "protocol":"tcp", "port":443, "ip":"93.184.216.34", "timestamp":1629960470},
  ...
  {"type":"banner", "protocol":"tcp", "port":443, "ip":"93.184.216.34", "timestamp":1629960472, "ssl":"TLS/1.1 cipher:0xc013, www.example.org, www.example.org, example.com, example.edu, example.net, example.org, www.example.com, www.example.edu, www.example.net"},
  ...
 ]
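The proposed JSON differs from the current -oJ output mainly in being flat and in using numbers for port and timestamp. A minimal sketch of a writer producing that layout (one object per line inside a single array), assuming the same flat event dicts; events_to_json is a hypothetical helper, not a masscan function.

```python
import json


def events_to_json(events: list[dict]) -> str:
    """Write flat event dicts as a JSON array, one object per line."""
    # json.dumps keeps ints as numbers, so "port" and "timestamp" are not quoted.
    body = ",\n".join("  " + json.dumps(event) for event in events)
    return "[\n" + body + "\n]\n"
```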

Benefits

  • Consistent output structure.

  • Drops support for "nmap similar" (but not compatible) output format and adds support for masscan's own XML format which masscan would control.

Potential Risks

This would be a non-backwards-compatible change, so it would need to be held back until work starts on a 2.0.0 release.

mzpqnxow commented 2 years ago

Try using -oD instead of -oJ

postmodern commented 2 years ago

-oD does not appear to be documented in the --help output or the man page. -oD has the same issue as -oJ: ports: is an Array but only contains one element. Also, my suggestion was to normalize the structure of both the XML and JSON outputs so they would be similar.

mzpqnxow commented 2 years ago

Yeah, unfortunately there are a lot of things that never made it into the man page/README (and practically nothing made it into the help output)...

FWIW, I've contributed output modules in the past (the unicornscan output module, which probably nobody in the world uses, was my doing). The point being, Rob is pretty open to accepting reasonable PRs. I would bet he would consider merging something like what you're describing as long as it doesn't break old behavior/output (speculating here, based on prior experience).