JSON output sucks. - Githubissues

pickworth commented 9 years ago

Working on something better. Will submit PR soon, with option to enable the 'better' structured output. I've found with the current output, the nesting is pretty bad. (as XML to JSON simply doesn't do a great job, obviously)

It will make storing to DBs much nicer, which is what I am doing. Plan is to return an array of hosts scanned. (so each scanned host will be inserted easily as a database row/record/document/event, etc...) Please let me know if there is anything you want to see there in particular, but I'm not going to skip any fields.

If you have already made something, please share, and I will not bother. Otherwise, I'll share what I have done soon, when I'm finished. (I'm 75% there already)

digipigeon commented 9 years ago

+1

tomtom87 commented 9 years ago

+1

tomtom87 commented 9 years ago

@nmors please can you check these perl scripts I have been using before http://search.cpan.org/~apersaud/Nmap-Parser/tools/nmap2db.pl

tomtom87 commented 9 years ago

@nmors also this project is very interesting if you are going to do a PR.. this is the postgre python nmap manager https://github.com/rafaelma/nmap2db/blob/master/docs/manual.rst

jas- commented 9 years ago

This shouldn't be handling database connections in anyway. Any database operations should be in a different abstraction layer then this module.

pickworth commented 9 years ago

Thanks, I was not planning on handling db connections here at all, just simply better formatting the json output. On 15 Oct 2015 4:56 am, "Jason Gerfen" notifications@github.com wrote:

This shouldn't be handling database connections in anyway. Any database operations should be in a different abstraction layer then this module.

— Reply to this email directly or view it on GitHub https://github.com/jas-/node-libnmap/issues/23#issuecomment-148135818.

jas- commented 9 years ago

+1 I think the XLST used by nmap core might be at fault but changing that might prove difficult

pickworth commented 9 years ago

This is a small part of the code - (it's obviously not complete yet - I need to properly loop through the scanned hosts still, and build an array properly as I accidentally did it wrong in the following code) Just want to get some opinions if possible, before going further. E.g. field naming convention, dash / underscore / camelCase, etc. I was going to convert to camelCase. Thoughts?

It's basically an array of flat objects (with the exception of the open ports which will be the only nested object)

Object.keys(doc).forEach(function(the_host) {   
  log("parsing scan results for server:" + the_host);
  var open_ports = [];
  for (var i = 0; i < doc[the_host].host[0].ports[0].port.length; i++) {
    log("parsing open port #" + i + " for server:" + the_host);
    open_ports.push({
      "protocol" :    doc[the_host].host[0].ports[0].port[i]['$'].protocol,
      "port-number" : doc[the_host].host[0].ports[0].port[i]['$'].portid,
      "state" :       doc[the_host].host[0].ports[0].port[i].state[0]['$'].state,
      "reason" :      doc[the_host].host[0].ports[0].port[i].state[0]['$'].reason,
      "reason-ttl" :  doc[the_host].host[0].ports[0].port[i].state[0]['$'].reason_ttl,
      "service" :     doc[the_host].host[0].ports[0].port[i].service[0]['$'].name,
      "method" :      doc[the_host].host[0].ports[0].port[i].service[0]['$'].method,
      "conf" :        doc[the_host].host[0].ports[0].port[i].service[0]['$'].conf
    });
  };

  output_array.push({
    "host":                           the_host,
    "scanner":                        doc[the_host]['$'].scanner,
    "args":                           doc[the_host]['$'].args,
    "scan-start-timestamp":           doc[the_host]['$'].start,
    "scan-start-timestamp-pretty":    doc[the_host]['$'].startstr,
    "nmap-version":                   doc[the_host]['$'].version,
    "xml-output-version":             doc[the_host]['$'].xmloutputversion,
    "scan-end-timestamp":             doc[the_host].runstats[0].finished[0]['$'].time,
    "scan-end-timestamp-pretty":      doc[the_host].runstats[0].finished[0]['$'].timestr,
    "scan-duration":                  doc[the_host].runstats[0].finished[0]['$'].elapsed,
    "scan-summary":                   doc[the_host].runstats[0].finished[0]['$'].summary,
    "scan-exitcode":                  doc[the_host].runstats[0].finished[0]['$'].exit,
    "hosts-up":                       doc[the_host].runstats[0].hosts[0]['$'].up,
    "hosts-down":                     doc[the_host].runstats[0].hosts[0]['$'].down,
    "hosts-total":                    doc[the_host].runstats[0].hosts[0]['$'].total,
    "scan-type":                      doc[the_host].scaninfo[0]['$'].type,
    "scan-protocol":                  doc[the_host].scaninfo[0]['$'].protocol,
    "scan-numservices":               doc[the_host].scaninfo[0]['$'].numservices,
    "scan-services":                  doc[the_host].scaninfo[0]['$'].services,
    "host-scan-start":                doc[the_host].host[0]['$'].starttime,
    "host-scan-end":                  doc[the_host].host[0]['$'].endtime,
    "host-state":                     doc[the_host].host[0].status[0]['$'].state,
    "host-reason":                    doc[the_host].host[0].status[0]['$'].reason,
    "host-reason-ttl":                doc[the_host].host[0].status[0]['$'].reason_ttl,
    "host-address":                   doc[the_host].host[0].address[0]['$'].addr,
    "host-address-type":              doc[the_host].host[0].address[0]['$'].addrtype,
    "hostname":                       doc[the_host].host[0].hostnames[0].hostname[0]['$'].name,
    "hostname-type":                  doc[the_host].host[0].hostnames[0].hostname[0]['$'].type,
    "num-open-ports":                 open_ports.length,
    //"no-open-ports":                doc[the_host].host[0].ports[0].port.length,
    "ports":                          open_ports
  });

});

jas- commented 9 years ago

No on the camel case. The hyphens are fine as keys and preferred. I am concerned about the index 0 per xml element. Each element; i.e. host-address, you have listed as doc[the_host].host[0].address[0].

Because this tool speeds up scans of large subnet/network blocks each report may contain multiple hosts and that goes for each element; i.e. doc[the_host].host[0] or doc[the_host].scaninfo[0] etc.

I mention this because the recursive function to essentially remap the fields would have to account for exceptions such as those elements using $

pickworth commented 9 years ago

Thanks Jason, you are completely correct. I implemented the above when only scanning a single host, not understanding properly how the output will look like when scanning multiple, also using multiple host blocks.

Need to handle pushing to the output array for each host and/or each host in each host block. An additional loop is required here to fix it, i guess.

Also need to decide if this should be default output for json, or be as an additional output type. I think, for the short term at least it should not be the default so it does not break existing implementations. Up to you of course.

Nathan On 15 Oct 2015 10:27 am, "Jason Gerfen" notifications@github.com wrote:

No on the camel case. The hyphens are fine as keys and preferred. I am concerned about the index 0 per xml element. Each element; i.e. host-address, you have listed as doc[the_host].host[0].address[0].

Because this tool speeds up scans of large subnet/network blocks each report may contain multiple hosts and that goes for each element; i.e. doc[the_host].host[0] or doc[the_host].scaninfo[0] etc.

I mention this because the recursive function to essentially remap the fields would have to account for exceptions such as those elements using $

— Reply to this email directly or view it on GitHub https://github.com/jas-/node-libnmap/issues/23#issuecomment-148232038.

jas- commented 9 years ago

In regards to contributing; the best practice is as follows:

Fork this repository
Create an upstream branch to handle future releases (weekly fetches of the uptream branch and merging will keep any local fork branches up to date). ex: git remote add upstream https://github.com/jas-/node-libnmap.git & git fetch upstream && git merge
Create a new branch ex: git checkout -b json-output
Add code, test cases etc.

Once your finished create a pull request.

One thing about your pr, try to squash multiple commits into one so all changes can be evaluated easily.

I need to add a contributing document to help facilitate this process for others but until then just ask and I can help.

tomtom87 commented 9 years ago

Ideal situation for me would be able to just choose output method DBMS and save to MySQL, NoSQL, Postgre etc. Not quiet sure why I need the XML hop because it's just going str8 to a parser anyways... but each to they own. Guess I will start the fork then... bless

jas- commented 9 years ago

@tomtom87 XML is an output option but JSON is the default output. File I/O is always going to be faster for read/write operations and the object traversal for both XML and JSON is faster than the overhead most SQL engines implement.

While this is one argument against outputting results to a database engine such as the ones you mention, two additional would be the added setup complexity (setup, connect, install & configure) a relational table design would require but it also it goes against the UNIX philosophy of 'do one thing well', which serves nodejs module development well.

pickworth commented 9 years ago

@tomtom87 the purpose of my idea is to ultimately make it easier for one to work with the json output.

Having a nicer structured output like this would mean a simple one-line of code for then saving the results to a nosql db.

For the case of a relational db, you would probably need two tables: hosts & open-ports (1 to many)

If you really feel the need, I think using the module as a dep for another module would be better than a fork imo.

Nathan

@tomtom87 https://github.com/tomtom87 XML is an output option but JSON is the default output. File I/O is always going to be faster for read/write operations and the object traversal for both XML and JSON is faster than the overhead most SQL engines implement.

While this is one argument against outputting results to a database engine such as the ones you mention, two additions are the added setup complexity (setup, connect, install & configure) a relational table design would require but it also it goes against the UNIX philosophy of 'do one thing well' which serves nodejs module development well.

— Reply to this email directly or view it on GitHub https://github.com/jas-/node-libnmap/issues/23#issuecomment-148349815.

jas- commented 9 years ago

@nmors +1 on the module proposition for @tomtom87.

This module would best be served as a dependency from another module that performs the database abstraction you wish to see. See some existing projects that use node-libnmap as a dependency

And the nmap binary would require XML parsing either way as it is the best/standardized format available from the nmap output options.

tomtom87 commented 9 years ago

@jas- I will look at making it as a dependency module then maybe node-libnmap2db? Regarding the I/O that wouldn't be a problem because I can just process the outputted XML on the fly. To do this task it requires a nice wrapper so that you can just leave it running as a service ideally whilst having a table populated.

In respect to having a one to many table design, I don't see the need when the schema that is currently used on the ancient perl script is working fine. I just wanted to make this all work inside one command, to keep with the same philosophy of doing one thing well - interpretations can be a bitch I see ;)

jas- commented 9 years ago

@tomtom87 it isn't interpretation I am worried about. It is thread hijacking to promote other projects out of the scope of the op's proposal.

jas- commented 9 years ago

@nmors Just so you are aware of the xslt file being used to generate the xml output.

There may be better options than using the current xml2js module for the json object generation. Before you write a full blown replacement perhaps fork, new branch and try one of the other xml to json conversion modules available?

I must admit I did very little research on the other available conversion modules.

Also I know have published v0.2.11 which includes a robust0 CONTRIBUTING document to help facilitate pull requests.

ckiely91 commented 8 years ago

@nmors, this is great. Something to consider is the use of the $ character - attempting to save the results to a MongoDB results in errors as $ is a Mongo operator. Could this be replaced or omitted?

jas- commented 8 years ago

Try v0.2.21

pickworth commented 8 years ago

@ckiely91 yep, $ is getting omitted On 3 Nov 2015 10:39 pm, "Jason Gerfen" notifications@github.com wrote:

Try v0.2.21

— Reply to this email directly or view it on GitHub https://github.com/jas-/node-libnmap/issues/23#issuecomment-153326927.

jas- commented 8 years ago

@nmors @ckiely91 Have either of you used npm install libnmap to make use of v0.2.21? It should address this.

pickworth commented 8 years ago

Not yet, ive been on holidays and busy with other projects. I'll try it out when i have a chance in next couple of weeks some time On 27 Nov 2015 5:53 am, "Jason Gerfen" notifications@github.com wrote:

@nmors https://github.com/nmors @ckiely91 https://github.com/ckiely91 Have either of you used npm install libnmap to make use of v0.2.21? It should address this.

— Reply to this email directly or view it on GitHub https://github.com/jas-/node-libnmap/issues/23#issuecomment-159976881.

ckiely91 commented 8 years ago

I have updated to v0.2.21, the output is indeed easier to wrangle into Mongo now. Thanks!

jas- commented 8 years ago

Please use v0.2.22 of libnmap. Usage of node-libnmap has been depreciated and has reached EOL.

jas- / node-libnmap

JSON output sucks. #23