neekey / table-parser

A parser to parse table style output from shell
MIT License
25 stars 13 forks source link

Fix parser on OSX #3

Closed IstoraMandiri closed 8 years ago

IstoraMandiri commented 8 years ago

There seemed to be a problem with ps-node due to the PID not being returned.

This is because, on OSX, the table looks like this:

  PID TTY           TIME CMD
 4773 ttys000    0:00.30 -/bin/zsh
13225 ttys001    0:00.16 -/bin/zsh
13435 ttys002    0:00.11 -/bin/zsh

Notice that the PID starts with two spaces, but titleBegin was being set to 2 instead of 0.

The fix may be naive but it seems to work on OSX. Hopefully this doesn't break it on other OS.

IstoraMandiri commented 8 years ago

It might be better to totally re-work the logic in case other fields are right aligned like PID and TIME... ?

neekey commented 8 years ago

@hitchcott The problem about the current version of table-parser is that a column can be a boundary only if all the values has a begin or end at this column, so the different length of PID in your table just break this logic:

works:

     PID
  123445
  123456
  123144

oops...

    PID
    3445
  123456
  123144

I'm thinking about rewrite the logic to treat all the values in the same field as a connected domain.

First to recognize them

  PID        CMD
AAAAA       BBBBBB       
  AAAAA     BBBB
AAAAAA     BBBB
  AAAAA     BBBBBBB

Second to merge domains vertically overlapping in case there is empty value:

  PID        CMD
AAAAA       BBBBBB       
  AAAAA     BBBB
AAAAAA     BBBB
  AAAAA     BBBBBBB
         BBBBBBB
AAAAAA    BBBBBBB
AAAAAA    BBBBBBB
AAAAAA    BBBBBBB

And after that we can get the minimum and maximum "x" index as boundaries.

this should work in most cases.

IstoraMandiri commented 8 years ago

@neekey Noted and agreed - this PR is not a true fix.

I may have time in a the new couple of weeks to implement that logic (in the meantime I'm using this fix for my project as it worked for what I need), but if you have time I'm sure you can do it quicker!

nfantone commented 8 years ago

Any updates on this? I arrived at this issue after having the same problem on OS X. PIDs for my process are not being returned some times.

Here's my case:

PID TTY           TIME CMD
  555 ttys000    0:00.02 /Applications/iTerm.app/Contents/MacOS/iTerm2 --server login -fp nfantone
  557 ttys000    0:00.49 -zsh
 4856 ttys000    0:12.85 /usr/bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/usr/local/Cellar/elasticsearch/2.1.1/libexec -cp /usr/local/Cellar/elasticsearch/2.1.1/libexec/lib/elasticsearch-2.1.1.jar:/usr/local/Cellar/elasticsearch/2.1.1/libexec/lib/* org.elasticsearch.bootstrap.Elasticsearch start -d

First and second entries are correctly parsed, but third isn't.

nfantone commented 8 years ago

Also, why is a "table parser" needed for this? If you assume that the only column with more than one word is CMD, then just split each line by /\s+/ after trimming, associate each one with one element of the header line and join the rest.

So, you end up with something like this:

var s = `PID TTY           TIME CMD
  555 ttys000    0:00.02 /Applications/iTerm.app/Contents/MacOS/iTerm2 --server login -fp nfantone
  557 ttys000    0:00.49 -zsh
 4856 ttys000    0:12.85 /usr/bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/usr/local/Cellar/elasticsearch/2.1.1/libexec -cp /usr/local/Cellar/elasticsearch/2.1.1/libexec/lib/elasticsearch-2.1.1.jar:/usr/local/Cellar/elasticsearch/2.1.1/libexec/lib/* org.elasticsearch.bootstrap.Elasticsearch start -d`

var header = s.split('\n')[0].split(/\s+/); // ["PID", "TTY", "TIME", "CMD"]
var lines = s.split('\n').splice(1); // Everything but the header.
lines.forEach(function(l) { 
  var cells = l.trim().split(/\s+/);
  var p = {}, i = 0; 
  for (; i < header.length - 1;  i++) { p[header[i]] = cells[i] };
  p[header[header.length - 1]] = cells.splice(header.length - 1).join(' ');
  console.log(p); 
});

That'll print out:

Object {PID: "555", TTY: "ttys000", TIME: "0:00.02", CMD: "/Applications/iTerm.app/Contents/MacOS/iTerm2 --server login -fp nfantone"}

Object {PID: "557", TTY: "ttys000", TIME: "0:00.49", CMD: "-zsh"}

Object {PID: "4856", TTY: "ttys000", TIME: "0:12.85", CMD: "/usr/bin/java -Xms256m -Xmx1g -Djava.awt.headless=…rg.elasticsearch.bootstrap.Elasticsearch start -d"}

Which is exactly what you want and it is completely agnostic of the original formatting or column alignment.

neekey commented 8 years ago

@hitchcott connected-domain is implemented for table-parser, this should work in most cases. Sorry for the long wait.

@nfantone table-parser was written for ps-node, but I also want it to be standalone, see the test cases for more complicated user cases.