raine / ramda-cli

:ram: A CLI tool for processing data with functional pipelines
ISC License

Suggestion: add option to ignore CSV/TSV column header mismatch. #15

Closed: mmqmzk closed 5 years ago

mmqmzk commented 5 years ago

Sometimes it is useful to ignore certain CSV/TSV syntax errors. The ps command separates some of its headers with only a single space, so I can't split the output on runs of two or more spaces and have to split on / +/ instead. As a result, some lines end up with more fields than the header row. If ramda-cli could ignore those extra fields, we could use ramda with ps and many other commands.

ps aux | sed -E 's/ +/,/g'

USER,PID,%CPU,%MEM,VSZ,RSS,TTY,STAT,START,TIME,COMMAND 
root,1,0.0,0.0,8324,156,?,Ss,15:55,0:00,/init,ro
root,3,0.0,0.0,8332,164,tty1,Ss,15:55,0:00,/init,ro
zhoukun,4,0.1,0.0,21096,6792,tty1,S,15:55,0:03,zsh
zhoukun,127,0.0,0.0,0,0,tty1,Z,15:56,0:00,[tr],<defunct>
root,692,0.0,0.0,8332,164,tty2,Ss,16:16,0:00,/init,ro
zhoukun,693,0.4,0.0,21096,6868,tty2,S,16:16,0:02,zsh
root,1037,0.0,0.0,8332,164,tty3,Ss,16:17,0:00,/init,ro
zhoukun,1042,1.3,0.0,21228,7176,tty3,S,16:17,0:06,zsh 
zhoukun,2821,0.0,0.0,0,0,tty3,Z,16:22,0:00,[sed],<defunct>
zhoukun,3223,0.0,0.0,17400,1864,tty3,R,16:25,0:00,ps,aux
zhoukun,3224,0.0,0.0,14668,1040,tty3,S,16:25,0:00,sed,-E,s/,+/,/g

ps aux | sed -E 's/ +/,/g' | ramda -i csv 'id'

Error: Unexpected Error: column header mismatch expected: 11 columns got: 12
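
For comparison, a workaround outside ramda-cli is to cap the number of splits at the header width, so that the last column (COMMAND) keeps its internal spaces and every row has exactly 11 fields. The following is only a rough sketch in Python. The helper name `ps_to_csv` is hypothetical, and it assumes that only the final column ever contains spaces.

```python
import csv
import io


def ps_to_csv(text: str) -> str:
    """Convert whitespace-aligned `ps aux`-style text to CSV (hypothetical helper)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ""
    header = lines[0].split()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for line in lines[1:]:
        # Split at most len(header) - 1 times, so everything after the
        # last separator stays together in the final column.
        writer.writerow(line.strip().split(None, len(header) - 1))
    return out.getvalue()


sample = (
    "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND\n"
    "root 1 0.0 0.0 8324 156 ? Ss 15:55 0:00 /init ro\n"
    "zhoukun 4 0.1 0.0 21096 6792 tty1 S 15:55 0:03 zsh\n"
)
print(ps_to_csv(sample), end="")
```

With this the `/init ro` command stays in a single field instead of producing a twelfth column, but it only helps for commands like ps where the ragged column is the last one.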

I suggest adding a non-strict mode option for processing CSV/TSV data: if a row has more fields than the header, the extra fields are ignored, and if it has fewer, the missing fields are treated as empty strings.
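
To make the proposed behaviour concrete, here is a minimal sketch in Python of how a row could be mapped onto the header. It is not ramda-cli or fast-csv code, and the function name `map_row_lenient` is made up for illustration.

```python
from typing import Dict, List


def map_row_lenient(header: List[str], row: List[str]) -> Dict[str, str]:
    """Map a row onto header names without failing on a length mismatch."""
    # Pad short rows with empty strings; a negative count yields no padding.
    padded = row + [""] * (len(header) - len(row))
    # zip() stops at the shorter input, so fields beyond the header are dropped.
    return dict(zip(header, padded))


header = ["USER", "PID", "COMMAND"]
print(map_row_lenient(header, ["root", "1", "/init", "ro"]))  # extra field "ro" dropped
print(map_row_lenient(header, ["root", "1"]))                 # COMMAND becomes ""
```

Note that dropping the extra fields loses data (here the `ro` argument), which is why this should be opt-in rather than the default.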

mmqmzk commented 5 years ago

ramda-cli depends on fast-csv. I took a look at fast-csv, and it already has parse options related to column mismatches, such as ignoreEmpty, discardUnmappedColumns, and strictColumnHandling.

mmqmzk commented 5 years ago

18