mithrandie / csvq

SQL-like query language for csv
https://mithrandie.github.io/csvq
MIT License
1.49k stars 65 forks source link

how to query a SPACE delimited multi-column file #95

Open jackywu opened 1 year ago

jackywu commented 1 year ago

how to query a SPACE delimited multi-column file , each column is not fixed width, such as following

  40908 26348 chrome
  40911 10432 chrome
  40912  1156 chrome
  40972 1662332 chrome
  41014     0 kworker/u8:2-zswap1
  43415     0 kworker/u8:4-events_unbound

thanks

ondohotola commented 1 year ago

Using the following file with same content

cat > t.tsv <<EOF 40908 26348 chrome 40911 10432 chrome 40912 1156 chrome 40972 1662332 chrome 41014 0 kworker/u8:2-zswap1 43415 0 kworker/u8:4-events_unbound EOF

this

mlr --ipprint --ocsv cat t.tsv | csvq -n 'SELECT * FROM stdin'

results in

+--------+----------+------------------------------+
|   c1   |    c2    |              c3              |
+--------+----------+------------------------------+
| 40908  | 26348    | chrome                       |
| 40911  | 10432    | chrome                       |
| 40912  | 1156     | chrome                       |
| 40972  | 1662332  | chrome                       |
| 41014  | 0        | kworker/u8:2-zswap1          |
| 43415  | 0        | kworker/u8:4-events_unbound  |
+--------+----------+------------------------------+

There are tons of little tools out there to slice and dice files

el

On 31/12/2022 16:52, jacky wrote:

how to query a SPACE delimited multi-column file , each column is not fixed width, such as following

40908 26348 chrome 40911 10432 chrome 40912 1156 chrome 40972 1662332 chrome 41014 0 kworker/u8:2-zswap1 43415 0 kworker/u8:4-events_unbound |

thanks [...

-- Dr. Eberhard W. Lisse \ / Obstetrician & Gynaecologist @.** / | Telephone: +264 81 124 6733 (cell) PO Box 8421 Bachbrecht \ / If this email is signed with GPG/PGP 10007, Namibia ;____/ Sect 20 of Act No. 4 of 2019 may apply

kpym commented 1 year ago

I don't think that csvq allows to use multiple spaces as delimiter. As @ondohotola says you can use many tools to replace multiple spaces by comma. Probably the most popular is sed.

 sed 's/ \{1,\}/,/g' t.tsv | csvq -n "select *"
+--------+----------+------------------------------+
|   c1   |    c2    |              c3              |
+--------+----------+------------------------------+
| 40908  | 26348    | chrome                       |
| 40911  | 10432    | chrome                       |
| 40912  | 1156     | chrome                       |
| 40972  | 1662332  | chrome                       |
| 41014  | 0        | kworker/u8:2-zswap1          |
| 43415  | 0        | kworker/u8:4-events_unbound  |
+--------+----------+------------------------------+

But you can also use something faster and simpler like sd.

sd -p ' +' ',' t.tsv | csvq -n "select *"
+--------+----------+------------------------------+
|   c1   |    c2    |              c3              |
+--------+----------+------------------------------+
| 40908  | 26348    | chrome                       |
| 40911  | 10432    | chrome                       |
| 40912  | 1156     | chrome                       |
| 40972  | 1662332  | chrome                       |
| 41014  | 0        | kworker/u8:2-zswap1          |
| 43415  | 0        | kworker/u8:4-events_unbound  |
+--------+----------+------------------------------+

You can also use the standard tr to squeeze multiple spaces to single one and then use csvq with -d ' ' (space as delimiter).

cat t.tsv | tr -s ' ' | csvq -n -d ' ' 'select *'
+--------+----------+------------------------------+
|   c1   |    c2    |              c3              |
+--------+----------+------------------------------+
| 40908  | 26348    | chrome                       |
| 40911  | 10432    | chrome                       |
| 40912  | 1156     | chrome                       |
| 40972  | 1662332  | chrome                       |
| 41014  | 0        | kworker/u8:2-zswap1          |
| 43415  | 0        | kworker/u8:4-events_unbound  |
+--------+----------+------------------------------+