mithrandie / csvq

SQL-like query language for csv
https://mithrandie.github.io/csvq
MIT License

STDIN does not allow to specify the import format. #41

Closed shakiyam closed 3 years ago

shakiyam commented 3 years ago

This is an enhancement request. The documentation says, "The stdin table loads data from pipe or redirection as a csv data." I would like to be able to specify the import format for standard input.

$ cat sample.tsv  # TSV File
col1    col2
a       b
aaaaaaaaaa      b
$ od -t x1 sample.tsv
0000000 63 6f 6c 31 09 63 6f 6c 32 0a 61 09 62 0a 61 61
0000020 61 61 61 61 61 61 61 61 09 62 0a
0000033
$ csvq -f FIXED 'SELECT * FROM sample'  # The format to be loaded is automatically determined by the file extension
col1       col2
a          b
aaaaaaaaaa b
$ cat sample.tsv | csvq -i TSV -f FIXED 'SELECT * FROM STDIN'  # For STDIN, the -i option is ignored.
col1    col2
a       b
aaaaaaaaaa      b
$ cat sample.tsv | tr "\t" , | csvq -f FIXED 'SELECT * FROM STDIN'  # STDIN is loaded as CSV data.
col1       col2
a          b
aaaaaaaaaa b
$
mithrandie commented 3 years ago

This may be a bug. The -i option can be used for other formats.

As a workaround for the TSV format, the -d option can be used.

cat sample.tsv | csvq -d "\t" "select *"

I’ll check later. Thank you.

ondohotola commented 3 years ago

Surely one can put a filter into the pipe?

el

shakiyam commented 3 years ago

Thank you for your comment. It works as expected for Fixed-Length Format and JSON, so I will use the workaround you gave me for TSV.

$ cat sample.txt
col1        col2
a           b
aaaaaaaaaa  b
$ cat sample.txt | csvq -i FIXED -f CSV 'SELECT * FROM STDIN'  # '-i FIXED' works as expected.
col1,col2
a,b
aaaaaaaaaa,b
$ cat sample.json
[
    {
        "col1": "a",
        "col2": "b"
    },
    {
        "col1": "aaaaaaaaaa",
        "col2": "b"
    }
]
$ cat sample.json | csvq -i JSON -f CSV 'SELECT * FROM STDIN'  # '-i JSON' also works as expected.
col1,col2
a,b
aaaaaaaaaa,b
$ cat sample.tsv | csvq -i TSV -f CSV 'SELECT * FROM STDIN'  # '-i TSV' is ignored.
col1    col2
a       b
aaaaaaaaaa      b
$ cat sample.tsv | csvq -d '\t' -f CSV 'SELECT * FROM STDIN'  # workaround
col1,col2
a,b
aaaaaaaaaa,b
mithrandie commented 3 years ago

This bug has been fixed, and the fix is included in version 1.13.8.

mithrandie commented 3 years ago

@ondohotola Sorry, I overlooked that comment. Can you tell me more about what you were concerned about?

ondohotola commented 3 years ago

Yuki,

I have no concerns. I was just saying that the issue of not being able to read TSV from STDIN can be resolved by putting a filter into the pipe. Something like

cat t.tsv|csvformat -t|csvq 'select * from STDIN'

from csvkit

https://csvkit.readthedocs.io/en/latest/scripts/csvformat.html

or similar :-)-O

Thanks,

el

mithrandie commented 3 years ago

@ondohotola I see, that's a good solution too. Thanks for your comment.
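
For versions before 1.13.8, or where csvkit is not available, the filter idea discussed above could also be sketched with awk alone. This is only a minimal sketch: the function name tsv_to_csv is hypothetical (it is not part of csvq or csvkit), and it assumes the TSV input has no escaped tabs or newlines inside fields.

```shell
#!/bin/sh
# tsv_to_csv: read TSV on stdin, write CSV on stdout.
# Hypothetical helper for illustration; not part of csvq or csvkit.
tsv_to_csv() {
    awk -F '\t' '
    {
        out = ""
        for (i = 1; i <= NF; i++) {
            field = $i
            # Quote fields that contain a comma or a double quote,
            # doubling any embedded double quotes.
            if (field ~ /[",]/) {
                gsub(/"/, "\"\"", field)
                field = "\"" field "\""
            }
            out = out (i > 1 ? "," : "") field
        }
        print out
    }'
}

# Possible usage, mirroring the commands in this thread:
#   cat sample.tsv | tsv_to_csv | csvq -f FIXED 'SELECT * FROM STDIN'
```

Unlike the plain tr "\t" , filter shown earlier, this sketch quotes fields that already contain commas or double quotes, so such values should stay in a single column when csvq reads the result as CSV.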