Closed lukaspirpamer closed 1 month ago
Why?
Hi James, thanks for your prompt reply!
Because the grouping option converts the delimiter to ",". For example, when csvstack is run on input CSV files that use ";" as the delimiter, the output file uses "," once the grouping option is applied. I would have expected the same behaviour as without the grouping option. What do you think?
Best, Lukas
Can you provide a sample command, with sample input?
All csvkit commands assume that a comma is used as the delimiter, except for in2csv.
If you do the following, the semi-colons are preserved only because csvstack
considers them to be part of the data, rather than considering them delimiters:
$ printf 'a;b;c\n1;2;3' | csvstack
a;b;c
1;2;3
You can set a custom delimiter with -d:
$ printf 'a;b;c\n1;2;3' | csvstack -d ';'
a,b,c
1,2,3
You'll see that csvstack now understands that ; is the delimiter, and therefore uses commas in the output.
To get output that uses a different delimiter, you must use csvformat.
The reason for this design decision is that all tools use a common format: only in2csv (along with options like -d) controls modifying the input format, and only csvformat controls modifying the output format. This avoids having to reconfigure the input/output in every single command when piping output between commands.
Basically, if you are currently doing csvstack a.csv lot.csv of.csv files.csv that.csv use.csv semicolons.csv, then you are effectively doing the same as cat ... on those files. csvstack doesn't recognize the semicolons as delimiters unless you use -d (in which case the output will use commas, as described above).
When the grouping option is enabled, "," is used as a field delimiter and the delimiter of the input csv file is ignored.
Would it be possible to use the automatically determined delimiter or delimiter of the csv-input file?
https://github.com/wireservice/csvkit/blob/f73742fc0ec4c993b5f76809ee15dfab8a0cef10/csvkit/utilities/csvstack.py#L50