wireservice / csvkit

A suite of utilities for converting to and working with CSV, the king of tabular file formats.
https://csvkit.readthedocs.io
MIT License

conversion to TSV without double doublequotes (") #1194

Closed: ciupicri closed this issue 1 year ago

ciupicri commented 1 year ago

If I try to convert a CSV file containing values with double quotes (") to TSV, e.g. this file:

Software,Year
"The ""best"" software",2023

csvformat --out-tabs --out-no-doublequote fails with:

Error: need to escape, but no escapechar set

One-liner for testing:

printf 'Software,Year\n"The ""best"" software",2023\n' | csvformat -T -B
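The same failure can be reproduced with Python's csv module directly, since the error message comes from there rather than from csvkit. A minimal sketch, assuming that csvformat -T -B roughly corresponds to a csv.writer with delimiter="\t" and doublequote=False (an assumed mapping, not taken from csvkit's source); the function name is hypothetical:

```python
import csv
import io

# The sample file from the report.
SAMPLE = 'Software,Year\n"The ""best"" software",2023\n'


def to_tsv_no_doublequote(text):
    """Re-write CSV text as TSV with doublequote=False and no escapechar (hypothetical helper)."""
    # The default reader dialect turns "" inside a quoted field into a literal ".
    rows = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", doublequote=False, lineterminator="\n")
    writer.writerows(rows)  # fails on the row whose field contains "
    return out.getvalue()


try:
    to_tsv_no_doublequote(SAMPLE)
except csv.Error as exc:
    print(exc)  # need to escape, but no escapechar set
```

With doublequote=False the writer has no way to represent an embedded quote character except by escaping it, so without an escapechar it gives up.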
jpmckinney commented 1 year ago

You need to set -P:

  -P OUT_ESCAPECHAR, --out-escapechar OUT_ESCAPECHAR
                        Character used to escape the delimiter in the output CSV file if --quoting 3 ("Quote None") is specified and to escape the QUOTECHAR if --no-doublequote is specified.

Specifically: to escape the QUOTECHAR if --no-doublequote is specified.

Edit: Fixed a typo in 71210e6: the -P help should mention --out-no-doublequote, not --no-doublequote.
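In csv module terms, setting -P makes the write succeed, but the escape character is then written in front of every embedded quote rather than the quote simply passing through. A sketch under the same assumed mapping, with -P X taken to mean escapechar="X" (the function name is hypothetical):

```python
import csv
import io

SAMPLE = 'Software,Year\n"The ""best"" software",2023\n'


def to_tsv_with_escapechar(text, escapechar):
    """Re-write CSV text as TSV with doublequote=False and an escapechar (hypothetical helper)."""
    rows = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter="\t",
        doublequote=False,
        escapechar=escapechar,  # emitted before each embedded quotechar
        lineterminator="\n",
    )
    writer.writerows(rows)
    return out.getvalue()


print(to_tsv_with_escapechar(SAMPLE, "X"), end="")
```

The second line it prints has an X before each embedded ", matching the csvformat -P 'X' output quoted in this thread.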

ciupicri commented 1 year ago

But I don't want to escape double quotes, because that doesn't make sense for TSV.

printf 'Software,Year\n"The ""best"" software",2023\n' | csvformat -T -B -P 'X' outputs:

Software    Year
The X"bestX" software   2023

when the ideal output is:

Software    Year
The "best" software 2023

For completeness, -P '' gives me:

TypeError: "escapechar" must be a 1-character string

jpmckinney commented 1 year ago

"need to escape, but no escapechar set" comes from Python's own csv module.

You can change the quote character to one that doesn't occur in the text.

e.g. csvformat -T -Q~ or csvformat -T -Q"🦀"
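The workaround carries over to the csv module: once the output quote character is one that never occurs in the data, the embedded " is ordinary text and needs neither doubling nor escaping. A sketch, assuming -Q~ corresponds to quotechar="~" on the writer (the function name is hypothetical):

```python
import csv
import io

SAMPLE = 'Software,Year\n"The ""best"" software",2023\n'


def to_tsv_with_quotechar(text, quotechar):
    """Re-write CSV text as TSV using a quote character absent from the data (hypothetical helper)."""
    rows = csv.reader(io.StringIO(text))  # input still uses standard " quoting
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", quotechar=quotechar, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


print(repr(to_tsv_with_quotechar(SAMPLE, "~")))
# 'Software\tYear\nThe "best" software\t2023\n'
```

If a field ever does contain ~, the writer will quote that field with ~ (and double the embedded ~), so the replacement character has to be chosen with the data in mind.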

ciupicri commented 1 year ago

This stuff ought to be in the manual. By the way, the csvformat -u 3 -U 3 -Q "" example is broken.

jpmckinney commented 1 year ago

How so?

Doing:

csvformat -u3 examples/optional_quote_characters.csv

causes:

a,b,c
"""1""","""2""","""3"""

which looks horrible. Doing the documented incantation produces this instead:

a,b,c
"1","2","3"
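The two outputs differ because of how the input and output quoting settings interact. A sketch, assuming examples/optional_quote_characters.csv holds a header a,b,c and the row "1","2","3" (inferred from the outputs shown here, not checked against the repository), and that -u 3 and -U 3 map to csv.QUOTE_NONE on the reader and writer respectively; the function name is hypothetical:

```python
import csv
import io

# Assumed contents of examples/optional_quote_characters.csv.
SAMPLE = 'a,b,c\n"1","2","3"\n'


def requote(text, out_quoting=csv.QUOTE_MINIMAL, out_quotechar='"'):
    """Read with QUOTE_NONE (like -u 3), then write with the given settings (hypothetical helper)."""
    # With QUOTE_NONE on input, the quote marks stay part of each field: '"1"'.
    rows = csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    writer = csv.writer(out, quoting=out_quoting, quotechar=out_quotechar,
                        lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


# Default output settings: each literal " is doubled and the field is re-quoted.
print(repr(requote(SAMPLE)))
# 'a,b,c\n"""1""","""2""","""3"""\n'

# QUOTE_NONE on output with a quote character absent from the data (like -U 3 -Q🐍).
print(repr(requote(SAMPLE, csv.QUOTE_NONE, "\U0001F40D")))
# 'a,b,c\n"1","2","3"\n'
```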
ciupicri commented 1 year ago
# csvformat -u 3 -U 3 -Q ""
No input file or piped data provided. Waiting for standard input:
TypeError: "quotechar" must be a 1-character string
jpmckinney commented 1 year ago

You're not providing any input, as the output says.

ciupicri commented 1 year ago
$ printf 'Software,Year\n"The ""best"" software",2023\n' | csvformat -u 3 -U 3 -Q ""
TypeError: "quotechar" must be a 1-character string
jpmckinney commented 1 year ago

What's your python --version? I get no error.

jpmckinney commented 1 year ago

You can also run with --verbose (-v) for the traceback.

jpmckinney commented 1 year ago

Aha: "Changed in version 3.11: An empty quotechar is not allowed." (https://docs.python.org/3/library/csv.html#dialects-and-formatting-parameters). I've updated the example to use -Q🐍.
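Because the behaviour changed in Python 3.11, the same csvformat invocation can work on one interpreter and fail on another. A small sketch that probes the running interpreter; the helper name rejects_empty is hypothetical. The Python docs record the same 3.11 change for escapechar, which would also account for the -P '' TypeError earlier in the thread:

```python
import csv
import io
import sys


def rejects_empty(param):
    """Return True if this Python refuses "" for the given csv parameter (hypothetical helper)."""
    try:
        # QUOTE_NONE so an empty quotechar is not rejected for a different reason
        # ("quotechar must be set if quoting enabled").
        csv.writer(io.StringIO(), quoting=csv.QUOTE_NONE, **{param: ""})
    except TypeError:
        return True
    return False


print(sys.version_info[:2])
for param in ("quotechar", "escapechar"):
    print(param, "empty string rejected:", rejects_empty(param))
```

On 3.11 and later both parameters are expected to report True; a one-character value such as "🐍" is accepted on either side of the change.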

ciupicri commented 1 year ago

For what it's worth, I'm using: