johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
https://miller.readthedocs.io

Double-width characters spoil column alignment #1520

Open smammy opened 6 months ago

smammy commented 6 months ago

Test case:

printf '%s\t%s\n' \
    sample width \
    $'\uFF21\uFF22\uFF23\uFF24\uFF25' double \
    ABCDE single \
> example.tsv
mlr --from example.tsv --t2p cat

Expected result:

image

Actual result:

image

References:

johnkerl commented 6 months ago

Perhaps #379 and this would have the same fix ...

smammy commented 6 months ago

> Perhaps #379 and this would have the same fix ...

I bet it would!

smammy commented 6 months ago

util-linux's column command uses wcwidth to calculate the number of terminal columns each character occupies. I found StringWidth in uniseg; maybe that would be useful?
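
To illustrate the idea, here is a minimal Go sketch of padding by display width instead of by character count. It is not Miller's code and does not call uniseg: `isWide`, `displayWidth`, and `padRight` are hypothetical helpers, and the range table is a coarse stand-in for a real width function such as wcwidth or uniseg's StringWidth. It ignores combining marks, zero-width characters, and emoji.

```go
package main

import (
	"fmt"
	"strings"
)

// isWide reports whether r falls in one of a few East Asian wide or
// fullwidth ranges. Assumption: this short list is only an approximation
// of the Unicode East Asian Width property, enough for the test case.
func isWide(r rune) bool {
	switch {
	case r >= 0x1100 && r <= 0x115F, // Hangul Jamo initial consonants
		r >= 0x2E80 && r <= 0x303E, // CJK radicals, symbols, punctuation
		r >= 0x3041 && r <= 0x33FF, // Hiragana, Katakana, CJK compatibility
		r >= 0x3400 && r <= 0x4DBF, // CJK Unified Ideographs Extension A
		r >= 0x4E00 && r <= 0x9FFF, // CJK Unified Ideographs
		r >= 0xAC00 && r <= 0xD7A3, // Hangul syllables
		r >= 0xF900 && r <= 0xFAFF, // CJK compatibility ideographs
		r >= 0xFF01 && r <= 0xFF60, // Fullwidth ASCII variants
		r >= 0xFFE0 && r <= 0xFFE6: // Fullwidth signs
		return true
	}
	return false
}

// displayWidth returns the number of terminal columns s is expected to
// occupy: 2 per wide rune, 1 per any other rune.
func displayWidth(s string) int {
	w := 0
	for _, r := range s {
		if isWide(r) {
			w += 2
		} else {
			w++
		}
	}
	return w
}

// padRight pads s with spaces until it occupies width terminal columns.
func padRight(s string, width int) string {
	gap := width - displayWidth(s)
	if gap < 0 {
		gap = 0
	}
	return s + strings.Repeat(" ", gap)
}

func main() {
	// Rows from the test case in this issue.
	rows := [][2]string{
		{"sample", "width"},
		{"\uFF21\uFF22\uFF23\uFF24\uFF25", "double"},
		{"ABCDE", "single"},
	}

	// Size the first column by the widest cell, measured in terminal columns.
	maxWidth := 0
	for _, row := range rows {
		if w := displayWidth(row[0]); w > maxWidth {
			maxWidth = w
		}
	}

	for _, row := range rows {
		fmt.Println(padRight(row[0], maxWidth) + " " + row[1])
	}
}
```

The key point is that both the column width and the padding are computed in terminal columns, so a fullwidth cell gets fewer trailing spaces than an ASCII cell with the same number of characters.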

smammy commented 6 months ago

Specifically, here?

https://github.com/johnkerl/miller/blob/8d6455dfab557f1ba913186bae00b1d68e0dd4a7/pkg/output/record_writer_pprint.go#L108C13-L108C35