thousands separator for the --size=bytes option would be very useful

peter-joo commented 3 years ago

OS: Linux 5.12.9-1-MANJARO x86_64 GNU/Linux
lsd --version: lsd 0.20.1
echo $TERM: xterm-256color
echo $LS_COLORS: rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.webp=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga=00;36:.opus=00;36:.spx=00;36:.xspf=00;36:

Expected behavior:

It is very hard to quickly interpret/recognize the real file/directory sizes when the --size=bytes option is given:

>lsd --size=bytes --sort=size --reverse
.rw-r--r-- p p          1  Mon Jun 28 19:12:04 2021  file_1.dat
.rw-r--r-- p p         12  Mon Jun 28 19:12:04 2021  file_2.dat
.rw-r--r-- p p        123  Mon Jun 28 19:12:04 2021  file_3.dat
.rw-r--r-- p p       1234  Mon Jun 28 19:12:04 2021  file_4.dat
.rw-r--r-- p p      12345  Mon Jun 28 19:12:04 2021  file_5.dat
.rw-r--r-- p p     123456  Mon Jun 28 19:12:04 2021  file_6.dat
.rw-r--r-- p p    1234567  Mon Jun 28 19:12:04 2021  file_7.dat
.rw-r--r-- p p   12345678  Mon Jun 28 19:12:04 2021  file_8.dat
.rw-r--r-- p p  123456789  Mon Jun 28 19:12:04 2021  file_9.dat
.rw-r--r-- p p 1234567890  Mon Jun 28 19:12:06 2021  file_10.dat

However, exa ( https://github.com/ogham/exa ), a similar tool, includes the thousands separator by default:

>exa --bytes --long --sort=size
.rw-r--r--             1 p 28 Jun 19:12 file_1.dat
.rw-r--r--            12 p 28 Jun 19:12 file_2.dat
.rw-r--r--           123 p 28 Jun 19:12 file_3.dat
.rw-r--r--         1,234 p 28 Jun 19:12 file_4.dat
.rw-r--r--        12,345 p 28 Jun 19:12 file_5.dat
.rw-r--r--       123,456 p 28 Jun 19:12 file_6.dat
.rw-r--r--     1,234,567 p 28 Jun 19:12 file_7.dat
.rw-r--r--    12,345,678 p 28 Jun 19:12 file_8.dat
.rw-r--r--   123,456,789 p 28 Jun 19:12 file_9.dat
.rw-r--r-- 1,234,567,890 p 28 Jun 19:12 file_10.dat

Actual behavior:

Extra cognitive load without those thousands separators :(

meain commented 3 years ago

This might not be a good idea. It will cause issues for people who use lsd in a script and grep for the size part. I don't think breaking compatibility with GNU ls here would be a good idea.

peter-joo commented 3 years ago

Well, I really wanted to describe what to achieve, not how to achieve it.

I also agree: a previous ticket came from someone who used awk to parse lsd's output, and the parsing failed because of a space (or other separator): https://github.com/Peltoche/lsd/issues/254#issuecomment-517011212

But there is a very easy way out, which solves all aspects of the problem:

do not (ever) add a thousands separator when the --size=bytes option is used
only add a thousands separator when a new suboption is used, for example --size=bytes_with_thousands_separator
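
Editorial sketch of the opt-in idea above, in Rust since lsd is written in Rust. The `SizeMode` enum, its variant names, and both functions are hypothetical and do not reflect lsd's actual code; the point is only that plain `bytes` output stays untouched while a separate mode groups the digits.

```rust
/// Hypothetical size modes; `BytesWithSeparators` is an assumed name,
/// not an existing lsd option.
#[derive(Clone, Copy, PartialEq, Debug)]
enum SizeMode {
    Bytes,
    BytesWithSeparators,
}

/// Insert `sep` between groups of three digits, counting from the right.
fn group_thousands(n: u64, sep: char) -> String {
    let digits = n.to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, ch) in digits.chars().enumerate() {
        // A separator goes before a digit when the number of digits
        // still to be written is a multiple of three.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(sep);
        }
        out.push(ch);
    }
    out
}

/// Plain `Bytes` output is never altered, so scripts keep working.
fn render_size(bytes: u64, mode: SizeMode) -> String {
    match mode {
        SizeMode::Bytes => bytes.to_string(),
        SizeMode::BytesWithSeparators => group_thousands(bytes, ','),
    }
}
```

With this sketch, `render_size(1234567890, SizeMode::Bytes)` gives `1234567890`, and the same value with `SizeMode::BytesWithSeparators` gives `1,234,567,890`.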

I hope that clears it up :)

meain commented 3 years ago

Just wondering what a good option name would be? 🤔 bytes_with_thousands_separator is a bit too long. Or maybe even a separate option like --num-separators, which someone can set to on, off, or auto, where auto disables the separators if we detect a pipe?
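
Editorial sketch of the on/off/auto behavior suggested above. The `NumSeparators` enum and both functions are hypothetical names, not lsd code. Pipe detection here relies on `std::io::IsTerminal` from the Rust standard library (stable since Rust 1.70), which is an assumption about how it could be done rather than how lsd does it.

```rust
use std::io::IsTerminal;

/// Hypothetical values for a `--num-separators` option.
#[derive(Clone, Copy, PartialEq, Debug)]
enum NumSeparators {
    On,
    Off,
    Auto,
}

/// Decide whether to print separators. Kept free of I/O so that the
/// decision is easy to test.
fn use_separators(mode: NumSeparators, stdout_is_terminal: bool) -> bool {
    match mode {
        NumSeparators::On => true,
        NumSeparators::Off => false,
        // In auto mode, separators are dropped when output goes to a
        // pipe or a file, so downstream parsers see plain digits.
        NumSeparators::Auto => stdout_is_terminal,
    }
}

/// Convenience wrapper that inspects the real stdout.
fn use_separators_for_stdout(mode: NumSeparators) -> bool {
    use_separators(mode, std::io::stdout().is_terminal())
}
```

Under this sketch, something like `lsd ... | awk ...` in auto mode would behave as if the option were off, while an interactive terminal would get the separators.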

peter-joo commented 3 years ago

It is entirely up to you, the project owners, other contributors, etc. how to do it.

For me, even --size=fancy_bytes works :)

zwpaper commented 3 years ago

I would vote for a separate flag, --num-separators, as we could apply the separator to B, MB, and GB, and even UNIX timestamps could be an option.

meain commented 3 years ago

Not sure it would be useful for MB/GB etc., as those roll over to the next unit at around a thousand. As for UNIX timestamps, I don't think a comma in a timestamp looks natural. Nobody really reads a timestamp.

zwpaper commented 3 years ago

Oh, my bad, I did not notice that there is no MB or GB option for size.

Also, it feels a little awkward to be the only one reading timestamps 😅.

But as the --num-separators option would only affect the byte size, an option for --size seems reasonable.

merkrafter commented 3 years ago

Localization might have to be considered here as well, as some countries use dots for separating thousands. Not sure if that's a real problem though.

arkadiuszbielewicz commented 3 years ago

Hi, I was thinking about this issue and I have two questions:

System-specific localization - there is the num_format library, which could provide us with system-specific formatting. Unfortunately, on Windows it requires Clang. Is that a problem? Could the Windows build be adjusted to deal with that?
Flags discussion - personally I'm more in favor of adding an option to the --size flag, named bytes_with_separators. Are there any objections?

meain commented 3 years ago

The solution you bring up actually sounds pretty good. Also, the word thousands does not make sense anyway: I forgot that in my country we actually separate by hundreds after the first set 😂. bytes-with-separator seems to be a good flag.

That said, I am not a big fan of adding Clang as a dependency, and just for Windows at that. None of the maintainers, as far as I know, use Windows, and adding more brittleness to that platform is probably gonna make things worse.
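
Editorial sketch covering the two localization points raised in this thread: the separator character can differ (dots instead of commas), and the group sizes can differ (three digits first, then groups of two). It uses only the standard library and no system locale lookup. The `Grouping` struct and `group_digits` function are hypothetical and are not taken from lsd or num_format.

```rust
/// Hypothetical description of a grouping convention.
struct Grouping {
    separator: char,
    /// Size of the rightmost group.
    first: usize,
    /// Size of every group to the left of the first one.
    rest: usize,
}

/// Split the decimal digits of `n` into groups from the right and
/// join them with the configured separator.
fn group_digits(n: u64, g: &Grouping) -> String {
    let digits: Vec<char> = n.to_string().chars().collect();
    let mut groups: Vec<String> = Vec::new();
    let mut end = digits.len();
    // A group size of zero would never make progress, so clamp to 1.
    let mut size = g.first.max(1);
    while end > 0 {
        let start = end.saturating_sub(size);
        groups.push(digits[start..end].iter().collect());
        end = start;
        size = g.rest.max(1);
    }
    // Groups were collected right to left; restore reading order.
    groups.reverse();
    let sep = g.separator.to_string();
    groups.join(sep.as_str())
}
```

For 1234567, a comma with groups of 3 and 3 gives `1,234,567`, a dot with the same sizes gives `1.234.567`, and a comma with sizes 3 and 2 gives `12,34,567`.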

lsd-rs / lsd

thousands separator for the --size=bytes option would be very useful #533
