lsd-rs / lsd

The next gen ls command
Apache License 2.0
13.44k stars 436 forks source link

thousands separator for the --size=bytes option would be very useful #533

Open peter-joo opened 3 years ago

peter-joo commented 3 years ago

Expected behavior:

It is very hard to quickly interpret/recognize the real file/directory sizes when the --size=bytes option is given:

>lsd --size=bytes --sort=size --reverse`
.rw-r--r-- p p          1  Mon Jun 28 19:12:04 2021  file_1.dat
.rw-r--r-- p p         12  Mon Jun 28 19:12:04 2021  file_2.dat
.rw-r--r-- p p        123  Mon Jun 28 19:12:04 2021  file_3.dat
.rw-r--r-- p p       1234  Mon Jun 28 19:12:04 2021  file_4.dat
.rw-r--r-- p p      12345  Mon Jun 28 19:12:04 2021  file_5.dat
.rw-r--r-- p p     123456  Mon Jun 28 19:12:04 2021  file_6.dat
.rw-r--r-- p p    1234567  Mon Jun 28 19:12:04 2021  file_7.dat
.rw-r--r-- p p   12345678  Mon Jun 28 19:12:04 2021  file_8.dat
.rw-r--r-- p p  123456789  Mon Jun 28 19:12:04 2021  file_9.dat
.rw-r--r-- p p 1234567890  Mon Jun 28 19:12:06 2021  file_10.dat

However the other/similar tool called exa ( https://github.com/ogham/exa ) includes the thousands separator by default:

>exa --bytes --long --sort=size`
.rw-r--r--             1 p 28 Jun 19:12 file_1.dat
.rw-r--r--            12 p 28 Jun 19:12 file_2.dat
.rw-r--r--           123 p 28 Jun 19:12 file_3.dat
.rw-r--r--         1,234 p 28 Jun 19:12 file_4.dat
.rw-r--r--        12,345 p 28 Jun 19:12 file_5.dat
.rw-r--r--       123,456 p 28 Jun 19:12 file_6.dat
.rw-r--r--     1,234,567 p 28 Jun 19:12 file_7.dat
.rw-r--r--    12,345,678 p 28 Jun 19:12 file_8.dat
.rw-r--r--   123,456,789 p 28 Jun 19:12 file_9.dat
.rw-r--r-- 1,234,567,890 p 28 Jun 19:12 file_10.dat

Actual behavior

Extra cognitive load without those thousands separators :(

meain commented 3 years ago

This might not be a good idea. This will cause issues for people who might be using lsd in a script and grepping for the size part. I don't think breaking compatibility with gnu ls here would be a good idea.

peter-joo commented 3 years ago

Well, I really wanted to describe what to achieve, not how to achieve.

Also I agree, a previous ticket was by someone who used awk to parse lsd's output and due to space (or other separators) the parsing has failed: https://github.com/Peltoche/lsd/issues/254#issuecomment-517011212

But there is a very easy way out, which solves all aspect of the problem:

I hope it clears :)

meain commented 3 years ago

Just wondering what a good option name would be? 🤔 bytes_with_thousands_separator is a bit too long. Or maybe even a separate option like --num-separators which someone can set to on,off,auto and auto will disable if we detect a pipe?

peter-joo commented 3 years ago

It is perfectly up to you and up to the project owners, other contributors, etc. how to do it.

For me even the --size=fancy_bytes works :)

zwpaper commented 3 years ago

I would vote for a separated flag --num-separators, as we could apply the separator to B, MB, GB, and even UNIX timestamp may be an option to be applied.

meain commented 3 years ago

Not sure if it will be useful in MB/GB etc as that will break off to next unit at around thousand. As for UNIX timestamp, I don't think comma in a timestamp looks natural. Nobody really reads a timestamp.

zwpaper commented 3 years ago

Oh, my bad, I did not notice that there is no MB or GB option for size.

also, it makes me a little bit awkward leaving me the only one reading timestamp😅.

but as the --num-separators option would only affect the byte-size, it seems that an opinion for --size might be reasonable.

merkrafter commented 3 years ago

Localization might have to be considered here as well, as some countries use dots for separating thousands. Not sure if that's a real problem though.

arkadiuszbielewicz commented 3 years ago

Hi, I was thinking about this issue and I've two questions:

  1. System specific localization - there is num_format library which could provide us with system specific formatting, unfortunately for Windows it requires Clang. Is that a problem? Could Windows build be adjusted to deal with that?
  2. Flags discussion - personally I'm more into adding option for --size flag, with name bytes_with_separators, are there any objections?
meain commented 3 years ago

The solution you bring up actually sound pretty good. Also the word thousands does not make sense anyway. I forgot that in my country we actually separate by hundreds after the first set 😂. bytes-with-separator seems to be good flag.

That said, I am not a big fan of adding clang as a dependency and that too just for Windows. None of the maintainers as far as I know use Windows and adding more brittleness to that platform is probably gonna make things worse.