landley / toybox

toybox
http://landley.net/toybox
BSD Zero Clause License

(not posix, but gnu) split -n option #287

Open aheirman opened 3 years ago

aheirman commented 3 years ago

The split -n option is sometimes useful, but sadly it is not in the POSIX specification (https://pubs.opengroup.org/onlinepubs/009696699/utilities/split.html). It is in the GNU version: https://man7.org/linux/man-pages/man1/split.1.html

I don't know of any scripts that use it, but it's handy.

landley commented 3 years ago

Round robin distribution?

aheirman commented 3 years ago

No, it does the following:

Legion% xxd tmp 
00000000: 6865 6c6c 6f0a                           hello.
Legion% split -n 2 tmp
Legion% ls
tmp  xaa  xab
Legion% xxd xaa
00000000: 6865 6c                                  hel
Legion% xxd xab
00000000: 6c6f 0a                                  lo.

For an odd number of bytes, the extra byte ends up in the last file:

Legion% xxd tmp       
00000000: 6865 6c6c 6f6f 0a                        helloo.
Legion% split -n 2 tmp
Legion% xxd xaa       
00000000: 6865 6c                                  hel
Legion% xxd xab       
00000000: 6c6f 6f0a                                loo.

landley commented 3 years ago

The man page says " CHUNKS may be:" and then provides 6 options, one of which is "r/N like 'l' but use round robin distribution".
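
For comparison with the byte-range behavior shown earlier, round robin distribution deals whole lines out to the N outputs in turn. A rough Python sketch of the idea follows. The function name `round_robin_lines` is hypothetical, and this only models the concept described in the man page text quoted above, not GNU's code.

```python
def round_robin_lines(data, n):
    """Deal the lines of data out to n outputs in turn.

    Line 0 goes to output 0, line 1 to output 1, and so on,
    wrapping around after output n-1. Note that bytes.splitlines()
    also treats "\r" and "\r\n" as line endings, which is a
    simplification made for this sketch.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    outputs = [[] for _ in range(n)]
    for index, line in enumerate(data.splitlines(keepends=True)):
        outputs[index % n].append(line)
    return [b"".join(parts) for parts in outputs]


print(round_robin_lines(b"a\nb\nc\nd\ne\n", 2))
# [b'a\nc\ne\n', b'b\nd\n']
```

Unlike a plain integer -n, the outputs here are not contiguous ranges of the input, so concatenating them does not reproduce the original file.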

The easy thing to do is just -n # taking an integer, but I could presumably do the k/n one to get just one part? I'm not a user, you can't point to an existing user, dunno what subset is useful. Busybox didn't do this... I guess freebsd does -n integer: https://www.freebsd.org/cgi/man.cgi?query=split&sektion=1

aheirman commented 3 years ago

My apologies, I've never used the round robin distribution feature, and I can't imagine a (sensible) use for it. Matching the freebsd variant seems sensible to me since I haven't needed the additional features the gnu version provides.

terefang commented 2 years ago

+1 for the freebsd behavior

landley commented 1 year ago

Commit df7bfd2e1e79 went in a while back, and a test for it in commit 3fbacb1f5c5e. This just implements the simple "split into N many equal-ish sized files" stuff, without the gnu "chunks" stuff.

Is there more left to do here?