landley / toybox

toybox
http://landley.net/toybox
BSD Zero Clause License

find -size <number-without-suffix> not POSIX #499

Open stephane-chazelas opened 7 months ago

stephane-chazelas commented 7 months ago

Per POSIX

find . -size n

is meant to return the files whose size, rounded up to an integer number of 512-byte units, is n.

For instance, find . -size 1 is meant to report the files whose size ranges from 1 to 512 bytes (the ones that would typically occupy one sector of disk space in the olden days).

But toybox (and busybox, which shares the same non-conformance) only reports files whose size is exactly 512 bytes.

There are similar problems for find . -size +n and find . -size -n.
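A minimal sketch of the rounding rule described above, written in plain shell arithmetic. The function names size_units and size_matches are hypothetical, and the +n / -n handling assumes the usual POSIX reading (more than n units, fewer than n units). It illustrates the rule only and is not toybox's code.

```sh
# Number of 512-byte units needed to hold SIZE bytes, rounded up.
size_units() {
    echo $(( ($1 + 511) / 512 ))
}

# size_matches SIZE ARG
# Succeeds if a file of SIZE bytes would match "-size ARG",
# where ARG is n, +n or -n with no unit suffix.
size_matches() {
    units=$(size_units "$1")
    case $2 in
        +*) [ "$units" -gt "${2#+}" ] ;;   # more than n units
        -*) [ "$units" -lt "${2#-}" ] ;;   # fewer than n units
        *)  [ "$units" -eq "$2" ] ;;       # exactly n units
    esac
}
```

Under this reading, sizes 1 through 512 all count as one unit, so size_matches 100 1 succeeds, while -size -1 can only match empty files.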

Like for the test utility (https://github.com/landley/toybox/issues/498), there's also the separate problem that find . -size 010c finds the files of size 8 instead of 10.
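The 010 case looks like the familiar leading-zero trap: a parser that honours C-style prefixes reads 010 as octal 8, where a decimal reading gives 10. Shell arithmetic has the same behaviour, so it can be used to illustrate the difference. This is a sketch only; the helper name to_decimal is hypothetical.

```sh
# Strip leading zeros so the digits are read as decimal, not octal.
# A lone "0" is left untouched.
to_decimal() {
    n=$1
    while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do
        n=${n#0}
    done
    echo "$n"
}

echo $(( 010 ))                  # prints 8: leading zero means octal here
echo $(( $(to_decimal 010) ))    # prints 10: same digits read as decimal
```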

Note that the behaviour with suffixes other than c is fine: it is outside the scope of POSIX and is aligned with most other implementations that support some or all of those suffixes (except GNU find).

See https://unix.stackexchange.com/questions/774817/what-are-the-file-size-options-for-find-size-command/774840#774840 for more of the gory details including comparison with other implementations.

stephane-chazelas commented 7 months ago

There's a similar problem with the -Xtime [-+]<n> predicates: there, <n> is not treated as an integer number of days but more like the -Xtime [-+]<n>d of FreeBSD.

For instance, -mtime 0 is meant to report files last modified in the last 24 hours, while toybox find only reports the ones last modified exactly now.
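A minimal sketch of the whole-day comparison, assuming the POSIX wording that the file's age in seconds is divided by 86400 with the remainder discarded before being compared with n. The function names age_days and mtime_matches are hypothetical, and timestamps are taken as seconds since the epoch.

```sh
# age_days NOW MTIME
# Age in whole days: elapsed seconds divided by 86400, remainder discarded.
age_days() {
    echo $(( ($1 - $2) / 86400 ))
}

# mtime_matches NOW MTIME ARG
# Succeeds if the file would match "-mtime ARG", ARG being n, +n or -n.
mtime_matches() {
    days=$(age_days "$1" "$2")
    case $3 in
        +*) [ "$days" -gt "${3#+}" ] ;;   # older than n whole days
        -*) [ "$days" -lt "${3#-}" ] ;;   # younger than n whole days
        *)  [ "$days" -eq "$3" ] ;;       # exactly n whole days
    esac
}
```

With this rule, a file modified one hour ago has an age of 0 days and matches -mtime 0, whereas an exact comparison of seconds would match only a file modified at the very instant find started.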

landley commented 7 months ago

Busybox having behaved this way since 2007 and nobody noticed seem that strong an argument. Do you have a use case that broke because of this?

512 seems irrelevant (minimum block size of ext2 was 1024 back in the 1990s, even fat32 defaults to at least 4k these days). If we gave m units presumably it should round to the megabyte?

landley commented 7 months ago

Sharp edge here is that -size has any supplied units override the default (including c=bytes), but -time and -min don't (1kd days is 1000 days).

landley commented 7 months ago

Debian's find -size also implicitly selects -type f.

stephane-chazelas commented 7 months ago

> Debian's find -size also implicitly selects -type f.

Why would it do that?

$ find . -size 5542c -prune -ls
      258      0 drwxr-xr-x   1 chazelas chazelas     5542 Apr 25 19:29 .
$ find /etc/mtab -size 19c -prune -ls
   187446      4 lrwxrwxrwx   1 root     root           19 Jun 27  2021 /etc/mtab -> ../proc/self/mounts
$ find --version
find (GNU findutils) 4.9.0
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Eric B. Decker, James Youngman, and Kevin Dalley.
Features enabled: D_TYPE O_NOFOLLOW(enabled) LEAF_OPTIMISATION FTS(FTS_CWDFD) CBO(level=2)
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux trixie/sid
Release:        n/a
Codename:       trixie

stephane-chazelas commented 7 months ago

> Busybox having behaved this way since 2007 and nobody noticed seem that strong an argument. Do you have a use case that broke because of this?
>
> 512 seems irrelevant (minimum block size of ext2 was 1024 back in the 1990s, even fat32 defaults to at least 4k these days). If we gave m units presumably it should round to the megabyte?

It's quite well known that busybox is not standards-compliant and that scripts need to be adapted when ported to busybox.

Common denominator for block device block size is still 512 bytes.

But that's hardly relevant (and not the point of this question).

Find's -size <number-without-suffix> is a well-known, almost 50-year-old API which checks the size based on the number of 512-byte units. If you want your tool to use a different unit, don't call it find, or use a separate API that doesn't break backward compatibility, like the find -size 12k of FreeBSD or GNU (incompatible between themselves), or introduce a new one and convince other implementations to adopt it so it can be suggested as a standard to POSIX and used portably in a few decades.

terefang commented 6 months ago

> But that's hardly relevant (and not the point of this question).
>
> Find's -size <number-without-suffix> is a well-known, almost 50-year-old API which checks the size based on the number of 512-byte units. If you want your tool to use a different unit, don't call it find, or use a separate API that doesn't break backward compatibility, like the find -size 12k of FreeBSD or GNU (incompatible between themselves), or introduce a new one and convince other implementations to adopt it so it can be suggested as a standard to POSIX and used portably in a few decades.

@stephane-chazelas while you have a point that there is deviation from the POSIX standard here, toybox only claims to be reasonably standards-compliant and is possibly more inclined to follow busybox compatibility here.

also the POSIX standard is imprecise, vague or outright lacking in many places, having been the playground of many corporate interests in the past decades.

if you have an interesting solution to that problem, you can always submit a patch or pull request for review.