openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.55k stars 1.74k forks source link

Fix the ambiguous units in OpenZFS #11046

Open LeoEurope opened 4 years ago

LeoEurope commented 4 years ago

The issue I just saw a bug report that contained the output of a ‘zpool status’ command and realised that the units are actually oddly ambiguous for a filesystem. "2.84T scanned at 586M/s, 852G issued at 172M/s, 9.52T total”.

SI defines prefixes such as the T as tera (1012), the G as giga (109) and the M as mega (106) but that should be followed by the actual unit. That unit is missing here.

It’s obvious that OpenZFS is not reporting Megawatts or Terahertz. It is, however, not clear whether it actually means TB or TiB, GB or GiB and MB or MiB. Is “172M/s” 172 Megabyte or Megabit per second? If I have a file that is reported to be 0.98GiB in Nautilus, will that fit on a dataset that ‘zfs list’ reports as having “1G” available? And a file that mac OS Finder reports as being 1GB?

What should the actual unit be here, MB (10002) and equivalents or MiB (10242) and equivalents? With the size of modern day data sets the 7% difference between the two is getting quite big in actual sizes and it’s important for a file system to report this correctly. People can currently come across unclear mismatches between GUIs and the CLI or between various operating systems.

The solution Microsoft has made quite a mess with this on Windows by pairing 10242 values with 10002 units, showing everyone how not to do it (https://www.nextofwindows.com/gb-gigabyte-and-mb-megabyte-is-not-what-you-think-is-in-windows). Fortunately macOS solved the issue in 2007 (https://support.apple.com/en-us/HT201402), Debian in 2009 (https://wiki.debian.org/ConsistentUnitPrefixes), Ubuntu in 2010 (https://wiki.ubuntu.com/UnitsPolicy). Linux now has a man page dedicated to units (https://man7.org/linux/man-pages/man7/units.7.html)

Some higher level applications such as Nautilus, XigmaNAS, TrueNAS etc. have dealt with this by opting for IEC units and values, Finder in macOS has opted for SI units and values. Personally I don’t care if the displayed unit is MB or MiB as long as it’s the correct unit to go with the actual value.

Is it time to make this less ambiguous and replace T, G and M for the correct units throughout the OpenZFS code? Perhaps as a goal for v3.0?

PrivatePuffin commented 4 years ago

This is one of the better request this month! Yes, this is an actual issue everywhere on the internet, including OpenZFS and we should be more carefull both in the CLI, the docs AND the code itself.

LeoEurope commented 4 years ago

Thanks!

I think the ‘easiest’ route to a fix will be to see what the code actually does and use that as the ’source of truth’. Then make sure the output uses units that correctly reflect the values that the code produces and lastly change the docs to correctly reflect the code and the output.

I’m not a coder so please take the following analysis with some caution. I assume the code actually mainly deals in bytes and not groupings of bytes until it needs to report something to the user.

When I look in cmd/zpool/zpool_main.c I see the following code responsible for the progress output I referred to earlier:

if (pause == 0) {
        (void) printf(gettext("\t%s scanned at %s/s, "
            "%s issued at %s/s, %s total\n"),
            scanned_buf, srate_buf, issued_buf, irate_buf, total_buf);
    } else {
        (void) printf(gettext("\t%s scanned, %s issued, %s total\n"),
            scanned_buf, issued_buf, total_buf);
    }

Those *_buf values are ultimately formatted in zfs/lib/libzutil/zutil_nicenum.c:

{
    uint64_t n = num;
    int index = 0;
    const char *u;
    const char *units[3][7] = {
        [ZFS_NICENUM_1024] = {"", "K", "M", "G", "T", "P", "E"},
        [ZFS_NICENUM_BYTES] = {"B", "K", "M", "G", "T", "P", "E"},
        [ZFS_NICENUM_TIME] = {"ns", "us", "ms", "s", "?", "?", "?"}
    };

So that is where the (lack of) unit is set.

That also suggests to me that OpenZFS indeed uses Base2 and not Base10 for the values it displays. In that case the correct output units would be KiB, MiB, GiB, TiB, PiB and EiB.

What to do? To fix the ambiguity OpenZFS would have to opt for adding either B or iB units to the prefix it currently displays.

The benefit of using *B is that it’s in line with how HDDs and SSDs are sold and what most novice users are used to. It would, however, require that the 1024 based code should be changed to 1000. That is a lot more involved, and risky, than changing the unit.

The benefit of using iB is that it signals that this has been thought through. When you see a B you never know whether it’s legacy (Microsoft) or fixed (Apple). Also, at a very low level a base2 unit such as iB falls nicely along disk sector lines whereas B does not. Lastly, using *iB would put OpenZFS in line with how Ubuntu, Debian, XigmaNAS, TrueNAS and many open source tools report it.

richardelling commented 4 years ago

NB for any future rollout of this, many people have screen scrapers that parse the output of zpool status Any fix would need to be well communicated to the entire ZFS community. Also, there needs to be an easy way for an administrator to get the previous behaviour in case this change breaks their scrapers.

LeoEurope commented 3 years ago

NB for any future rollout of this, many people have screen scrapers that parse the output of zpool status Any fix would need to be well communicated to the entire ZFS community.

I agree, that's why I think the change should be implemented soon but not go in production earlier than OpenZFS 3.0 so it has a year or so to bake and for people to discover if and how it impacts them.

I can imagine quite a few scrapers just parse until they reach a "T", "G" or "M" so changing those to "TB" or TiB" etc. may not make any difference. For some other scrapers it may well make a difference.

There may also be issues with column spacing in ZFS list and others if we add one or two characters after the prefix.

LeoEurope commented 3 years ago

I have three questions at this stage:

1) What would happen if lib/libzutil/zutil_nicenum.c were to be changed as follows?

[ZFS_NICENUM_1000] = {"", "KB”, "MB”, "GB”, "TB”, "PB”, "EB”},
[ZFS_NICENUM_1024] = {"", "KiB”, "MiB”, "GiB”, "TiB”, "PiB”, "EiB”},

Would the addition of a ‘1000’ line just mean that, at a later stage elsewhere in the code, someone could use a base10 value and SI unit if they thought it was more appropriate in the specific use case they are working on?

Potential use cases for a ZFS_NICENUM_1000 would be code that needs to reflect the size that HDDs and SSDs are advertised as, or code that needs to interact with macOS where Finder uses base10 and SI units.

2) What is this line in lib/libzutil/zutil_nicenum.c for?

[ZFS_NICENUM_BYTES] = {"B", "K", "M", "G", "T", "P", "E"},

In libzutil.h it says:

 * ZFS_NICENUM_BYTES:   Print single bytes ("13B"), kilo, mega, tera...

However, either you print single bytes ("13B") or you print groupings of bytes ("13MB", 13 MiB"). Printing "single" bytes as mega does not make sense. Or am I missing something?

3) What does this code in zfs/include/libzutil.h do?

/*
 * Formats for iostat numbers.  Examples: "12K", "30ms", "4B", "2321234", "-".
 *
 * ZFS_NICENUM_1024:    Print kilo, mega, tera, peta, exa..
 * ZFS_NICENUM_BYTES:   Print single bytes ("13B"), kilo, mega, tera...
 * ZFS_NICENUM_TIME:    Print nanosecs, microsecs, millisecs, seconds...
 * ZFS_NICENUM_RAW: Print the raw number without any formatting
 * ZFS_NICENUM_RAWTIME: Same as RAW, but print dashes ('-') for zero.
 */
enum zfs_nicenum_format {
    ZFS_NICENUM_1024 = 0,
    ZFS_NICENUM_BYTES = 1,
    ZFS_NICENUM_TIME = 2,
    ZFS_NICENUM_RAW = 3,
    ZFS_NICENUM_RAWTIME = 4
};
shodanshok commented 3 years ago

I agree with @richardelling For this very reason, I would consider sufficient to simply clearly document it, without touching the reported values at all. And I really like ZFS to continue using Base2 units (as basically all others linux block device tools and filesystems) rather than Base10 units.

saivert commented 3 years ago

Having a --si option would be beneficial. The df command has this and then it is easy to verify you are dealing with the right capacity as per harddrive manufacturers.

Until then the numfmtprogram from coreutils is handy:

zfs list -p | numfmt --to=si --suffix=B --field=2-4 --format=%.2f --round=nearest --header=1

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

zacharyfreeman70 commented 2 years ago

I too would love to see this ambiguity resolved. I just got hit with this issue last week. We've had well-defined units for Powers of 10 (TB) and Powers of 2 (TiB) for decades now. ZFS ought to use them correctly.

Personally, I would prefer we stick with Powers of 2 and add an --si flag just as @saivert suggests as that would be consistent with dozens of well-known and widely-used CLI tools.

@richardelling makes a good point about breaking screen scrapers and janky home-made scripts, but I suspect reserving this change for a major release - say the 3.0 branch - and coupling it with some clear and consistent communication at the top of the release notes would go a long way towards minimizing impact.

LeoEurope commented 2 years ago

The quickest first step, that could take away much of the ambiguity, would be to update the documentation.

Leave the code to use Base2 and to not mention any unit explicitly (that is the status quo) but go through the documentation and fix all the units mentioned there to use IEC binary prefixes.

Anyone who needs to know the exact value of "2.84T" or "172M/s" can look up in the documentation to see what is meant by those statements. It also makes this much clearer for future developers contributing to OpenZFS that this is how OpenZFS deals with Base2 and units.

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

zacharyfreeman70 commented 1 year ago

One of my more junior colleagues just ran into this unit problem again while migrating data off of a misbehaving zpool.

He's a non-native English speaker, but he has a perfect understanding of the differences between SI units (MB, GB, TB) and IEC units (MiB, GiB, TiB). He also knows that disk manufacturers use SI units instead of IEC units to report their sizes.

While preparing for a migration, he saw that ZFS reported sizes in "T" (e.g., "98T") and logically assumed ZFS was using SI units. (Otherwise, he reasoned, if ZFS was using IEC units, it'd report values as "98Ti".) As a result, he ordered 100TB worth of new storage to accomplish this migration, thinking this would leave him with 2TB to spare.

Unfortunately, as everyone on this thread can attest, ZFS actually uses IEC units in its output, so "98T" is actually "98TiB", which is "107.75TB" when converted to proper SI units. As a result, the storage he ordered will be 7.75 TB short and he'll get to spend the next few days cancelling and resubmitting POs.

Although one could argue users should just RTFM, that only works when 1) users are confused enough in the first place to seek out the documentation and 2) there's adequate documentation available in their native language.

As a result, I'd love to see this issue revisited, ideally ahead of the 3.0 branch. I agree with the points @richardelling and @LeoEurope have made above and feel that while this might cause some small portion of screen-scraping scripts to temporarily break, it would be well-worth the hassle if it meant ZFS started clearly and unambiguously reporting units in MiB, GiB, and TiB, etc.

If users truly want SI units, an --si argument that displays MB, GB, and TB, as @saivert suggested, seems like a perfectly reasonable compromise.

But under no condition should ZFS continue to ambiguously report in units like "M", "G", and "T" without any suffixes.

tonyhutter commented 4 weeks ago

As a side note - the master branch has JSON output support for most of the ZFS commands (https://github.com/openzfs/zfs/pull/16217). We just need JSON support for zpool iostat and we'll have nearly everything covered. That will go a long way towards retiring all the screen scraping scripts that everyone uses.