Closed metadaddy closed 1 month ago
I noticed this when working on https://github.com/trinodb/trino/pull/11160 and also reported https://github.com/airlift/units/issues/26. CLI uses the https://github.com/trinodb/trino/blob/fe608f2723842037ff620d612a706900e79c52c8/testing/trino-benchmark/src/main/java/io/trino/benchmark/FormatUtils.java#L75
So this issue is about different parts of the code, but the same problem. I believe the response in the Airlift issue would also apply here.
@nineinchnick I'm preparing a PR specifically in the CLI - since the output is reported to the user, rather than written to logs, so the blast radius should be smaller.
@dain wdyt?
If the implementation actually lives in the CLI, we can fix it there. If it however completely lives in airlift I think we should not add some shim in the CLI, but instead fix it in airlift. In either case we can help with reviews and input though @metadaddy
Also unrelated question .. if you are using Trino .. would you be interested to be added to https://trino.io/users.html ?
Thanks, @mosabua - I'm submitting a PR. As far as using Trino - stay tuned...
Seems like there's no desire to fix this: https://github.com/trinodb/trino/pull/13095#issuecomment-1175692277
These values in the CLI are only meant to be interpreted by a human, so I think it's reasonable to change the behavior if it will make it more intuitive. It would be a different discussion if we were talking about changes to values exposed via APIs or persisted in some way that needs to preserve backward compatibility.
Using the Trino CLI to query CSV data via the Hive connector, I noticed that the CLI is reporting the amount of data transferred incorrectly. Querying a 72.8 MB CSV file, the CLI reported that it had transferred only 69.4 MB of data.
It seems that the discrepancy arises from the fact that macOS, correctly, reports file sizes in kilobytes, megabytes, gigabytes, etc, based on powers of 1,000, with the abbreviations KB, MB, GB, while the Trino CLI is reporting the number of kibibytes, mebibytes, gibibytes etc, based on powers of 1,024. but using the KB/MB/GB abbreviations rather than the correct KiB/MiB/GiB. The math confirms this - (69.4 1,024 1,024) / (1,000 * 1,000) = 72.8.
See this wikipedia page for an explanation of the correct terms to use.
The Trino CLI should either use the KiB/MiB/GiB abbreviations or, preferably, report the number of kilobytes/megabytes/gigabytes correctly.