Closed yoshuawuyts closed 1 year ago
Having those as const fn is going to be useful to initialize constants, etc.
Another idea for the naming bikeshed: log2_floor()
etc. or some variation. Prevents mistakes w.r.t. whether it computes floor(log(n)) or ceil(log(n)) or something else. I am aware of the reasons why it's floor, but it was not immediately obvious to me when I first saw their signature, and I am most likely not alone.
@hanna-kruppe This was brought up before, and was addressed in https://github.com/rust-lang/rust/pull/70835#issuecomment-609795147:
(...) the proposal returns the floor (...)
That's an interesting statement. I was under the impression that integer operations always floor unless specified otherwise. For example integer division works the same way:
// 7/3 = ~2.33 = 2 assert_eq!(7u8 / 3u8, 2u8);
The ecosystem contains extensions for e.g.
integer::div_ceil
which would round the value up. However these are not part ofstd
, and if they were introduced they would likely use a_ceil
suffix as well.
Does this make sense, or do you feel there is still missing something?
I am aware of that exchange, it gives one of the reasons for preferring floor semantics that I was referring to. I don't disagree with that being the chosen behavior, I just think making it explicit in the name would be helpful sometimes and not cause any harm otherwise. While it's true that all the methods in std
that have to pick one prefer floor over ceil, I believe all of those methods are (variants of) integer division, so it might not be (or at least be perceived as) general convention for integer-valued computations but something specific to integer division.
Regarding naming: I'm in favour of ilog2
because it's also an operation that would make sense to provide on floating-point numbers. Precedent: https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogFloat
EDIT: Also C99 https://en.cppreference.com/w/c/numeric/math and C++11 https://en.cppreference.com/w/cpp/numeric/math apparently also have ilogb
and logb
.
Regarding naming: I'm fine with ilog2
or the much more explicit floor_log2
/ceil_log2
like is used in the algebraics
library I wrote.
@scottmcm On the other hand, floating-point numbers currently have powi
and powf
, while integers have pow
not powi
. So it would be consistent with that to have integer functions called log
and floating-point functions called something else. (I'm just mentioning this for completeness, I haven't got a preference either way yet.)
There are no other functions like floor_func
, and I think adding floor_log2
would stand out.
I think ilog2
would be a better name.
Regarding naming: I'm in favour of
ilog2
because it's also an operation that would make sense to provide on floating-point numbers.
@scottmcm This was brought up before in the PR. Reiterating my reasoning here:
I think Integer::{log,log2,log10} have the right names since all other methods on integers aren't prefixed with i either. For example Integer::div isn't called Integer::idiv yet still performs "integer division" which yields different results than if the numbers were cast to floats and then divided.
I think @tspiteri correctly points out that there doesn't appear to be a precedent for float-based operations on integers in the stdlib. But there is a precedent for integer-based operations on floats, suffixed with i
for integer and f
for float (e.g. f32::{powi, powf}
). Unless we would expect to expose float-based counterparts to the log methods on integers in the future it seems to me that Integer::{log,log2,log10}
are the most compatible with the existing naming scheme since no other method on integers currently uses a suffix (e.g. u32::pow
).
However if we think people might wonder what kind of log operation this is, we could perhaps spell out in the documentation that this is indeed "integer logarithm".
@yoshuawuyts Are there any other functions for integers and floats with the same name, but return different results?
@EdorianDark yes, as I mentioned in an earlier comment multiplication (mul
) and division (div
) work much the same way:
assert_eq!((5.0 / 2.0) * 2.0, 5.0); // {mul, div} for floats
assert_eq!((5 / 2) * 2, 4); // {mul, div} for ints
@yoshuawuyts Yes am aware that the operators behave differently. But as far as I know there are up till now no functions with the same name and a completely different result.
@EdorianDark The same statement I shared above could be written as:
use std::ops::{Div, Mul};
assert_eq!(5_u32.div(2).mul(2), 4); // {mul, div} for ints
assert_eq!(5_f32.div(2.0).mul(2.0), 5.0); // {mul, div} for floats
We can observe the same methods being called with the same inputs yielding different results because one operates on floats, and the other on ints. Integers and floats have inherently different behavior which means there are many other examples possible -- but the div/mul example is a simple way to convey this.
The overarching point I'm trying to make with this example is that Integer::{log,log2,log10}
behave consistently with Integer::mul
, Integer::div
and other operations. Unless there are extenuating factors such as anticipating we also would like to expose float-based logarithm operations on integers (for which there is no precedent), these seem like the most obvious names.
What about voting for the name in internals.rust-lang.org/ ?
The functions mul
and div
are the functions implementing the operators, so they have to have the same name because of the design of the operators.
Maybe lets looks at what other languages do for log
on an integer:
Is there any example to call the integer logarithm log
?
The functions
mul
anddiv
are the functions implementing the operators, so they have to have the same name because of the design of the operators.
@EdorianDark I'm glad we can agree these methods have the same names but behave differently. Similar examples could be written for non-trait based methods such as u32::div_euclid
vs f32::div_euclid
.
Maybe lets looks at what other languages do for log on an integer:
@EdorianDark I think this is something worth exploring (and we already have, but to a lesser extent), but in this context it feels like a moving of the goal posts. But sure, let's take a look:
std::log(double arg)
log(double arg)
(imported from <math.h>
)math.log(x)
Math.log(x)
Math.log(double a)
Math.Log(double d);
These methods are different from Rust in several ways:
In contrast Rust has the benefit of clear module hierarchies, operations scoped to their number types, and no implicit casting. The examples would be more relevant if they exposed int logarithms or were methods on numbers. But so far what we're discussing in this thread seems quite different.
So far the only plausible reason for naming these methods anything other Integer::{log, log2, log10}
I've seen, would be if we also expect to expose float-based log on ints. In that case the naming ought to be Integer::{logi, logf, logi2, logf2, logi10, logf10}
to match the precedent set by f32::{powi, powf}
. But that would introduce a lot of methods, and there should be a clear motivation for why we would want to expose both variants.
[1]: C++ shows IntegralType arg
in the docs with the purpose to show that if integers are passed they will be cast to floats first.
For operators like *
or /
and with them the functions mul
and div
the expectation is, that they operate within the the types of their arguments.
The logarithm is a continuous function defined on all positive real numbers. The examples I listed show, that the expectation for log
is to behave like the logarithm, mapping into floating point numbers. How the functions get to there is an implementation detail.
In Rust there is now the possibility to let log
into integers, but why would someone expect that?
Is there any prior art for implementing log
with as a inter logarithm and not as logarithm?
I'd argue for not using just log
, log2
, or log10
for integer logarithm because, even if we never intend to add int -> float logarithms, quite a few other common programming languages (C, C++, Python, Java, JavaScript, etc.) use just plain log[2|10]
to mean the float form and quite a lot of people will assume that Rust's variants are the same float functions without checking the docs because they share the same names, leading to confusion.
Integer division doesn't really have the same naming issue as integer logarithm, since there are quite a few other programming languages where integer division uses the same operator as float division (C, C++, Java, etc.) so is much less of a learning obstacle.
The original PR was never merged, which means that this isn't tracking anything at the moment. Closing this for now.
The original PR was approved by the libs team, but when attempting to merge failed on bors. I've re-filed the PR in https://github.com/rust-lang/rust/pull/80918 which include a fix which should make the failing tests pass.
Apologies for not doing this sooner; going from src/libcore
-> library/core
midway through trying to fix the PR left me feeling overwhelmed and not sure how to proceed. I'm feeling a bit more confident poking at the stdlib now, hence the attempt to retry the PR (:
@m-ou-se Should this be reopened now that #80918 has been merged?
@tspiteri Yes. Thanks!
u32
, and while the strategy of the comment was indeed faster than the non-special-cased original, the actually committed PR was faster still:test u32_comment ... bench: 1,377 ns/iter (+/- 51)
test u32_86930 ... bench: 734 ns/iter (+/- 8)
test u32_original ... bench: 1,725 ns/iter (+/- 53)
This should complete the task about optimizing log10.
With regard to base 10, I've found that using the approach followed here is approximately 10% faster on u32
using a crude benchmark. Presumably this holds for larger integers, but the current approach is faster on u8
and u16
.
pub const fn log10_u32_proposed(x: u32) -> u32 {
const TABLE: &[u64] = &[
0x0000_0000_0000,
0x0000_0000_0000,
0x0000_0000_0000,
0x0000_FFFF_FFF6,
0x0001_0000_0000,
0x0001_0000_0000,
0x0001_FFFF_FF9C,
0x0002_0000_0000,
0x0002_0000_0000,
0x0002_FFFF_FC18,
0x0003_0000_0000,
0x0003_0000_0000,
0x0003_0000_0000,
0x0003_FFFF_D8F0,
0x0004_0000_0000,
0x0004_0000_0000,
0x0004_FFFE_7960,
0x0005_0000_0000,
0x0005_0000_0000,
0x0005_FFF0_BDC0,
0x0006_0000_0000,
0x0006_0000_0000,
0x0006_0000_0000,
0x0006_FF67_6980,
0x0007_0000_0000,
0x0007_0000_0000,
0x0007_FA0A_1F00,
0x0008_0000_0000,
0x0008_0000_0000,
0x0008_C465_3600,
0x0009_0000_0000,
0x0009_0000_0000,
];
((x as u64 + TABLE[31 - x.leading_zeros() as usize]) >> 32) as _
}
Happy to provide similar lookup tables for other integer types if desired.
I always find it hard to judge table lookups. In microbenchmarks they often look great since cache pressure isn't a concern.
Of course cache is huge for something like that. Just throwing it out there; it's plenty fast either way.
Currently, the return type for T::{log,log2,log10}
is T
(https://doc.rust-lang.org/nightly/std/primitive.u128.html#method.log2). Since the return value is at most 128, this is inefficient for at least u128
. Methods such as T::{leading_zeros,count_ones}
always return u32
. I would suggest to also always return u32
here, making the API also more consistent.
I've just seen that in my code u32::log10 isn't inlined (in optimized builds), I don't know if this is OK:
mov r15, qword ptr [rip + core::num::int_log10::u32@GOTPCREL]
...
call r15
@leonardo-m https://github.com/rust-lang/rust/pull/89817
Naming feedback: in https://github.com/rust-lang/rust/pull/92747#discussion_r781648604 someone was surprised that log2
was just the integer part. That could potentially have been avoided had the method been ilog2
instead (or, generally, something emphasizing that this is the integer logarithm).
Random thought here.. now that we have these, is it possible to have something like Integer::log<N>
implemented?
Naming feedback: in #92747 (comment) someone was surprised that
log2
was just the integer part. That could potentially have been avoided had the method beenilog2
instead (or, generally, something emphasizing that this is the integer logarithm).
Isn't this inconsistent with other operators? We don't seem to have idiv
to indicate division is going to floor. I had the impression it always depends on the type you call it on and since rust doesn't allow any mixing of types in arithmetic, I'm forced to always be very conscious of them.
Random thought here.. now that we have these, is it possible to have something like Integer::log
implemented?
This one is not generic, but should get you the result you want: https://doc.rust-lang.org/std/primitive.u64.html#method.log
NonZeroU*
now has everything but log
and optimization can be done in a later stage.
So I guess at least for primitives, nomenclature is probably the only thing blocking stabilization.
If you're wondering why I didn't put log
on NonZeroU*
, it's because it can still fail because of a bad base argument. (And there's no AtLeastTwoU*
.) So .get().log(b)
is probably fine there.
Whereas log2
and log10
are infallible on NonZeroU*
, and thus they seemed more worth having to me.
👍 to resolving naming and getting stabilized!
Just ran into wanting this myself. Without following any prior context, as far as naming goes, I'd put my vote in for log
, log2
, without the prefix, and returning a u32
seems reasonable to me.
I think the right sequence here would be:
NonZero
types to validate that those indeed work there.As part of the stabilization report we probably should raise the point on naming, as that's the only unsettled issue, and provide an overview so the libs-api team can make a decision on it in a sync meeting. Though perhaps the naming issue question doesn't need a stabilization report to answer by the libs team?
@rust-lang/libs-api Do y'all have thoughts on the process I described ^? I'm happy to put in some of the work to help get this over the finish line.
Is there any other programming language, in which log
returns an integer?
Log is a common abbreviation of Logarithm, a continuous function which has often non integer values for integers: log(5)=0,698...
.
There are several other languages which define the result of log
of an integer as a float number.
This leads to developers expecting log
to behave differently, as can be seen here.
Therefore I would say, that the name of the function would be better ilog
.
Other operators like div
implement traits like Div to be able to use operators like /
. Therefore they cannot be named otherwise, but this is not the case for logarithm.
What speaks against naming the function ilog
?
@yoshuawuyts The NonZero types already have log10
and log2
(https://doc.rust-lang.org/nightly/std/num/struct.NonZeroU32.html#method.log10), and see https://github.com/rust-lang/rust/issues/70887#issuecomment-1091986634 for why I don't think log
makes sense on them.
So I think (1) is done, and it's just a matter of confirming the names to be able to stabilize.
Question: I am new to rust and am a mathematician. log base 2 of 0 is either not defined or NaN. log base 2 of 1 through 255 is between 0 and 7. Why are the log2, log10 and log functions return u32?
Thanks.
@mstankus Same reason that https://doc.rust-lang.org/std/primitive.u8.html#method.count_ones returns u32
, even though it can only be 0..=8
, and https://doc.rust-lang.org/std/primitive.u16.html#method.checked_shl takes u32
as an argument even though it has to be 0..16
.
That's not a very good answer, is it? 😀
One possible explanation is that 32-bit arithmetic can be faster than 16- and 8-bit: https://stackoverflow.com/a/41574531.
I would guess that the most common integer format with a fixed size is i/u32, too, so that would be the type that minimizes casting.
On Jul 17, 2022, at 12:14 PM, Laurențiu Nicola @.***> wrote:
That's not a very good answer, is it? 😀
One possible explanation is that 32-bit arithmetic can be faster than 16- and 8-bit: https://stackoverflow.com/a/41574531.
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.
Just come up with this thought. Since counting the digits of the number in certain radix is one of the most common applications of logarithms, it will be more convenient if the logarithm of signed integers accepts negative values. The result will just be the logarithm of the absolute values (so log(0)
is still invalid). Does this make sense?
@cmpute I think that if someone wants log₁₀|x| for signed, it's fine for them to type x.unsigned_abs().log10()
.
unsigned_abs
can't overflow, so that'll always work just fine, and it's much clearer to the reader what's happening.
Hello libs-api folks! I'm nominating on behalf of https://github.com/rust-lang/rust/issues/70887#issuecomment-1181686322
I think these just need a decision from you whether to be called log2
or ilog2
, then they're ready for stabilization.
I don't know how you want to pick that. It probably depends if you want to leave space for integer.log2()
to return a floating-point value at some point in the future, like C++ added in 2011.
I'd here prefer ilog2
, both because of tradition and because ilog2
is a completely different operation from log2
, and it's confusing to call it like that.
We discussed this in today's @rust-lang/libs-api meeting, and there seemed to be a general consensus in favor of ilog2
.
log
methodsu32
log10
methodslog10
methodslog
methods for NonZero
typeslog
methods to ilog
// implemented for all primitive integers (i8, u8, etc.)
pub const fn ilog(self, base: Self) -> u32
pub const fn ilog2(self) -> u32
pub const fn ilog10(self) -> u32
pub const fn checked_ilog(self, base: Self) -> Option<u32>
pub const fn checked_ilog2(self) -> Option<u32>
pub const fn checked_ilog10(self) -> Option<u32>
// implemented for non-zero unsigned integers only (NonZeroU8, NonZeroU16, etc.)
pub const fn ilog2(self) -> u32
pub const fn ilog10(self) -> u32
The original motivation for implementing integer log
methods can be found in https://github.com/rust-lang/rust/pull/80918#issue-783541739:
would be nice if there was an integer log2 instead of having to either use the f32 version or leading_zeros() which i have to verify the results of every time to be sure
Integer logarithm is a classic operation that's hard to implement efficiently by hand [^hard], complex enough that it needs to be thoroughly tested to ensure correctness, and useful for a wide range of applications. I personally don't use integer logarithm, so I can't speak from my own experience. But here's a selection of uses found in the ecosystem:
[^hard]: The implementation history of this feature should serve as enough evidence that this is not something which is trivial to get right. The implementation has gone through many revisions, with the help of numerous domain experts. We now have a performant and correct implementation - something which would be difficult to replicate for many people without spending a non-trivial amount of time on it.
u64::log10
instead of a lookup table once stable.ilog10
once stablebigint
type.log2
internally to simplify the implementationlog10
for efficiency once stableOverall there seems to be a strong desire in the community to have the integer log
methods stabilized as they represent a fundamental operation which is hard to get right otherwise. And as far as I've been able to tell there's been no indication that we shouldn't be adding these methods or that the API isn't right. Therefore I'd like to request the @rust-lang/libs-api team to start an FCP on this so we can have it be made generally available.
NonZeroU32::new(7).unwrap().log2()
Do you mean:
NonZeroU32::new(7).unwrap().ilog2()
Feature gate:
#![feature(int_log)]
Stabilization report: https://github.com/rust-lang/rust/issues/70887#issuecomment-1210602692This is a tracking issue for adding
{checked_,}{ilog,ilog2,ilog10}
methods to{u8,u16,u32,u64,u128}
. A full rationale can be found in https://github.com/rust-lang/rust/pull/80918#issue-552884252. But in short: integer logarithms are fairly common, but require care to implement correctly, and effort to implement efficiently.Because we expose
log
methods for floats it can be tempting for users to use that instead, which can lead to rounding errors. In the following example we're trying to calculatebase10
for an integer. We might try and calculate thebase2
for the values, and attempt a base swap to arrive atbase10
. However because we're performing intermediate rounding we arrive at the wrong result:Public API
Steps / History
Unresolved Questions
ilog10
implementation? https://github.com/rust-lang/rust/pull/70835#issuecomment-609976593const fn
?NonZero{Int}
? https://github.com/rust-lang/rust/pull/70835#issuecomment-611336828Option
forilog2/ilog10
onNonZero{Int}
?Integer::{ilog,ilog2,ilog10}
instead? https://github.com/rust-lang/rust/pull/70835#issuecomment-611998098 https://github.com/rust-lang/rust/issues/70887#issuecomment-616580640Implementation History
log
methods, mergedlog
methods forNonZero
typeslog
methods toilog