rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

RFC: rename `int` and `uint` to `intptr`/`uintptr` #9940

Closed thestinger closed 10 years ago

thestinger commented 11 years ago

An arbitrarily sized integer type would be provided in std under the name Int. I think encouraging use of an arbitrarily sized integer when bounds are unknown is a much better solution than adding failure throwing overflow checks to fixed-size integers.

emberian commented 11 years ago

I think intptr and uintptr are awful names, but the best alternatives I can come up with are word and sword, which are worse.

Fixed integers of pseudo-arbitrary width are rarely useful.

Thiez commented 11 years ago

Seems to me int and uint are not pointers, so a 'ptr' suffix doesn't make a whole lot of sense. What would the type of ~[T].len() and [T, ..n].len() be? Surely not uintptr. Perhaps introduce size_t?

I rather like the int and uint types. Why can't they coexist with Int? If they're going to get renamed to something ugly perhaps it would be best to stick with intptr_t and uintptr_t, existing Rust users are going to have to get used to the new stuff anyway, and it'll be easier to remember for those coming from C/C++.

I think the machine-word-sized int and uint are really nice to use as they are at this time. Int could be BigInt or Integer for people who really want arbitrary sized integers, but I'm thinking the vast majority of the time you don't want/need that functionality anyway.

huonw commented 11 years ago

intptr_t and uintptr_t

Why introduce a completely new & (so far) unused naming convention to the language?

thestinger commented 11 years ago

@Thiez: They aren't machine word size, they're pointer-size. On the x32 ABI they will be 32-bit, despite having 16 64-bit integer registers. If you want to use fixed-size integers correctly, you need upper bounds on the size. Fixed-size types named int/uint encourage writing buggy code because it implies they are a sane default rather than just a way to deal with sizes smaller than the address space.
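The distinction can be checked directly. A minimal sketch in today's Rust, using `usize`/`isize` (the names these types eventually received) to stand in for `uint`/`int`:

```rust
use std::mem::size_of;

fn main() {
    // `usize`/`isize` always match the width of a pointer, not the
    // CPU's register or "word" size.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());
    assert_eq!(size_of::<isize>(), size_of::<*const u8>());
    // On an x32 target this prints 4 even though the integer registers
    // are 64-bit; on common 64-bit targets it prints 8.
    println!("pointer-sized int: {} bytes", size_of::<usize>());
}
```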

Thiez commented 11 years ago

@thestinger fair point. Perhaps that should change as well? Since we're not really supposed to be messing around with pointers outside of unsafe blocks, perhaps a pointer-size type is deserving of an ugly name. That opens up the option of having int and uint be machine word sized...

thestinger commented 11 years ago

@cmr: I agree they're awful names. We should discourage using fixed-size types when bounds are unknown. I think you only want these types in low-level code or for in-memory container sizes.

@Thiez: I don't really think word-sized is a useful property. If the upper bound is 32-bit, 32-bit integers will likely be fastest for the use case due to wasting less cache space.

Thiez commented 11 years ago

I realize my suggestion is silly anyway as one would still need a pointer-size variable for array and vector lengths, which is a nice case for int/uint (but not when they're word-sized). Ignore it :)

1fish2 commented 11 years ago

I completely agree with @thestinger

A machine-word sized integer means bugs and security holes, e.g. because you ran the tests on one platform and then deployed on another.

If one of the platforms has 16-bit int like PalmOS, that's too short to use without thinking carefully about it, so the prudent coding style forbids un-sized int and uint. (Actually the PalmOS 68000 ABI is emulated on a 32-bit ARM so it's not clear what's a machine word.)

Hence the strategy of using a pointer-size integer type only in low-level code that requires it, with an ugly name.

UtherII commented 11 years ago

I agree that using int and uint should be discouraged and renaming them to a less straightforward name is better. I don't know how type inference works but I think it should avoid using them by default too.

michaelwoerister commented 11 years ago

I think that's a good idea. You can't really rely on very much when using int/uint.

I'm not so fond of the names intptr/uintptr. Given that the use cases for these types would be rare, I think they could also be defined in the standard library with more verbose names like PointerSizedInt / PointerSizedUInt. Not much ambiguity there. One could also define other integer types in the same module in the vein of C's uint_fast8_t and uint_least8_t in stdint.h to tackle the "machine word" problem.

glaebhoerl commented 11 years ago

IMHO, the interesting questions are: what type should be used to index into arrays, and what should it be named? Indexing into arrays is pretty common. A pointer-sized type is needed to be able to represent any index. It should presumably be unsigned. I'm not sure if there's much reason to also have a signed version. Expanding to a BigInt on overflow doesn't make much sense here. But wrapping around on over/underflow also doesn't make very much sense, I think. If you want to catch over/underflow and fail!() or similar, you lose a lot (or all) of the performance advantage you might have had over the expand-to-BigInt version. So there's decent arguments in favor of expanding, wrapping, as well as trapping.

I think the strongest argument might be for expanding: negative or larger-than-the-address-space values don't make sense for array indexes, but the array bounds check will already catch that. Meanwhile it's versatile and generally useful for most other situations as well, not just array indexing. The downside is a performance cost relative to a type that wraps on over/underflow. (In the event of a fixed pointer-sized type, the relevant association when naming it should be that it holds any array index, not that it's pointer-sized.)

Whatever this type ends up being and named, it's the one that should be in the prelude.

If someone explicitly needs pointer-sized machine integers for unsafe hackery, those could indeed be named intptr and uintptr and buried in a submodule somewhere.

bstrie commented 11 years ago

Dumb question here, but what's the use of having a signed pointer-sized int at all? Could we get away with having only uintptr (or whatever it ends up being called)?

As for the general idea of this bug, I'm warming to it after seeing how well the removal of float has worked. Having to actually think about the size of my types has been quite illuminating.

thestinger commented 11 years ago

@bstrie: a signed one is needed for offsets/differences (POSIX has ssize_t mostly because they like returning -1 as an error code, but ISO C has ptrdiff_t for this)

1fish2 commented 11 years ago

@thestinger good point. Subtracting array indexes should yield a signed value.

So to reverse the question, what's the need for an unsigned array index type? Is it feasible to allocate a byte array that takes more than half the address space?

thestinger commented 11 years ago

AFAIK the rationale for unsigned types here is to avoid the need for a dynamic check for a negative integer in every function. A bounds check only has to compare against the length, and a reserve/with_capacity function only has to check for overflow, not underflow. It just bubbles up the responsibility for handling underflow as far as possible into the caller (if it needs to check at all - it may never subtract from an index).
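The asymmetry can be illustrated with a small sketch in today's Rust (the function names here are made up): with an unsigned index a bounds check is a single comparison, while a signed index forces the callee to test for negative values too.

```rust
// With an unsigned index, a bounds check is a single comparison:
// negative values are unrepresentable, so only the upper bound matters.
fn get(v: &[u32], i: usize) -> Option<u32> {
    if i < v.len() { Some(v[i]) } else { None }
}

// With a signed index, the callee needs a second test before it can
// index safely.
fn get_signed(v: &[u32], i: i64) -> Option<u32> {
    if i >= 0 && (i as u64) < v.len() as u64 {
        Some(v[i as usize])
    } else {
        None
    }
}

fn main() {
    let v = [10, 20, 30];
    assert_eq!(get(&v, 1), Some(20));
    assert_eq!(get(&v, 3), None);
    assert_eq!(get_signed(&v, -1), None);
    assert_eq!(get_signed(&v, 2), Some(30));
}
```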

nikomatsakis commented 11 years ago

cc me

I have been contemplating whether int/uint carry their weight or not. Array indexing is a good example of where they can be useful, particularly around overloading -- we could make the built-in indexing operator accept arbitrary types (and in fact I think maybe they do?) but that's not so easy with an overloaded one.

pnkfelix commented 11 years ago

@glehel the issues you raise about how to handle overflow/underflow on array indices are important, but there is already a ticket that I think is a more appropriate spot for that discussion: #9469.

glaebhoerl commented 11 years ago

@pnkfelix I think the two are very closely related. (basically: if we want to use the existing int/uint as the preferred type for indexing arrays, then they should not be renamed to intptr/uintptr, but if we want to prefer a different type for that (e.g. one which checks for over/underflow), then they should be renamed.)

brendanzab commented 10 years ago

To those commenting that intptr and uintptr are horrible names, that's entirely the point. They should be ergonomically discouraged.

Having int and uint so succinct and pretty makes it easy for beginners to think they should use them as a default. In fact I already used int for everything in glfw-rs - I should probably change them to i32.

+1 for this change from me.

brson commented 10 years ago

If there's consensus that it's bad practice to use int by default (and I don't know that there is) then I agree we should change the names, and probably make default integer types that have the correct size.

brendanzab commented 10 years ago

@brson We already make folks choose between f32 and f64. It seems a little asymmetrical from a design point of view having uint and int as the default that folks should reach for without also having float.

nikomatsakis commented 10 years ago

I find this thread confusing.

glaebhoerl commented 10 years ago

The other possibility was to use a type that doesn't wrap on over/underflow, but either traps or extends into a bigint. Which is likely to be slow, but I don't know whether it's been tested.

1fish2 commented 10 years ago

(What is bors?)

An integer type with platform-specific overflow makes programs non-portable, that is, produce different results on different platforms. What's it good for besides maybe C interop? (Not for performance. E.g. Palm OS runs on a 32-bit ARM emulating a 16-bit 68000, so int is 16 bits. It's too short for an everyday loop index and probably slower than 32 bits.)

Intertwined issues: whether to have non-portable integer types, what to name them, and whether array indexing (and some type inference?) should use fixed-size integers, with or without overflow traps, or big-ints.

huonw commented 10 years ago

(@bors is the Rust integration bot; (almost) all PRs have the full test suite run on a variety of platforms, and a limited version run on others, and only merge if everything passes; the r+'s that you may see on PRs are directives to @bors to attempt a merge & test run.)

ghost commented 10 years ago

There's also the x32 ABI, where pointers are narrower than the native 64-bit registers.

I'd remove variable-width int altogether (except ffi of course). Those who expect their code to run on 32-bit should already be thinking about overflows and use int64/bigint where appropriate, and those who know they'll only ever run on 64-bit should have no problem either way.

Are there credible use cases of pointer-sized Rust ints outside ffi?

emberian commented 10 years ago

Pointer-sized ints are required for representing pointers and indices into vectors (otherwise the vector will be artificially limited in size).


emberian commented 10 years ago

@nikomatsakis The idea is that int and uint aren't very useful numeric types, and that a fast bigint would be more appropriate most of the time one is actually dealing with numbers. And when one does not want a real numeric type, they probably want one of the fixed-size types anyway.

nikomatsakis commented 10 years ago

@cmr I do not agree with the "probably want one of the fixed-size types" part of that sentence. That is not clear to me -- I think it is very common to have integers that are ultimately indices into some sort of array or tied to the size of a data structure, and for that use case it is natural to want an integer that represents "the address space of the machine".

Of course I think having a nice, performant bigint library would be great, particularly for implementing "business logic" or other use cases where a "true integer" is required. But I am not sure how common that really is.

nikomatsakis commented 10 years ago

Note though that signed integers that are the size of a pointer are not necessarily useful...

emberian commented 10 years ago

Well, I'm ignoring the case of vector indices: those are obviously common, and incredibly useful. I feel like having int/uint as their name might be a footgun. Maybe it's fine.


glaebhoerl commented 10 years ago

Even for array indices uint isn't so obviously the winner. It has the right size, yes - but wraparound on under/overflow still doesn't make any sense. If it wraps in either direction, you have a bug. The 'right thing' would be something uint-sized with dynamic checks against under/overflow. (But maybe if you have the dynamic checks, then you might as well use them to expand to a bigint instead of failing? That's not any worse for catching bugs, because they'll just be caught by the array bounds check (or the prior conversion to uint) instead.) But either of these also has a strong likelihood of being unacceptably slow -- though that heavily depends on the use case. That seems like the fundamental dilemma to me.

thestinger commented 10 years ago

Dynamic checks are too expensive for a vector type. You can measure a huge reduction in push performance for a vector between doing the minimum single dynamic overflow check and two. This is one of the reasons for Rust's vector types being so slow - including a length header results in more overflow checks. LLVM is not capable of optimizing out something like this except in rare cases.
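For context, here is a sketch (a hypothetical helper, not the actual Vec implementation) of where the single overflow check lives in a reserve-style capacity computation: only the required-size arithmetic can wrap, and it is checked once per reallocation rather than per element.

```rust
// The only place wraparound can occur: computing the required size.
// `checked_add` returns None on overflow instead of wrapping.
fn checked_new_capacity(len: usize, additional: usize) -> Option<usize> {
    let required = len.checked_add(additional)?;
    // Grow geometrically; `saturating_mul` avoids a second overflow branch.
    Some(required.max(len.saturating_mul(2)))
}

fn main() {
    assert_eq!(checked_new_capacity(4, 1), Some(8));
    // Requesting more elements than the address space can hold fails
    // cleanly instead of wrapping around.
    assert_eq!(checked_new_capacity(usize::MAX, 1), None);
}
```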

1fish2 commented 10 years ago

It won't meet Rust's safety & security goals if array/vector index values overflow, underflow, or index out of bounds without detection.

Maybe the compiler can optimize out many such checks, e.g. in a loop that already checks the loop index bounds. Or arrange memory so that every index-out-of-bounds causes a memory protection fault?

(If CPUs were designed for real needs rather than based on stats of C programs, they'd do array bounds checks in parallel with array indexing and trap on numeric over/underflow.)

thestinger commented 10 years ago

The overhead comes from the branch handling overflow rather than the check itself. If the CPU exposed a way to do checked indexing or numeric operations, it wouldn't be possible to use it in the standard library because it's expected to generate a failure. Most CPUs do expose features like an overflow flag but again, the overhead comes from branching on that value.

cartazio commented 10 years ago

I'd like to chime in that I'd support making int and uint into intptr and uintptr. In my work on GHC Haskell, the fact that Word and Int are tied to the system pointer size has been a huge wart, and I have some buy-in from the GHC devs to incrementally make that relationship more explicit in GHC by introducing explicit notions of pointer-sized Ints and Words.

An important use case for decoupling int types from pointer size, or at least making the relationship between ints and pointers more explicit, is supporting larger-than-4GB arrays on 32-bit systems (via some manner of adaptive memory-mapping scheme), as well as providing a "direct addressing" style API for distributed arrays in a supercomputing context (though in both use cases, any use is probably with respect to addressing submatrices).

Point being: I support the proposal, and have a nascent similar proposal for Int and Word in Haskell (there too, the system-dependent size is a recurrent wart that bites a great many people).

nmsmith commented 10 years ago

Renaming int and uint makes sense, but if we don't have a 'default' integer type, wouldn't we end up with more casts everywhere? For example: you're using a u32 loop counter and trying to pass it to one function that wants a u64 and another that wants an i32, when neither function really needed those types because their only valid input is numbers from 1 to 10. If I were writing those functions today, I'd have them accept an int because it's the 'default' integer type and I'm only interested in a small range of inputs. In truth, they should probably accept an arbitrary-size integer type, but I'm hesitant about a default integer type that can have potentially massive performance consequences when used naively.
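The cast friction being described looks like this in today's Rust (the function names below are invented for illustration):

```rust
// Two toy functions that each picked a concrete integer type even
// though their valid inputs are tiny.
fn takes_u64(x: u64) -> u64 { x + 1 }
fn takes_i32(x: i32) -> i32 { x + 1 }

fn main() {
    let counter: u32 = 7;
    // takes_u64(counter); // error[E0308]: mismatched types
    assert_eq!(takes_u64(counter as u64), 8);
    assert_eq!(takes_i32(counter as i32), 8);
}
```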

thestinger commented 10 years ago

@ecl3ctic: If you're using int as a "default", then you're not using it correctly. It will be 16-bit on an architecture with a 16-bit address space, and 32-bit or 64-bit elsewhere. If you do your testing on a 64-bit architecture, you're going to miss plenty of bugs.

nmsmith commented 10 years ago

@thestinger: I'd only use it as a default when I know I'm not interested in large values. It would be silly to use an i8 there, wouldn't it? When different architectures have different 'standard' integer sizes (with different performance characteristics for different sizes), what size do you choose for such situations?

thestinger commented 10 years ago

@ecl3ctic: The int type has absolutely nothing to do with some concept like a platform standard "word-size". The x86_64 "word size" is 16-bit per the machine/assembly languages. The int type is simply the same size as the system's pointers.

1fish2 commented 10 years ago

Having recently watched a TED talk by behavioral economist Dan Ariely ( http://www.ted.com/talks/dan_ariely_asks_are_we_in_control_of_our_own_decisions.html -- the default choice has a huge impact), I have a deep appreciation for @ecl3ctic 's call for a default integer type (not the current int type).

It could have a short name, or maybe there should just be a strong recommendation to use i32 or BigInt.

thestinger commented 10 years ago

What do you mean by default? When type inference fails, falling back to a fixed-size integer type is asking for trouble.

1fish2 commented 10 years ago

By "default" I meant what programmers should pick when they don't have a good reason to pick something else.

thestinger commented 10 years ago

If you don't know the bounds, it only makes sense to use a big integer. If you do know the bounds, you can use a fixed-size integer. I don't think talking about a 'default' makes sense beyond that.

nmsmith commented 10 years ago

@thestinger If you're only interested in values from 1 to 100, or -10 to 10, what type should you use? You could use 10 different types for the first one, and 5 for the second. That's when you want to reach for a "default".

nmsmith commented 10 years ago

I know very little about compilers or hardware architecture, so humour me here while I lay out a probably fundamentally-flawed idea:

A concept like 'integer ranges' could be used to decide which types to use, how big an integer should be, and whether to check for overflow. If we used the syntax int(<lower bound>, <upper bound>), you could write something like this:

// This integer range is implemented as at least an i8/u8, but could be implemented
// as an i32 for efficiency.
fn get(i: int(0, 100)) -> u32 {...}

// This integer range is implemented as the most appropriate signed type
// for the expressions it appears in within the function body (at least an i8).
fn set_slot(i: int(-10, 10)) {...}

fn main() {
    for x in range(0u64, 1000) {
        // The compiler allows passing this u64 argument without a cast,
        // but checks to see if x is in the range (0, 100) before
        // executing the function, and fails if it isn't.
        let s = get(x);
    }
}

struct Vector {
    size: u64
}

impl Vector {
    // ~~Automatic bounds checking~~
    // Because 'size' is unknown at compile-time, argument 'i' will
    // be implemented as a u64 to ensure it works for any value of 'size'
    // However, i is still bounds-checked!
    fn get(&self, i: int(0, self.size) ) {
        ...
    }
}

// Here we want i to be between 0 and num_records, but never greater than 1000.
// The compiler can then determine that i should be an i16 (or larger if it means
// better performance).
fn get_record(i: int(0, num_records, 1000)) {...}

// An overflow-checked square root function.
// Wrapping an integer type in int() tells the compiler to check for overflow:
// int(u32) is equivalent to int(0, 0xFFFFFFFF).
// The input parameter should be checked for overflow at the call site when it is
// passed types such as i32, u64 etc (for which overflow is possible).
fn square_root(x: int(u32) ) -> int(u16) {...}

// This function accepts an unbounded range, so use a BigInt:
fn increment(x: int(0, *) ) {...}

This concept almost completely abstracts away hardware-based integer sizes, while still allowing them to be used as the underlying implementation for performance.

If the programmer wants fixed-size/wrapping types/zero overhead then they can continue to use i32, i64 etc.. and if they want a type of unconstrained size they can use a BigInt.

I haven't thought through all the implications of this idea, but I figured I'd just throw it out there.

I also know this is slightly off-track from the actual issue I'm posting under, but integers in Rust seems to be a hot topic at the moment.
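For what it's worth, a small part of this proposal can be approximated in present-day Rust with a validating newtype; the name `Slot` and its bounds here are invented for illustration, standing in for the hypothetical int(-10, 10):

```rust
// A range-restricted integer approximated as a newtype whose
// constructor validates the bounds at runtime.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Slot(i8); // valid range: -10..=10

impl Slot {
    fn new(value: i64) -> Option<Slot> {
        if (-10..=10).contains(&value) {
            Some(Slot(value as i8))
        } else {
            None
        }
    }
}

fn main() {
    assert_eq!(Slot::new(5), Some(Slot(5)));
    assert_eq!(Slot::new(11), None); // out of range, rejected
}
```

Unlike the proposal, the check happens at runtime rather than being inferred at compile time, but it keeps the invalid values unrepresentable past the constructor.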

thestinger commented 10 years ago

@ecl3ctic: The CPU will typically provide instructions for two's complement arithmetic, so the same instructions are used for signed/unsigned integers (excluding division). You check for unsigned overflow with the carry flag and signed overflow with the two's complement overflow flag.

Branching based on the flag can drastically reduce the amount of instruction-level parallelization performed by the CPU, and likely prevents compiler-level auto-vectorization in most cases too. A custom overflow limit on one end will add a comparison operation but not an extra branch while limits on both ends could be significantly slower than just standard overflow checking.

If overflow checks are the default, Rust code not explicitly written to avoid them will typically be slower than Java. In tight loops, there's not much more important than integer and floating point performance. Giving the compiler as much information as possible about aliasing/immutability is all about letting it play more with these inner loops but it will all be for nothing if they're full of unnecessary overflow checking branches.
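A sketch in today's Rust of the two flavors being compared here, using standard library methods (these postdate the thread and are not the proposal itself):

```rust
fn main() {
    // `overflowing_add` mirrors the CPU flag: the operation always
    // completes (two's complement wraparound) and additionally reports
    // whether overflow occurred, leaving the branch to the caller.
    let (wrapped, overflowed) = u8::MAX.overflowing_add(1);
    assert_eq!(wrapped, 0);
    assert!(overflowed);

    // `checked_add` is the branching version: testing the flag on every
    // operation is the per-operation cost that inhibits vectorization
    // in tight loops.
    assert_eq!(200u8.checked_add(100), None);
    assert_eq!(200u8.checked_add(55), Some(255));
}
```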

thestinger commented 10 years ago

I'm all for supporting range integer types like Ada, but I think it's far out of scope for 1.0. There's also a point where the complexity will add too much pain for people writing Rust code, and widespread usage could seriously hurt the success as a systems language. Displacing Ada is aiming for a much smaller piece of the pie than C++ and Rust already has a lot of cognitive overhead for ownership/lifetimes.

nmsmith commented 10 years ago

@thestinger I didn't say range integers or overflow checks should be the default, just that they could be really useful.

Anyway, to return to this issue: I (who have no authority and whom nobody knows) support renaming int and uint to intptr and uintptr because they're not suitable as a "default" integer type :)

What we do about a "default", or an arbitrary-length integer (if anything), is another issue. int and uint should probably be compiler errors until then.

CloudiDust commented 10 years ago

Hi all,

I like discouraging the use of int and uint, but intptr and uintptr do give the wrong impression of "pointers to integers", not "pointer sized integers". So I propose renaming them to something like intps/uintps or psint/psuint.

Personally I like the former pair, as intps/uintps follow the same "type-size" order as i8, i16, etc., while being "foreign" and "ugly" enough that people will actually look them up before using them.