ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

remove u0 from zig #1530

Closed kristate closed 6 years ago

kristate commented 6 years ago

I don't believe that u0 should be a thing. @thejoshwolfe seems to agree:

So it makes more sense for both of those 0-member situations to be compile errors. I can't think of any reason to explicitly make a struct or enum that's completely useless. https://github.com/ziglang/zig/issues/598#issuecomment-343053857

thejoshwolfe commented 6 years ago

u0 can still be created with @IntType(false, 0), so the quote about structs and enums with 0 members doesn't apply.

phase commented 6 years ago

In Rust, structs with 0 members can be used as a global "tag" or marker. One excellent property of structs is that you can implement traits for them.

trait DoThing {
    fn do_thing(&self) -> u32;
}

struct X;
struct Y;

impl DoThing for X {
    fn do_thing(&self) -> u32 { 1 }
}

impl DoThing for Y {
    fn do_thing(&self) -> u32 { 2 }
}

fn main() {
    // `box` is a reserved keyword in Rust, so the binding needs another name.
    let marker: Box<dyn DoThing> = Box::new(X);
    println!("{}", marker.do_thing());
}

This is a basic version of what I was doing. I wanted to store these markers and use certain functions on them. Now, my actual code generated the structs & implementations using a macro, but I still have a use case for 0 members in a struct.

I could have implemented this by generating functions instead and storing the function pointer, but that's not as clean.

The Rust docs say:

This is rarely useful on its own (although sometimes it can serve as a marker type), but in combination with other features, it can become useful.

I don't know how well this applies to Zig, as it doesn't have a whole lot of polymorphism yet (iirc), but implementing methods / traits onto these markers was a use case that fit my situation.

andrewrk commented 6 years ago

@kristate what's your explanation for why u0 shouldn't be allowed?

Here's why it should be allowed:

kristate commented 6 years ago

@andrewrk thanks for the thought exercise.

u0 does not make sense, but I thought that was obvious -- let me go through your points.

It makes sense - it's a number that can only hold the value 0

No, u0 is not a type for holding any integer and should not exist. u8 is a type that holds 8 bits. u1 is a type that holds 1 bit. u0 holds zero bits. Therefore, it is not holding anything. This is like saying that we can measure the length of 0 with a zero-length ruler.

We already have the concept of a type that has no bits at runtime, which this fits perfectly

u0 is the equivalent of void and is even backed by void (specifically LLVMVoidType()) in analyze.cpp. I don't like that we are overloading the integer type.

I can imagine that in some generic code where you use @IntType, allowing u0 solves an edge case that otherwise would have needed to be handled separately.

I cannot imagine such an edge case, and if one should occur, I would imagine that this would be a fault in the language design and/or implementation.


Furthermore, after writing my patch (#1531) for this bug in zig, I discovered that LLVM does not support integers of zero bits, and that there is an upper limit of (1<<24)-1 bits which we are not catching and which my patch also fixes.

Likewise, no part of the zig language or standard library requires u0. This was verified in my patch by removing it from the language.

I also searched for a language that allows initialization of a zero-bit-length integer type and could not find one.

In conclusion:

This is not a proposal, but a bug in the language.

winksaville commented 6 years ago

My 2 cents:

  • u0 is not actually an integer type (it's backed by void) and does not hold anything.

assert(@typeId(u0) == TypeId.Int); // true

  • void already accomplishes what u0 sets out to do. (it's not like you can use u0 for any sort of calculation)

// Works fine
var zero: u0 = 0;
assert(zero == 0);

// Causes a seg fault in the compiler, which is a bug. If allowed, u0 can be used in calculations.
var z = @intCast(u1, zero);

(gdb) bt
#0  0x00007f0d307c07a4 in LLVMBuildZExt () from /usr/lib/libLLVM-6.0.so
#1  0x0000559d70b0e84d in gen_widen_or_shorten (g=0x559d71dbd660, want_runtime_safety=true, actual_type=0x559d720272b0, wanted_type=0x559d72029240, expr_val=0x0) at ../src/codegen.cpp:1681
#2  0x0000559d70b1444d in ir_render_widen_or_shorten (g=0x559d71dbd660, executable=0x559d71e347f0, instruction=0x559d720356c0) at ../src/codegen.cpp:3019
#3  0x0000559d70b1ddc0 in ir_render_instruction (g=0x559d71dbd660, executable=0x559d71e347f0, instruction=0x559d720356c0) at ../src/codegen.cpp:5296
#4  0x0000559d70b1e2a5 in ir_render (g=0x559d71dbd660, fn_entry=0x559d71e34690) at ../src/codegen.cpp:5377
#5  0x0000559d70b22744 in do_code_gen (g=0x559d71dbd660) at ../src/codegen.cpp:6373
#6  0x0000559d70b28ff1 in codegen_build_and_link (g=0x559d71dbd660) at ../src/codegen.cpp:8226
#7  0x0000559d70b98c17 in main (argc=3, argv=0x7ffe6f9317b8) at ../src/main.cpp:978
(gdb)

  • LLVMIntType() only supports from 1 to (1<<24)-1 bits.

A mistake in LLVM?

  • No part of the zig language or standard library requires or uses u0 outside of testing u0.

That could change.

  • I am not aware of any other language that knowingly allows initialization of a zero-bit-length integer type.

Doesn't seem like a strong argument

This is not a proposal, but a bug in the language.

Arguable

andrewrk commented 6 years ago

Just a point of clarity: LLVM does not have the concept of 0-bit types. Zig does. Zig simply avoids emitting instructions for zero-bit types, since the values are all compile-time known by definition. In the above crashing example, there is missing code in Zig to detect that we know the value of a u0 at compile time, and so we incorrectly try to use a 0-bit type in LLVM. With the fix, any u0 value would always be comptime known to be zero. This is the same as what we do for structs with all void fields, and enums with only 1 tag.

kristate commented 6 years ago

For what it's worth, I am not against 0-bit types. I am saying that 0-bit integers (a data type that represents some range of mathematical integers) don't exist and are a misnomer. There is no range/integral from zero to zero.

andrewrk commented 6 years ago

It's sound according to mathematics.

kristate commented 6 years ago

@andrewrk how would you store a 0-bit integer?

andrewrk commented 6 years ago

When you store any integer, there are the bits that you place in the storage, and there is the type information (that it is an integer, the size of the integer, that it is stored in two's complement form, whether it is unsigned) which is known but not actually stored.

When you store a u0, you know that there is only one possible value. So it takes up 0 bits of storage. It's brilliant, you don't actually store anything. When you load it, same thing. You know that there is only one value. So when you load it, you just know that the value is 0 and don't actually do any loading.
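A minimal sketch of that store/load argument, written in Python rather than Zig, and not a description of how the Zig compiler works. The helpers `pack_uint` and `unpack_uint` are hypothetical names: they write and read an unsigned integer of a given width as a string of binary digits. With a width of 0, the same code path writes nothing and reads back 0 without consuming any input.

```python
def pack_uint(value, bits):
    """Encode an unsigned integer as exactly `bits` binary digits."""
    if not 0 <= value < 2 ** bits:
        raise ValueError(f"{value} does not fit in u{bits}")
    # Emit the bits from most significant to least significant.
    # For bits == 0 the range is empty, so nothing is emitted.
    return "".join(str((value >> i) & 1) for i in reversed(range(bits)))


def unpack_uint(stream, bits):
    """Decode `bits` binary digits from the front of `stream`.

    Returns the decoded value and the unread remainder of the stream.
    """
    if len(stream) < bits:
        raise ValueError("not enough input")
    value = 0
    for digit in stream[:bits]:
        value = (value << 1) | int(digit)
    return value, stream[bits:]


if __name__ == "__main__":
    print(repr(pack_uint(5, 3)))   # '101'
    print(repr(pack_uint(0, 0)))   # ''
    print(unpack_uint("101", 0))   # (0, '101')
```

Note that neither function needs a branch for the zero-width case; it falls out of the empty range and the empty slice.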

kristate commented 6 years ago

Well, if we're going to have u0, we should at least implement it properly. Printing a u0 does not work:

const std = @import("std");
test "print u0" {
  var uz: u0 = 0;
  std.debug.warn("uz: {}", uz);
}
/Users/cfkk/Source/zig2/build/lib/zig/std/debug/index.zig:46:23: error: compiler bug: var args can't handle void. https://github.com/ziglang/zig/issues/557
    stderr.print(fmt, args) catch return;
                      ^
/Users/cfkk/Source/zig2/build/u0.zig:4:17: note: called from here
  std.debug.warn("uz: {}", uz);
                ^

andrewrk commented 6 years ago

Agreed. I think the only thing that is missing right now is, in zig ir, any value that has the type u0 should always be comptime known to be 0.

The var args bug (#557) will be fixed with #208

kristate commented 6 years ago

@winksaville just for clarification, u0 is definitely backed by void (specifically LLVMVoidType) when interfacing with LLVM. You should check it out here: https://github.com/ziglang/zig/blob/dd5b2d1/src/analyze.cpp#L5930

kristate commented 6 years ago

@andrewrk

  • u0 holds pow(2, 0) different values which is 1

The math checks out. Unfortunately, computers can only store bits, not half bits.

andrewrk commented 6 years ago

I'm sorry, I don't follow your argument. Where does a half bit come into play?

winksaville commented 6 years ago

@kristate, I never doubted you that it was backed by void, but it seems from the user's point of view it isn't.

winksaville commented 6 years ago

Should a bug be filed for:

test "u0.crash.intCast" {
    var zero: u0 = 0;
    var z1 = @intCast(u1, zero); // Causes compiler seg fault
}

andrewrk commented 6 years ago

Here are the integers u8, u7, u6, u5, u4, u3, u2, u1, u0 and their relationship with log base 2:

>>> log2(256)
8
>>> log2(128)
7
>>> log2(64)
6
>>> log2(32)
5
>>> log2(16)
4
>>> log2(8)
3
>>> log2(4)
2
>>> log2(2)
1
>>> log2(1)
0

pow(2, 0) == 1
log2(1) == 0

everything is fine

andrewrk commented 6 years ago

I think the confusion here can be demonstrated with an analogy to enums:

const Foo = enum { One };

This enum has 1 possible state. It can only be the value Foo.One. This is a 0-bit type in Zig; we always know the value is Foo.One. This is analogous to u0, which also has 1 possible state - the integer value 0. In fact, up until a few commits ago, the integer tag type of this enum was u0. I changed it so that you could specify a tag value and have it be a comptime_int, since that's more useful and it works.

const Bar = enum {};

This is a compile error. This type is impossible, because there are zero possible states. Note that this is not analogous to u0, because it has zero possible states, not one.
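The difference between "one possible state" and "zero possible states" can be sketched by computing how many bits a type needs for a given number of states. This is a Python illustration only; `bits_for_states` is a hypothetical helper, not anything in Zig or its standard library.

```python
def bits_for_states(states):
    """Smallest number of bits that can distinguish `states` distinct values."""
    if states < 1:
        # Mirrors the `enum {}` case: with no possible value there is
        # nothing to represent, so no bit width makes sense.
        raise ValueError("a type with zero possible states has no values")
    return (states - 1).bit_length()


if __name__ == "__main__":
    for states in (1, 2, 3, 256):
        print(states, "state(s) ->", bits_for_states(states), "bit(s)")
```

One state needs 0 bits (the `enum { One }` and u0 case), two states need 1 bit, and 256 states need 8 bits. Zero states is rejected outright rather than mapped to any width.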

kristate commented 6 years ago

So, one problem I see is that structs bob and jill would be the same if deserialized and serialized.

You cannot store and retrieve zero bytes.

const bob = struct {
  a: u0,
  b: u0,
};

const jill = struct {
  a: u0,
  b: u0,
  c: u0,
};

andrewrk commented 6 years ago

How is this different than

const bob = struct {
    a: u8,
};

const jill = struct {
    a: u8,
};

either way, you don't know if you got a bob or a jill without metadata.

kristate commented 6 years ago

@andrewrk it's completely different? the members of those structs are the same.

In my example, bob has two members and jill has three.

andrewrk commented 6 years ago

I'll re-open this if there is a sufficiently convincing argument. As it stands I am strongly convinced that u0 should remain a valid integer type in Zig.

kristate commented 6 years ago

This is going to be zig's billion dollar mistake. I thought that we were going to make a language to replace C. Instead of bikeshedding with me, we could have removed a wart on the language. I am truly saddened by this action of closing the thread.

Take some time to watch that video. References to 0-bytes have always been dangerous in computer programming.

andrewrk commented 6 years ago

Accusations of bikeshedding are not on topic. You are welcome to repeat your comment again without the personal attack.

kristate commented 6 years ago

@andrewrk I consider rejecting this issue some sort of personal attack.

I answered your question here with my comment here and your response was not to reply but to reject.

winksaville commented 6 years ago

Here are some current properties of u0:

@sizeOf(u0) is 0

assert(@sizeOf(u0) == 0);

Address of zero: u0 is null

var zero: u0 = 0;
assert(&zero == null);

Appears you can dereference a null address

var pZero = &zero;
assert(pZero == null);
assert(pZero.* == 0);

Empty struct behaves like u0:

const Empty = struct {};
var empty: Empty = undefined;
assert(@sizeOf(Empty) == 0);
assert(&empty == null);

Hejsil commented 6 years ago

All zero bit types have the serialization problem (void, struct{}), so unless we remove those as well, removing u0 doesn't solve anything.

Idk exactly the use case for u0 (void is very useful), but I also don't see any potential problems with its design, and there really is no way to find out unless someone shoots themselves with the feature (We're not in 1.0 after all, so let's test out things).

Btw, have you always been able to compare non-optional pointers with null (@winksaville's example)? Seems like a bug.

andrewrk commented 6 years ago

Seems like a bug.

It is. Thanks @winksaville for the example. I filed #1539

tgschultz commented 6 years ago

For what it is worth, the code I have to read and write arbitrary types doesn't have any trouble writing a u0 without modification (it outputs nothing). The read function fails because var x: u0 = 0; var y = @bitCast(u0, x); crashes the compiler, but otherwise it would just return u0(0) and not actually read any input. This is precisely what I'd expect when reading or writing something like the Bob struct above.

Also of note, x <<= 0 crashes the compiler. If we're going to keep u0, which I agree with so far, we should probably collect these failure cases somewhere.

kristate commented 6 years ago

@Hejsil a void or struct {} is fine because it has zero members. Having members of u0 sprinkled inside of a struct {} seems silly, but could lead to creating two types that could be serialized/hashed identically but are different -- perhaps causing some sort of backwards Collision Attack in the future.

Just like everyone has seen with this type, if it were a real integer type, we would not be having all of these faults. Instead, in order to support u0, we have to do all of these extra hacks on the language to support a type that has no use.

Hejsil commented 6 years ago

@kristate I agree that all zero-sized types make code harder to reason about on a machine code level, because, well, there is no machine code to reason about.

I don't see how void and struct{} don't have the same problems as u0 for serialization. Just like bob and jill are the same in serialized form, so are:

const kurt = struct { a: void };
const jim = struct {
    a: void,
    b: void
};

and:

const bent = packed struct { a: u8 };
const hue = packed struct {
    a: u4,
    b: u4
};

What u0 allows, is for any generic function that accepts a bit count, to just work without workarounds. No need to have:

const T = if (i == 0) void else @IntType(false, i);
var a: T = if (i == 0) {} else math.maxInt(T);

This is the same as how void allows std to implement BufSet using HashMap, without HashMap having to special case void.
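The serialization point can be sketched in Python using the standard `struct` module. All function names here are hypothetical, and the nibble order chosen for `hue` is an assumption for illustration, not a statement about Zig's packed struct layout. The sketch shows that two differently shaped types can already produce identical bytes without u0 being involved, and that the bob/jill case is the same effect with zero-width fields.

```python
import struct


def pack_bent(a):
    """bent: a single 8-bit field."""
    return struct.pack("B", a & 0xFF)


def pack_hue(a, b):
    """hue: two 4-bit fields sharing one byte (low nibble first, by assumption)."""
    return struct.pack("B", (a & 0xF) | ((b & 0xF) << 4))


def pack_zero_width(field_count):
    """bob/jill: every field is zero bits wide, so each contributes no bytes."""
    return b"".join(b"" for _ in range(field_count))


if __name__ == "__main__":
    print(pack_bent(0xAB) == pack_hue(0xB, 0xA))         # True
    print(pack_zero_width(2) == pack_zero_width(3))      # True
```

In both cases the receiver needs out-of-band type information to know which struct it was given; the bytes alone do not say.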

kristate commented 6 years ago

@Hejsil presumably a serialization library would serialize the void and struct {} types into some sort of representation (in JSON it would be {"a":"null", "b":{}}), but the question remains: how do you serialize u0?

const T = if (i == 0) void else @IntType(false, i);

This seems like the right answer, since the programmer is taking care of the special i == 0 case.

This is the same as how void allows std to implement BufSet using HashMap, without HashMap having to special case void.

Yes, I am perfectly fine accepting void. To me u0 is void in disguise and should be accepted as such. It's like we have two voids in the language.

andrewrk commented 6 years ago

A void value does not correspond to null. A void value cannot be null; it can only be {}.

If you were going to serialize a u0 value into JSON, it would simply be 0. If you were deserializing the JSON, it would work perfectly. Either the JSON value is 0 or it's an IntegerOutOfRange error, same as if you were deserializing the JSON value 256 into a u8.

The same JSON deserialization code that works for any integer type would handle u0 in this way. To remove u0 from the language would break this code unnecessarily.
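A minimal Python sketch of that behaviour, assuming a generic range check parameterized on the bit width. `parse_uint` is a hypothetical helper and `IntegerOutOfRange` is defined here only to mirror the error name used above; neither is an actual Zig standard library API.

```python
import json


class IntegerOutOfRange(ValueError):
    """Raised when a JSON integer does not fit the requested width."""


def parse_uint(text, bits):
    """Parse a JSON integer and check that it fits in an unsigned `bits`-bit type."""
    value = json.loads(text)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected a JSON integer")
    if not 0 <= value < 2 ** bits:
        raise IntegerOutOfRange(f"{value} does not fit in u{bits}")
    return value


if __name__ == "__main__":
    print(parse_uint("255", 8))  # 255
    print(parse_uint("0", 0))    # 0
```

The width-0 case goes through the same range check as every other width: the only accepted value is 0, and `parse_uint("1", 0)` fails in the same way `parse_uint("256", 8)` does.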

kristate commented 6 years ago

@andrewrk

To remove u0 from the language would break this code unnecessarily.

Just for clarification, how would removing u0 result in any code breakage?

thejoshwolfe commented 6 years ago

I just wrote some code that demonstrates uses of u0 here: https://github.com/ziglang/zig/pull/1543

It's conceivable that a data structure like that could be useful in some situation. (It just occurred to me that I didn't document the idea of the data structure. I'll do that here.)

You have integer keys to a hashtable, but it's not a typical hashtable implementation. It's sharded on the top N bits of the key into a flat array of size 2**N. Each shard holds some collection of nodes; in the above linked implementation, each shard is a linked list, but it's not important for this discussion. The important part is that the number of bits in the shard key is comptime variable.

A typical case for this data structure might be sharding a u32 key based on the top 8 bits (into 256 shards). But the corner cases are where this gets interesting.

In the case that you shard on the top 0 bits, then the ShardKey type is u0. There is a special case in the code above for this situation, but it's conceivable that smarter shift-by-comptime_int semantics could obviate the special case. And there's still plenty of code that "just works" without special casing for u0 in that implementation.

Another strange case is if you shard on the top 1 bit of a u1 key. In this case you end up right-shifting a u1, and the shift type for u1 is u0.
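The shard selection described above can be sketched in Python as follows. This is an illustration only and not the code from the linked pull request; `shard_index` and `shard_count` are hypothetical names. Python integers have arbitrary precision, so shifting by the full key width is unproblematic here, which is not necessarily true of fixed-width integer types.

```python
def shard_count(shard_bits):
    """Number of shards when sharding on the top `shard_bits` bits."""
    return 2 ** shard_bits


def shard_index(key, key_bits, shard_bits):
    """Return the top `shard_bits` bits of a `key_bits`-wide unsigned key."""
    if not 0 <= shard_bits <= key_bits:
        raise ValueError("shard_bits must be between 0 and key_bits")
    if not 0 <= key < 2 ** key_bits:
        raise ValueError("key does not fit in key_bits")
    # Drop the low (key_bits - shard_bits) bits, keeping only the top ones.
    return key >> (key_bits - shard_bits)


if __name__ == "__main__":
    print(hex(shard_index(0xDEADBEEF, 32, 8)))  # 0xde
    print(shard_index(0xDEADBEEF, 32, 0))       # 0
    print(shard_index(1, 1, 1))                 # 1
```

With `shard_bits` set to 0 there is exactly one shard and every key maps to index 0, which is the situation where the shard key type would be u0.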

I admit that the corner cases that demonstrate the use of u0 are not very practical, and the entire data structure is of questionable value. But the purpose of this exercise is getting at whether the language itself should forbid the use of u0. Maybe it's so useless that it should be specifically blocked, but I don't see a reason to forbid it when it fits mathematically into the data structure above. It does add a bit of maintenance burden to the compiler, as seen with the bugs specifically related to u0, but I think the strongest argument in favor of u0 is that it works mathematically.

The one part of u0 that doesn't work mathematically is shifting: when you try to shift a u0, your shift amount should be a log2(0)-bit integer, and log2(0) is undefined. However, doing some testing with zig suggests that the shift type of u0 is u0, which is practically sound. Doing 0 >> 0 should work, and so u0(x) >> 0 should, I guess, just do nothing.

I think I'm going to write a proposal for some enhancements to shift-by-comptime_int semantics now...

Hejsil commented 6 years ago

Also, under @atomicRmw we have "TODO right now bool is not accepted. Also I think we could make non powers of 2 work fine, maybe we can remove this restriction". If @atomicRmw were to work on u0, then one could write code that is optionally threadsafe without a lot of special casing.

const Lock = if (is_thread_safe) u1 else u0;
const Locked = if (is_thread_safe) 1 else 0;
const Unlocked = 0;

var lock: Lock = Unlocked;
...
// Do @atomicRmw on lock. @atomicRmw will be no-ops on `u0`

Idk if this is a good idea. Maybe @atomicRmw should never be no-op because that might be confusing with its description "This builtin function atomically modifies memory and then returns the previous value.". u0 has no memory to modify.

Should I open a proposal for this?

Edit: Never written lock free code, so I might be missing details that make this impossible

andrewrk commented 6 years ago

Feel free to do that. Reading your comment though I have to agree that probably atomicrmw should be defined to always load/store memory since that is its main purpose.

kristate commented 6 years ago

@Hejsil @andrewrk Hejsil makes a good point here. These are the kind of mistakes that u0 allows for. @atomicRmw should never be a noop. I wasn't trying to be coy when I was talking about the billion dollar bug NULL -- I am glad that this post caused a flurry of activity and lots of patches, but it seems to me that u0 is in search of a problem where there is none (pun not intended).

I think that u0 is very much in the same vein as thinking that 1/0 = 0. There was an article posted a while back on Hacker News that seems relevant enough: https://news.ycombinator.com/item?id=17736046

kristate commented 6 years ago

I reread that HN thread and found the comment I was after[0]:

My problem with "1/0 = 0" is that it's essentially masking what's almost always a bug in your program. If you have a program that's performing divide-by-zeroes, that's almost surely something you did not intend for. It's a corner case that you failed to anticipate and plan for. And because you didn't plan for it, whatever result you get for 1/0 is almost surely a result that you wouldn't want to have returned to the user.

When encountering such unanticipated corner cases, it's almost always better to throw an error. That prevents any further data/state corruption from happening. It prevents your user from getting back a bad result which she thinks she can trust and rely on. It highlights the problem very clearly, so that you know you have a problem, and that you have to fix it. Simply returning a 0 does the exact opposite.

If you're one of the 0.1% who did anticipate all this, and correctly intended for your program to treat 1/0 as 0, then just check for this case explicitly and make it behave the way you wanted. The authors of pony are welcome to design their language any way they want. But in this case, they are hurting their users far more than they are helping.

u0 is like the human appendix for zig: it has little to no function and is zig-like-enough to @andrewrk that he wants to keep it in. But I think that it is a place for bugs to hide. Hindsight is 20/20, but being able to do atomic operations on it is weird and should not be allowed -- then, having to create a special case for u0 for every intrinsic/builtin is going to increase compiler maintenance cost like @thejoshwolfe says. Is it worth it?

I still do not know what the clear win is for u0. I thought that it was an implementation hack around and legacy, but @andrewrk seems to think that it's the bee's knees without any explanation.

Hejsil commented 6 years ago

I personally can't come up with runtime vulnerabilities that u0 would introduce. I can think of surprising comptime behavior that a fully working u0 would cause.

test "u0" {
    generic(u0, 0, 0);
}

fn generic(comptime T: type, a: T, b: T) void {
    // Normally, `ux + ux` is a runtime operation if either operand
    // is runtime known. But for `u0`, this should actually not be
    // the case, because `u0 + u0` is always 0. Should the compiler
    // comptime evaluate this expression? If so, then `_ = ssss();`
    // will never be analyzed, but the programmer probably wasn't
    // expecting that, because everything in `generic` seems to be
    // runtime known based on the function body alone.
    if (a + b != 0) {
        // Currently, this doesn't compile because the compiler is
        // not "smart" enough to comptime eval this away.
        // error: use of undeclared identifier 'ssss'
        _ = ssss();
    }
}

@tgschultz (I think this is MajorLag on IRC) has pointed out many times how it is not clear when something happens at comptime vs runtime. u0 would make this situation even worse because all operations on u0 are comptime known (unlike every other ux).

@andrewrk are you sure closing and rejecting this issue is a good idea, when the design and implementation of u0 are not stable yet? No one wants to play with buggy features, so until u0 works as "intended" it can be hard to test it out and find the footguns and use-cases it might have.

andrewrk commented 6 years ago

@Hejsil

name resolution happens in pass1, so here's an updated example:

test "u0" {
    generic(u0, 0, 0);
}

fn generic(comptime T: type, a: T, b: T) void {
    if (a + b != 0) {
        ssss();
    }
}

fn ssss() void {
    @compileError("this was analyzed");
}

This passes. Here's a similar test that passes, that uses u32:

test "u0" {
    generic(u32, 0);
}

fn generic(comptime T: type, a: T) void {
    if (a < 0) {
        ssss();
    }
}

fn ssss() void {
    @compileError("this was analyzed");
}

There is precedent for expressions to potentially result in a comptime value, and that is fine. The only time it matters whether something is comptime or not is when you need it to be comptime, and in this case you can force it to be with the comptime keyword. So I'm not convinced this is a problem, but regardless, it's certainly not a problem caused by u0.

are you sure closing and rejecting this issue is a good idea, when the design and implementation of u0 are not stable yet?

It seems fine to me - zig already has the concept of comptime operations, and it already has the concept of 0 bit types. u0 fits in like a perfect puzzle piece. There was some missing compiler code in the comptime implementation, but that was also true for enums with 1 tag, void, structs with no fields, etc. It should be pretty stable now.

My take on this issue is that I have been very patient in hearing out the case for removing u0, and the case is weak, and the matter is settled. It's time to move on to other issues.

ChengCat commented 6 years ago

I have no opinion on this subject, just some information and comments.

Many of you are doubting whether u0 is a valid type, and theoretically, it is valid. From a type-theory perspective, u0 is the "unit type", which is always constructible, but carries no information. void, the empty tuple, the empty struct, and an enum with a single element are all the same. u0, u1, u2, ... are closely related by the "product of types": u1 = u0 × onebit, u2 = u1 × onebit, ...
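The "product of types" relationship can be checked by counting values. The following Python sketch models a type as the list of its values; `U0`, `ONE_BIT` and `extend` are hypothetical names chosen for this illustration, and modelling values as tuples of bits is an assumption made only to keep the example small.

```python
from itertools import product

# The unit type: exactly one value, carrying no information.
U0 = [()]

# A single bit: two possible values.
ONE_BIT = [0, 1]


def extend(values):
    """Form the product of a type with one extra bit."""
    return [value + (bit,) for value, bit in product(values, ONE_BIT)]


u1 = extend(U0)
u2 = extend(u1)

if __name__ == "__main__":
    print(len(U0), len(u1), len(u2))  # 1 2 4
    print(u2)                         # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Each extra bit doubles the number of values, and the chain only starts at 1 rather than 0 because the unit type has one value, not none.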

For a high-level functional language like Haskell or OCaml, not including u0 would probably be a mistake. But the problem is that Zig is a low-level language and cares about machine representations and such. u0 differs from u1, u2, ... in a fundamental way: u0 has no memory representation. Including u0 in Zig breaks the assumption that all integers have a memory representation, which in turn may break things if handled carelessly.

I am disappointed with the design decision process. I agree with @Hejsil and think @andrewrk has rejected the case too early. In a language with robustness as the top priority, we should have thoroughly investigated for potential pitfalls, before deciding to include u0 in the language.

andrewrk commented 6 years ago

Anyone is welcome to make new arguments and proposals regarding the status quo. The closed status of this issue corresponds to the status quo plan; it does not mean my mind is closed.

So if you want it to be reconsidered, you need to provide either a problematic use case, or a fully formed new proposal.