Closed kristate closed 6 years ago
u0 can still be created with @IntType(false, 0), so the quote about structs and enums with 0 members doesn't apply.
In Rust, structs with 0 members can be used as a global "tag" or marker. One useful property of these marker structs is that you can implement traits on them.
trait DoThing {
    fn do_thing(&self) -> u32;
}

struct X;
struct Y;

impl DoThing for X {
    fn do_thing(&self) -> u32 { 1 }
}

impl DoThing for Y {
    fn do_thing(&self) -> u32 { 2 }
}

fn main() {
    // `box` is a reserved keyword in Rust, so use a different name.
    let b: Box<dyn DoThing> = Box::new(X);
    println!("{}", b.do_thing());
}
This is a basic version of what I was doing. I wanted to store these markers and use certain functions on them. Now, my actual code generated the structs & implementations using a macro, but I still have a use case for 0 members in a struct.
I could have implemented this by generating functions instead and storing the function pointer, but that's not as clean.
The Rust docs say:
This is rarely useful on its own (although sometimes it can serve as a marker type), but in combination with other features, it can become useful.
I don't know how well this applies to Zig, as it doesn't have a whole lot of polymorphism yet (iirc), but implementing methods / traits onto these markers was a use case that fit my situation.
@kristate what's your explanation for why u0 shouldn't be allowed?
Here's why it should be allowed:
- It makes sense - it's a number that can only hold the value 0.
- We already have the concept of a type that has no bits at runtime, which this fits perfectly.
- I can imagine that in some generic code where you use @IntType, allowing u0 solves an edge case that otherwise would have needed to be handled separately.

@andrewrk thanks for the thought exercise. u0 does not make sense, but I thought that was obvious -- let me go through your points.
- It makes sense - it's a number that can only hold the value 0

No, u0 is not a type for holding any integer and should not exist. u8 is a type that holds 8 bits. u1 is a type that holds 1 bit. u0 holds zero bits. Therefore, it is not holding anything. This is like saying that we can measure the length of 0 with a zero-length ruler.
- We already have the concept of a type that has no bits at runtime, which this fits perfectly

u0 is the equivalent of void and is even backed by void (specifically LLVMVoidType()) in analyze.cpp. I don't like that we are overloading the integer type.
- I can imagine that in some generic code where you use @IntType, allowing u0 solves an edge case that otherwise would have needed to be handled separately.

I cannot imagine such an edge case, and if one should occur, I would imagine that this would be a fault in the language design and/or implementation.
Furthermore, after writing my patch (#1531) for this bug in zig, I discovered that LLVM does not support integers of zero bits and that there is an upper limit of (1<<24)-1 bits which we are not catching; my patch also fixes this.
Likewise, no part of the zig language or standard library requires u0. This was verified in my patch by removing it from the language.
I also searched for a language that allows initialization of a zero-bit-length integer type and could not find one.
In conclusion:
- u0 is not actually an integer type (it's backed by void) and does not hold anything.
- void already accomplishes what u0 sets out to do. (it's not like you can use u0 for any sort of calculation)
- LLVMIntType() only supports from 1 to (1<<24)-1 bits.
- No part of the zig language or standard library requires or uses u0 outside of testing u0.
- This is not a proposal, but a bug in the language.
My 2 cents:

- u0 is not actually an integer type (it's backed by void) and does not hold anything.

assert(@typeId(u0) == TypeId.Int); // true

- void already accomplishes what u0 sets out to do. (it's not like you can use u0 for any sort of calculation)

// Works fine
var zero: u0 = 0;
assert(zero == 0);

// Causes a seg fault in the compiler and is a bug; if allowed, u0 can be used in calculations.
var z = @intCast(u1, zero);
(gdb) bt
#0 0x00007f0d307c07a4 in LLVMBuildZExt () from /usr/lib/libLLVM-6.0.so
#1 0x0000559d70b0e84d in gen_widen_or_shorten (g=0x559d71dbd660, want_runtime_safety=true, actual_type=0x559d720272b0, wanted_type=0x559d72029240, expr_val=0x0) at ../src/codegen.cpp:1681
#2 0x0000559d70b1444d in ir_render_widen_or_shorten (g=0x559d71dbd660, executable=0x559d71e347f0, instruction=0x559d720356c0) at ../src/codegen.cpp:3019
#3 0x0000559d70b1ddc0 in ir_render_instruction (g=0x559d71dbd660, executable=0x559d71e347f0, instruction=0x559d720356c0) at ../src/codegen.cpp:5296
#4 0x0000559d70b1e2a5 in ir_render (g=0x559d71dbd660, fn_entry=0x559d71e34690) at ../src/codegen.cpp:5377
#5 0x0000559d70b22744 in do_code_gen (g=0x559d71dbd660) at ../src/codegen.cpp:6373
#6 0x0000559d70b28ff1 in codegen_build_and_link (g=0x559d71dbd660) at ../src/codegen.cpp:8226
#7 0x0000559d70b98c17 in main (argc=3, argv=0x7ffe6f9317b8) at ../src/main.cpp:978
(gdb)
- LLVMIntType() only supports from 1 to (1<<24)-1 bits.

A mistake in LLVM?
- No part of the zig language or standard library requires or uses u0 outside of testing u0.
That could change.
- I am not aware of any other language that knowingly allows initialization of a zero-bit-length integer type.
Doesn't seem like a strong argument
- This is not a proposal, but a bug in the language.
Arguable
Just a point of clarity- LLVM does not have the concept of 0-bit types. Zig does. Zig simply avoids emitting instructions for zero bit types, since the values are all compile time known by definition. In the above crashing example, there is missing code in zig to detect that we knew the value of a u0 at compile-time, and so we incorrectly try to use a 0 bit type in LLVM. With the fix, any u0 value would always be comptime known to be zero. This is the same as what we do for structs with all void fields, and enums with only 1 tag.
For what it's worth, I am not against 0-bit types. I am saying that 0-bit integers (a data type that represents some range of mathematical integers) don't exist and are a misnomer. There is no range/interval from zero to zero.
u8 holds pow(2, 8) different values, which is 256
u4 holds pow(2, 4) different values, which is 16
u2 holds pow(2, 2) different values, which is 4
u1 holds pow(2, 1) different values, which is 2
u0 holds pow(2, 0) different values, which is 1
It's sound according to mathematics.
@andrewrk how would you store a 0-bit integer?
When you store any integer, there are the bits that you place in the storage, and there is the type information (that it is an integer, the size of the integer, that it is stored in two's complement form, whether it is unsigned) which is known but not actually stored.
When you store a u0, you know that there is only one possible value. So it takes up 0 bits of storage. It's brilliant, you don't actually store anything. When you load it, same thing. You know that there is only one value. So when you load it, you just know that the value is 0 and don't actually do any loading.
Well, if we're going to have u0, we should at least implement it properly. Printing a u0 does not work:
const std = @import("std");

test "print u0" {
    var uz: u0 = 0;
    std.debug.warn("uz: {}", uz);
}
/Users/cfkk/Source/zig2/build/lib/zig/std/debug/index.zig:46:23: error: compiler bug: var args can't handle void. https://github.com/ziglang/zig/issues/557
stderr.print(fmt, args) catch return;
^
/Users/cfkk/Source/zig2/build/u0.zig:4:17: note: called from here
std.debug.warn("uz: {}", uz);
^
Agreed. I think the only thing that is missing right now is: in zig ir, any value that has the type u0 should always be comptime known to be 0.

The var args bug (#557) will be fixed with #208
@winksaville just for clarification, u0 is definitely backed by void (specifically LLVMVoidType) when interfacing with LLVM. You should check it out here: https://github.com/ziglang/zig/blob/dd5b2d1/src/analyze.cpp#L5930
@andrewrk
- u0 holds pow(2, 0) different values which is 1
The math checks out. Unfortunately, computers can only store bits, not half bits.
I'm sorry I don't follow your argument. Where does a half bit come in to play?
@kristate, I never doubted that it was backed by void, but it seems that from the user's point of view it isn't.
Should a bug be filed for:
test "u0.crash.intCast" {
    var zero: u0 = 0;
    var z1 = @intCast(u1, zero); // Causes compiler seg fault
}
Here are the integers u8, u7, u6, u5, u4, u3, u2, u1, u0 and their relationship with log base 2:
>>> log2(256)
8
>>> log2(128)
7
>>> log2(64)
6
>>> log2(32)
5
>>> log2(16)
4
>>> log2(8)
3
>>> log2(4)
2
>>> log2(2)
1
>>> log2(1)
0
pow(2, 0) == 1
log2(1) == 0
everything is fine
I think the confusion here can be demonstrated with an analogy to enums:
const Foo = enum { One };
This enum has 1 possible state. It can only be the value Foo.One. This is a 0-bit type in zig; we always know the value is Foo.One. This is analogous to u0, which also has 1 possible state - the integer value 0. In fact, up until a few commits ago, the integer tag type of this enum was u0. I changed it so that you could specify a tag value and have it be a comptime_int, since that's more useful and it works.
const Bar = enum {};
This is a compile error. This type is impossible, because there are zero possible states. Note that this is not analogous to u0, because it has zero possible states, not one.
So, one problem I see is that structs bob and jill would be the same if deserialized and serialized.
You cannot store and retrieve zero bytes.
const bob = struct {
    a: u0,
    b: u0,
};

const jill = struct {
    a: u0,
    b: u0,
    c: u0,
};
How is this different than
const bob = struct {
    a: u8,
};

const jill = struct {
    a: u8,
};
either way, you don't know if you got a bob or a jill without metadata.
@andrewrk it's completely different? the members of those structs are the same.
In my example, bob has two members and jill has three.
I'll re-open this if there is a sufficiently convincing argument. As it stands I am strongly convinced that u0 should remain a valid integer type in Zig.
This is going to be zig's billion dollar mistake. I thought that we were going to make a language to replace C. Instead of bikeshedding with me, we could have removed a wart on the language. I am truly saddened by this action of closing the thread.
Take some time to watch that video. References to 0-bytes has always been dangerous in computer programming.
Accusations of bikeshedding are not on topic. You are welcome to repeat your comment again without the personal attack.
Here are some current properties of u0:

@sizeOf(u0) is 0:

assert(@sizeOf(u0) == 0);

The address of zero: u0 is null:

var zero: u0 = 0;
assert(&zero == null);

Appears you can dereference a null address:

var pZero = &zero;
assert(pZero == null);
assert(pZero.* == 0);

An empty struct behaves like u0:

const Empty = struct {};
var empty: Empty = undefined;
assert(@sizeOf(Empty) == 0);
assert(&empty == null);
All zero-bit types have the serialization problem (void, struct{}), so unless we remove those as well, removing u0 doesn't solve anything.
Idk exactly the use case for u0 (void is very useful), but I also don't see any potential problems with its design, and there really is no way to find out unless someone shoots themselves with the feature (we're not at 1.0 after all, so let's test things out).

Btw, have you always been able to compare non-optional pointers with null (@winksaville's example)? Seems like a bug.
- Seems like a bug.
It is. Thanks @winksaville for the example. I filed #1539
For what it is worth, the code I have to read and write arbitrary types doesn't have any trouble writing a u0 without modification (it outputs nothing). The read function fails because var x: u0 = 0; var y = @bitCast(u0, x);
crashes the compiler, but otherwise it would just return u0(0) and not actually read any input. This is precisely what I'd expect when reading or writing something like the Bob struct above.
Also of note, x <<= 0 crashes the compiler. If we're going to keep u0, which I agree with so far, we should probably collect these failure cases somewhere.
@Hejsil a void or struct {} is fine because it has zero members. Having members of u0 sprinkled inside of a struct {} seems silly, but could lead to creating two types that could be serialized/hashed identically but are different -- perhaps causing some sort of backwards Collision Attack in the future.
Just like everyone has seen with this type, if it were a real integer type, we would not be having all of these faults. Instead, in order to support u0, we have to do all of these extra hacks on the language to support a type that has no use.
@kristate I agree that all zero-sized types make code harder to reason about on a machine-code level, because, well, there is no machine code to reason about.
I don't see how void and struct{} don't have the same problems as u0 for serialization. Just like bob and jill are the same in serialized form, so are:
const kurt = struct { a: void };
const jim = struct {
    a: void,
    b: void,
};
and:
const bent = packed struct { a: u8 };
const hue = packed struct {
    a: u4,
    b: u4,
};
What u0 allows is for any generic function that accepts a bit count to just work, without workarounds. No need to have:
const T = if (i == 0) void else @IntType(false, i);
var a: T = if (i == 0) {} else math.MaxInt(T)
This is the same as how void allows std to implement BufSet using HashMap, without HashMap having to special-case void.
@Hejsil presumably a serialization library would serialize the void and struct {} types into some sort of representation (in JSON it would be {"a":"null", "b":{}}), but the question remains: how do you serialize u0?
const T = if (i == 0) void else @IntType(false, i);
This seems like the right answer, since the programmer is taking care of the special i == 0 case.
This is the same as how void allows std to implement BufSet using HashMap, without HashMap having to special case void.
Yes, I am perfectly fine accepting void. To me, u0 is void in disguise and should be accepted as such. It's like we have two voids in the language.
A void value does not correspond to null. A void value cannot be null; it can only be {}.
If you were going to serialize a u0 value into JSON, it would simply be 0. If you were deserializing the JSON, it would work perfectly. Either the JSON value is 0 or it's an IntegerOutOfRange error, same as if you were deserializing the JSON value 256 into a u8.

The same JSON deserialization code that works for any integer type would handle u0 in this way. To remove u0 from the language would break this code unnecessarily.
@andrewrk

- To remove u0 from the language would break this code unnecessarily.

Just for clarification, how would removing u0 result in any code breakage?
I just wrote some code that demonstrates uses of u0 here: https://github.com/ziglang/zig/pull/1543
It's conceivable that a data structure like that could be useful in some situation. (It just occurred to me that I didn't document the idea of the data structure. I'll do that here.)
You have integer keys to a hashtable, but it's not a typical hashtable implementation. It's sharded on the top N bits of the key into a flat array of size 2**N. Each shard holds some collection of nodes; in the above linked implementation, each shard is a linked list, but that's not important for this discussion. The important part is that the number of bits in the shard key is comptime variable.
A typical case for this data structure might be sharding a u32 key based on the top 8 bits (into 256 shards). But the corner cases are where this gets interesting.
In the case that you shard on the top 0 bits, the ShardKey type is u0. There is a special case in the code above for this situation, but it's conceivable that smarter shift-by-comptime_int semantics could obviate the special case. And there's still plenty of code that "just works" without special casing for u0 in that implementation.
Another strange case is if you shard on the top 1 bit of a u1 key. In this case you end up right-shifting a u1, and the shift type for u1 is u0.
I admit that the corner cases that demonstrate the use of u0 are not very practical, and the entire data structure is of questionable value. But the purpose of this exercise is getting at whether the language itself should forbid the use of u0. Maybe it's so useless that it should be specifically blocked, but I don't see a reason to forbid it when it fits mathematically into the data structure above. It does add a bit of maintenance burden to the compiler, as seen with the bugs specifically related to u0, but I think the strongest argument in favor of u0 is that it works mathematically.
The one part of u0 that doesn't work mathematically is shifting: if you try to shift a u0, your shift amount should be a log2(0)-bit integer, and log2(0) is undefined. However, doing some testing with zig suggests that the shift type of u0 is u0, which is practically sound. Doing 0 >> 0 should work, and so u0(x) >> 0 I guess should just do nothing.
I think I'm going to write a proposal for some enhancements to shift-by-comptime_int semantics now...
Also, under @atomicRmw we have "TODO right now bool is not accepted. Also I think we could make non powers of 2 work fine, maybe we can remove this restriction". If @atomicRmw were to work on u0, then one could write code that is optionally threadsafe without a lot of special casing:

const Lock = if (is_thread_safe) u1 else u0;
const Locked = if (is_thread_safe) 1 else 0;
const Unlocked = 0;
var lock: Lock = Unlocked;
...
// Do @atomicRmw on lock. @atomicRmw will be a no-op on `u0`
Idk if this is a good idea. Maybe @atomicRmw should never be a no-op, because that might conflict with its description: "This builtin function atomically modifies memory and then returns the previous value." u0 has no memory to modify.
Should I open a proposal for this?
Edit: Never written lock free code, so I might be missing details that make this impossible
Feel free to do that. Reading your comment, though, I have to agree that @atomicRmw should probably be defined to always load/store memory, since that is its main purpose.
@Hejsil @andrewrk Hejsil makes a good point here. These are the kind of mistakes that u0 allows for. @atomicRmw should never be a no-op. I wasn't trying to be coy when I was talking about the billion dollar bug NULL -- I am glad that this post caused a flurry of activity and lots of patches, but it seems to me that u0 is a solution in search of a problem where there is none (pun not intended).

I think that u0 is very much in the way of thinking 1/0 = 0. There was an article posted a while back on hacker news that seems relevant: https://news.ycombinator.com/item?id=17736046
I reread that HN thread and found the comment I was after[0]:
My problem with "1/0 = 0" is that it's essentially masking what's almost always a bug in your program. If you have a program that's performing divide-by-zeroes, that's almost surely something you did not intend for. It's a corner case that you failed to anticipate and plan for. And because you didn't plan for it, whatever result you get for 1/0 is almost surely a result that you wouldn't want to have returned to the user.
When encountering such unanticipated corner cases, it's almost always better to throw an error. That prevents any further data/state corruption from happening. It prevents your user from getting back a bad result which she thinks she can trust and rely on. It highlights the problem very clearly, so that you know you have a problem, and that you have to fix it. Simply returning a 0 does the exact opposite.
If you're one of the 0.1% who did anticipate all this, and correctly intended for your program to treat 1/0 as 0, then just check for this case explicitly and make it behave the way you wanted. The authors of pony are welcome to design their language any way they want. But in this case, they are hurting their users far more than they are helping.
u0 is like the human appendix for zig: it has limited to no function and is zig-like enough to @andrewrk that he wants to keep it in. But I think that it is a place for bugs to hide. Hindsight is 20/20, but being able to do atomic operations on it is weird and should not be allowed -- then, having to create a special case for u0 for every intrinsic/builtin is going to increase compiler maintenance cost, as @thejoshwolfe says. Is it worth it?

I still do not know what the clear win is for u0. I thought that it was an implementation hack and legacy, but @andrewrk seems to think that it's the bee's knees without any explanation.
I personally can't come up with runtime vulnerabilities that u0 would introduce. I can think of surprising comptime behavior that a fully working u0 would cause.
test "u0" {
    generic(u0, 0, 0);
}

fn generic(comptime T: type, a: T, b: T) void {
    // Normally, `ux + ux` is a runtime operation if either operand
    // is runtime known. But for `u0`, this should actually not be
    // the case, because `u0 + u0` is always 0. Should the compiler
    // comptime evaluate this expression? If so, then `_ = ssss();`
    // will never be analyzed, but the programmer probably wasn't
    // expecting that, because everything in `generic` seems to be
    // runtime known based on the function body alone.
    if (a + b != 0) {
        // Currently, this doesn't compile because the compiler is
        // not "smart" enough to comptime eval this away.
        // error: use of undeclared identifier 'ssss'
        _ = ssss();
    }
}
@tgschultz (I think this is MajorLag on IRC) has pointed out many times how it is not clear when something happens at comptime vs runtime. u0 would make this situation even worse, because all operations on u0 are comptime known (unlike every other ux).

@andrewrk are you sure closing and rejecting this issue is a good idea, when the design and implementation of u0 are not stable yet? No one wants to play with buggy features, so until u0 works as "intended" it can be hard to test it out and find the footguns and use-cases it might have.
@Hejsil
name resolution happens in pass1, so here's an updated example:
test "u0" {
    generic(u0, 0, 0);
}

fn generic(comptime T: type, a: T, b: T) void {
    if (a + b != 0) {
        ssss();
    }
}

fn ssss() void {
    @compileError("this was analyzed");
}
This passes. Here's a similar test that passes, that uses u32:
test "u0" {
    generic(u32, 0);
}

fn generic(comptime T: type, a: T) void {
    if (a < 0) {
        ssss();
    }
}

fn ssss() void {
    @compileError("this was analyzed");
}
There is precedent for expressions to potentially result in a comptime value, and that is fine. The only time it matters whether something is comptime or not is when you need it to be comptime, and in this case you can force it to be with the comptime keyword. So I'm not convinced this is a problem, but regardless, it's certainly not a problem caused by u0.

- are you sure closing and rejecting this issue is a good idea, when the design and implementation of u0 are not stable yet?
It seems fine to me - zig already has the concept of comptime operations, and it already has the concept of 0 bit types. u0 fits in like a perfect puzzle piece. There was some missing compiler code in the comptime implementation, but that was also true for enums with 1 tag, void, structs with no fields, etc. It should be pretty stable now.
My take on this issue is that I have been very patient in hearing out the case for removing u0, and the case is weak, and the matter is settled. It's time to move on to other issues.
I have no opinion on this subject, just some information and comments.
Many of you are doubting whether u0 is a valid type, and theoretically, it is valid. From a type-theoretic perspective, u0 is the "unit type", which is always constructible but carries no information. void, the empty tuple, the empty struct, and an enum with a single element are all the same. u0, u1, u2, ... are closely related by the "product of types": u1 = u0 × onebit, u2 = u1 × onebit, ...

For a high-level functional language like Haskell or OCaml, not including u0 would probably be a mistake. But now the problem is, Zig is a low-level language, and cares about machine representations and such. u0 and u1, u2, ... differ in a fundamental way: u0 has no memory representation. Including u0 in Zig breaks the assumption that all integers have a memory representation, which in turn may break things if handled carelessly.
I am disappointed with the design decision process. I agree with @Hejsil and think @andrewrk has rejected the case too early. In a language with robustness as the top priority, we should have thoroughly investigated for potential pitfalls, before deciding to include u0 in the language.
Anyone is welcome to make new arguments and proposals regarding the status quo. The closed status of this issue corresponds to the status quo plan; it does not mean my mind is closed.
So if you want it to be reconsidered, you need to provide either a problematic use case, or a fully formed new proposal.
- u0 is not actually an integer type (it's backed by void) and does not hold anything.
- void already accomplishes what u0 sets out to do. (it's not like you can use u0 for any sort of calculation)
- LLVMIntType() only supports from 1 to (1<<24)-1 bits.
- No part of the zig language or standard library requires or uses u0 outside of testing u0.

I don't believe that u0 should be a thing. @thejoshwolfe seems to agree: