Open schveiguy opened 2 years ago
Do you have specific semantics in mind or lifetime management that could be simplified? Not having GC-scanned void[]
storage in the case that none of the types contain pointers is the only thing that I can think of right now.
But it would probably make the code a bit easier to read in any case. There had been issues with unused union parts not getting overwritten that produced false pointers, but those appeared to have gone, judging by vibe.d's Json
implementation, which already switched back to union
.
Instead of using memory tricks, and having to remember to call things that the compiler does naturally, we can let the compiler just do its thing. I have a suspicion that there are some shortcuts that might not be correct, but I haven't proven it yet. I found it odd that when I just added the full union to the Dummy union, it started complaining about how e.g. opAssign couldn't be @nogc
.
It also paves the way for CT usage.
But all the lifetime management (construction, destruction, copy construction/postblit) still needs to be done manually. And opAssign
needs a valid value to overwrite, so either a bit-copy of T.init
needs to be done in advance, or we need an approach similar to the current emplace
one.
(Anyway, I don't really argue against this, we should definitely do it)
Looking at std.sumtype, @pbackus is destroying the current value, and then assigning the union by constructing a new one (which I assume blits the proper .init value first). I don't think he's calling any postblit/etc manually: https://github.com/dlang/phobos/blob/9e42570bc28be1452995d288f9667f310df0becd/std/sumtype.d#L620
The ctors look better too: https://github.com/dlang/phobos/blob/9e42570bc28be1452995d288f9667f310df0becd/std/sumtype.d#L358
In general, the machinery just looks much more straightforward and simple.
I'd like to see a similar approach here. Let the compiler do the work it knows how to do. It also gives the compiler more info about what is going on. When the compiler sees void[N]
I think it just ignores all lifetime issues.
I don't know, maybe its because I know the code, but I really think the main difference comes down to certain simplifications, such as always doing destroy+blit assign instead of using the copy constructor/postblit of the target type, and not so much to how the storage is defined. I don't see much in terms of actual simplification or compiler assistance (struct alignment definitely is, though).
BTW, I just discovered that I started a WIP branch for this already: https://github.com/s-ludwig/taggedalgebraic/tree/union_storage
Simplification should be a goal. If TU is storing a union and not void[N], it should utilize the compiler's in-built abilities to ensure lifetime is done correctly. It may not be automatic, it's still a union, but it should be utilized.
I have no objection to using postblit/copy vs. destroy+blit, so long as it's not a manual process.
I'm reminded of my newaa project where I reimplemented the built in associative arrays as a library type. I don't do any special mechanisms for the Entry
struct to simulate what the compiler does naturally. Compare to the actual array runtime, which is insanely complex with all kinds of casts and machinery to replace what the compiler should be doing.
I'm not saying this is a requirement, nor that I have a specific vision in mind. But I'd like to have that kind of "aha, I get how this works" feeling when looking at the code, instead of "I have no idea how this works... this looks like black magic... this looks like a compiler workaround".
BTW, I just discovered that I started a WIP branch for this already
Nice! I'll take a look when I have some time.
I have no objection to using postblit/copy vs. destroy+blit, so long as it's not a manual process.
But it is manual in both cases. I feel like I'm obviously missing something here, but unless I'm terribly mistaken, union
s are not performing any lifetime functionality on their own. The differences AFAICS are just automatic alignment, automatic avoidance of GC scanning, as well as direct member access vs. a cast
.
It has to be done via direct member access. But e.g. this kind of stuff is hard to comprehend why it's needed: https://github.com/s-ludwig/taggedalgebraic/blob/c40e23c45989b67e214c984ab8b2cc90b6e4bf45/source/taggedalgebraic/taggedunion.d#L793-L817
This is where I started trying to fix #63 and where it started complaining about @nogc
no longer being valid.
Maybe I'm misunderstanding the complexity. The whole thing is somewhat confusing to me. Why in some cases it's a void[N], and in others it's a union between the first member and void[N]? This makes weird problems happen when your first member is of a certain type (but I suspect there are similar issues with SumType).
It would be nice if the storage is U fields; Kind tag;
and that's it.
Okay, but using m_data.trustedGet!T = ...;
should be more or less exactly the same as m_data.field_0 = ...;
in the union case. The rawEmplace
was just a workaround for issues with non-copyable types and bogus complaints about internal pointers by swap
(I don't remember the details right now). But generally, this could be replaced by a very similar approach to the the SumType
code.
edit: CTFE is another valid point though!
Okay, but using
m_data.trustedGet!T = ...;
should be more or less exactly the same asm_data.field_0 = ...;
But that's not what happens, these are the paths for all construction and opAssign: https://github.com/s-ludwig/taggedalgebraic/blob/c40e23c45989b67e214c984ab8b2cc90b6e4bf45/source/taggedalgebraic/taggedunion.d#L74 https://github.com/s-ludwig/taggedalgebraic/blob/c40e23c45989b67e214c984ab8b2cc90b6e4bf45/source/taggedalgebraic/taggedunion.d#L322
SumType does this and I think because of this, it has more accurate semantics than TaggedUnion, and also needs less machinery around the lifetime issues.
I'm putting this issue report here, because I think it's a good goal to have that I think unlocks some other issues that need fixing, specifically #63 and #44.