Change underlying storage to an actual union

schveiguy commented 2 years ago

SumType does this and I think because of this, it has more accurate semantics than TaggedUnion, and also needs less machinery around the lifetime issues.

I'm putting this issue report here, because I think it's a good goal to have that I think unlocks some other issues that need fixing, specifically #63 and #44.

s-ludwig commented 2 years ago

Do you have specific semantics in mind or lifetime management that could be simplified? Not having GC-scanned void[] storage in the case that none of the types contain pointers is the only thing that I can think of right now.

But it would probably make the code a bit easier to read in any case. There had been issues with unused union parts not getting overwritten that produced false pointers, but those appeared to have gone, judging by vibe.d's Json implementation, which already switched back to union.

schveiguy commented 2 years ago

Instead of using memory tricks, and having to remember to call things that the compiler does naturally, we can let the compiler just do its thing. I have a suspicion that there are some shortcuts that might not be correct, but I haven't proven it yet. I found it odd that when I just added the full union to the Dummy union, it started complaining about how e.g. opAssign couldn't be @nogc.

It also paves the way for CT usage.

s-ludwig commented 2 years ago

But all the lifetime management (construction, destruction, copy construction/postblit) still needs to be done manually. And opAssign needs a valid value to overwrite, so either a bit-copy of T.init needs to be done in advance, or we need an approach similar to the current emplace one.

(Anyway, I don't really argue against this, we should definitely do it)

schveiguy commented 2 years ago

Looking at std.sumtype, @pbackus is destroying the current value, and then assigning the union by constructing a new one (which I assume blits the proper .init value first). I don't think he's calling any postblit/etc manually: https://github.com/dlang/phobos/blob/9e42570bc28be1452995d288f9667f310df0becd/std/sumtype.d#L620

The ctors look better too: https://github.com/dlang/phobos/blob/9e42570bc28be1452995d288f9667f310df0becd/std/sumtype.d#L358

In general, the machinery just looks much more straightforward and simple.

I'd like to see a similar approach here. Let the compiler do the work it knows how to do. It also gives the compiler more info about what is going on. When the compiler sees void[N] I think it just ignores all lifetime issues.

s-ludwig commented 2 years ago

I don't know, maybe its because I know the code, but I really think the main difference comes down to certain simplifications, such as always doing destroy+blit assign instead of using the copy constructor/postblit of the target type, and not so much to how the storage is defined. I don't see much in terms of actual simplification or compiler assistance (struct alignment definitely is, though).

BTW, I just discovered that I started a WIP branch for this already: https://github.com/s-ludwig/taggedalgebraic/tree/union_storage

schveiguy commented 2 years ago

Simplification should be a goal. If TU is storing a union and not void[N], it should utilize the compiler's in-built abilities to ensure lifetime is done correctly. It may not be automatic, it's still a union, but it should be utilized.

I have no objection to using postblit/copy vs. destroy+blit, so long as it's not a manual process.

I'm reminded of my newaa project where I reimplemented the built in associative arrays as a library type. I don't do any special mechanisms for the Entry struct to simulate what the compiler does naturally. Compare to the actual array runtime, which is insanely complex with all kinds of casts and machinery to replace what the compiler should be doing.

I'm not saying this is a requirement, nor that I have a specific vision in mind. But I'd like to have that kind of "aha, I get how this works" feeling when looking at the code, instead of "I have no idea how this works... this looks like black magic... this looks like a compiler workaround".

BTW, I just discovered that I started a WIP branch for this already

Nice! I'll take a look when I have some time.

s-ludwig commented 2 years ago

I have no objection to using postblit/copy vs. destroy+blit, so long as it's not a manual process.

But it is manual in both cases. I feel like I'm obviously missing something here, but unless I'm terribly mistaken, unions are not performing any lifetime functionality on their own. The differences AFAICS are just automatic alignment, automatic avoidance of GC scanning, as well as direct member access vs. a cast.

schveiguy commented 2 years ago

It has to be done via direct member access. But e.g. this kind of stuff is hard to comprehend why it's needed: https://github.com/s-ludwig/taggedalgebraic/blob/c40e23c45989b67e214c984ab8b2cc90b6e4bf45/source/taggedalgebraic/taggedunion.d#L793-L817

This is where I started trying to fix #63 and where it started complaining about @nogc no longer being valid.

Maybe I'm misunderstanding the complexity. The whole thing is somewhat confusing to me. Why in some cases it's a void[N], and in others it's a union between the first member and void[N]? This makes weird problems happen when your first member is of a certain type (but I suspect there are similar issues with SumType).

It would be nice if the storage is U fields; Kind tag; and that's it.

s-ludwig commented 2 years ago

Okay, but using m_data.trustedGet!T = ...; should be more or less exactly the same as m_data.field_0 = ...; in the union case. The rawEmplace was just a workaround for issues with non-copyable types and bogus complaints about internal pointers by swap (I don't remember the details right now). But generally, this could be replaced by a very similar approach to the the SumType code.

edit: CTFE is another valid point though!

schveiguy commented 2 years ago

Okay, but using m_data.trustedGet!T = ...; should be more or less exactly the same as m_data.field_0 = ...;

But that's not what happens, these are the paths for all construction and opAssign: https://github.com/s-ludwig/taggedalgebraic/blob/c40e23c45989b67e214c984ab8b2cc90b6e4bf45/source/taggedalgebraic/taggedunion.d#L74 https://github.com/s-ludwig/taggedalgebraic/blob/c40e23c45989b67e214c984ab8b2cc90b6e4bf45/source/taggedalgebraic/taggedunion.d#L322

s-ludwig / taggedalgebraic

Change underlying storage to an actual union #72