wc-duck / datalibrary

Open Source Data Library for data serialization.
Other
42 stars 8 forks source link

Support for union enums to be indexed #116

Closed lundmark closed 5 years ago

lundmark commented 5 years ago

Support for unions to either hash names or use member-index for union type enum.

This is default as it used to be, so fully back-compatible. You can now add ""hash_union_type" : false, " to your union and it will be 0/1/2 etc mapped for the members rather than the hashed name.

coveralls commented 5 years ago

Coverage Status

Coverage increased (+0.03%) to 92.276% when pulling a2c9aa0d570ccca3d8b5ee75f53f77cedbaa378b on lundmark:master into cb688675f96aa98c9b2898070c61238c5d9276e9 on wc-duck:master.

lundmark commented 5 years ago

Yes unittests are incoming... unfortunately I had to do it like this because of github.

One of the reasons is that if you have linear enums then the compiler can generate a jump-table instead of if .. else if .. else if etc etc. So for us we have a tight loop where this dramatically reduces performance in that situation.

The other use case is that we sometimes want to use the enum type as index into an array of data that we in runtime want to bind together with the different types stored in the enum.

A third user case is that we want to sort a list depending on the type of the value stored by the union. The order of sorting is then the same as the declaration.

Sure these things could probably be solved in other ways external to DL, but the performance and interfacing cost would then increase towards dl - in a place where we want to decrease the friction with the library.

Hope that makes sense!

lundmark commented 5 years ago

Now we have a unit-test and also it works!

wc-duck commented 5 years ago

It absolutely make sense... However could they always be linear? I don't remember why I made them hashes. I guess that adding types to the union could be handled without rebuilding content if enums we're hashes? But there are a lot of issues with that anyways, such as changing sizes. What I'm getting at is that maybe they should always be linear, thus skipping the flag?

lundmark commented 5 years ago

I don't think that they always should be linear. Having the enum based on the name of the variable is really handy and also a really good safeguard against bad memory overrides (since otherwise 0 will always be valid). I've really liked the hash-way when debugging stuff.

RandyGaul commented 5 years ago

Sounds like linear should be defaulted then? I like the linear option.

wc-duck commented 5 years ago

What does hash gain while debugging? You still define the type-enum right? The only diff being the actual values of the enum?

lundmark commented 5 years ago

The difference is that the union_variable->type is a hash (but yes an enum). But if you take the pure memory into a memory-viewer you can very easily find the 32bit hash-values compared to floats or normal int/uints. The performance miss from switching on an enum as an if-else-if instead of jump-table is not noticable unless you're doing it in the 10's of thousands in a frame, so that was a pretty unique scenario tbh.

lundmark commented 5 years ago

So my point being is that when I've been debugging union issues, it's been really handy because then I've seen when memory-writes has been on the wrong offsets or not. Having those syncpoints in the memory really helped me debug them. Also I think a lot of bugs wouldn't have been detected if it wasn't for it. At least keep the flag and maybe turn it default to index-based?

RandyGaul commented 5 years ago

Hash option definitely still sounds valuable, especially for debugging.

wc-duck commented 5 years ago

Maybe as a compile-time define then? I really want to keep the amount of flags to configure down. In this case my guess is that only a few would understand why the option is there and most will only use the default. And perf-wise one option will always be better.

lundmark commented 5 years ago

I don't really understand the issue with supporting both options.

Having it as a compile-flag means that it will probably be missed in builds and rot and stop working.

Plus, when you have a memory dump from a crash, you can start figuring out what kind of data we're working with just because we have that kind of enum in certain places (we have a pre-parser that sometimes does that with normal enums as well). When you have a crash that's extremely hard to reproduce - or it can only be done in optimized builds, things like this help the possibility of finding out what kind of data it's crashing at.

I don't think that it's a lot of code to maintain both options, and it's not causing any negative effects otherwise and it's also a really helpful feature at times. Why not keep both?

wc-duck commented 5 years ago

The biggest issue I have with this is that it's a user-facing setting that will have to be exposed as a flag, explained in doc:s etc. One idea might be a compromise of hash(union_name) + index?

wc-duck commented 5 years ago

That "should" (need testing) generate good code + stand out in binaries?

RandyGaul commented 5 years ago

We can always default these values. Personally I think having more toggleable things in the settings is a good thing, as it means less obstructive for the user (so long as there are default values so they can spend zero energy configuring if they prefer).

Yes that means they need to be documented, but I think it’s worth it.

lundmark commented 5 years ago

Any ETA on this?

wc-duck commented 5 years ago

Nothing I feel that I can commit to to be honest :(

On Thu, May 23, 2019, 08:30 Simon Lundmark notifications@github.com wrote:

Any ETA on this?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/wc-duck/datalibrary/pull/116?email_source=notifications&email_token=AAC4MORYNO3BII7RON3JDFLPWY2Z3A5CNFSM4HNT27OKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWBGQ3Q#issuecomment-495085678, or mute the thread https://github.com/notifications/unsubscribe-auth/AAC4MOQ7H7LP3YNWTLM2KKLPWY2Z3ANCNFSM4HNT27OA .

wc-duck commented 5 years ago

A fix has been checked in! Union-enums are now linear!