rosefromthedead / effing-mad

Algebraic effects for Rust
586 stars 20 forks source link

Replace frunk with low-overhead coproduct library #4

Open joonazan opened 2 years ago

joonazan commented 2 years ago

I want to try an architecture that uses effect groups with lots of variants but the coproduct offered by frunk uses ridiculous amounts of memory when it has many variants. I am also not convinced that rustc can optimize the methods very well.

Thus I wrote a library for coproducts that are just a tag and a union. With that representation, all operations can be written as tag manipulations; the union's type changes but the in-memory representation does not.

Meanwhile, this library progressed in a nice direction, so I'd rather contribute to this than continue making my own.

In the discussion I had during the development of the library, I heard about your desire for a minimal common superset. I believe it can be done. It requires assigning numbers to effects at compile time, similar to what this crate does. That gives a way to check if two effects are equal. The rest has been done already: frunk's CoproductSubsetter filters out a list of types from a coproduct, just need to make that list the list of effects that are present in both.

rosefromthedead commented 2 years ago

I'm in favour of perf improvements, but it might take me a while to review this. At a glance, your crate seems sound in a lot of places, though. I do wonder why you chose to rename so many of frunk's idents too :)

IMO, what you're doing manually here should be done automatically by rustc... it seems to me that auto enum niche optimisations would benefit a lot more projects than this one. But of course I'm not planning to implement it, so this is a good way to get it done now.

Thanks, and I'll take a look.

Regarding the (mathematical, not C-style) union thing, it would be a great help, but I don't follow your explanation so I'll believe it when I see it :)

joonazan commented 2 years ago

I renamed everything because originally the API was pretty different from frunk. Now the traits are almost the same but the APIs cannot be made compatible with each other because the impls would need to be the same as well. I tried to emulate the standard library's naming conventions for traits. The most significant difference is that I merged CoproductInjector and CoproductUninjector into coproduct::At because they are implemented on exactly the same types.

IMO, what you're doing manually here should be done automatically by rustc

I actually assumed it was done automatically until I measured it. I haven't investigated if there is a good reason for this. However, my guess is that even with good memory layout, LLVM won't be able to perfectly optimize the functions operating on coproducts.

your crate seems sound in a lot of places

I should have added some more comments about soundness. First, casting between compatible repr(C) unions is safe because they always have the same layout for fields. The remaining unsafe blocks are about manipulating the tag correctly. I have made mistakes doing that. I think that part should be tested with a fuzzer or formally proved. I have experience with both but am unaware of a good tool to do the latter for Rust.

Regarding the (mathematical, not C-style) union thing, it would be a great help

Good, I'll develop that next!

joonazan commented 2 years ago

@rosefromthedead Have a look at these tests. https://github.com/joonazan/coproduct/blob/e9b56d405d548510a257212d720e11e467deb9d6/src/merge.rs#L112

Clearly I have created a way to compute the union of two coproducts. Can you help me figure out what the API should look like? How exactly do you want to use the functionality?

I can see that it can be used to infer the type parameters of transform so transform0 and transform1 become unnecessary. I can try doing that. But I think you have a better idea of how exactly this would be used to merge effects.

joonazan commented 2 years ago

Now transform infers its type parameters. But effects need to derive IdType.

The effects! macro can be made to do that but effects declared otherwise need some other solution. It is pretty random for users to have to derive IdType. Maybe derive Effect instead? Or make Effect require IdType?

joonazan commented 2 years ago

It turns out it is possible to do type inequality without a proc macro. My implementation abuses TypeId at compile time, which may get patched out in a future Rust version. https://crates.io/crates/spidermeme does type inequality in a different way, which will also get patched out eventually. But I have high hopes that before that some stable method of type inequality is created.

rosefromthedead commented 1 year ago

All looks good so far - the Merge API looks good as it is, basically if it works it works :)

However I'm currently trying to get to the bottom of why in the effects-macro example, there are so many more instructions in the run function. My first thought was that it would be due to coproduct not having inline annotations where frunk does, but I added them and it barely made a difference. I'll put it through a benchmark to see if it makes a difference in time or just space.

joonazan commented 1 year ago

It should be rather slow in the benchmark. I found that calls to EmptyUnion::idrop and Here::count were not inlined. Those two #[inline] annotations alone halved the size of the main function.

I'm currently investigating if there are other big inlining failures.

Using CopyableCoproduct instead of Coproduct would get rid of all the code involving idrop but then all effects have to be Copy.

joonazan commented 1 year ago

Now the code looks ok in terms of inlining. I can't figure out why some symbols aren't inlined, though. For example, core::ptr::drop_in_place<coproduct::coproduct::Coproduct<coproduct::union::Union<effing_mad::injection::Begin,coproduct::union::EmptyUnion>>> consists of only a ret instruction.

rosefromthedead commented 1 year ago

Thanks, that's a great help. It's a shame the compiler isn't smart enough to see that idrop isn't actually doing anything in monomorphised cases (if that's what you're saying) but it's a good enough sacrifice to not have {Copyable,}Coproduct imo.

joonazan commented 1 year ago

Did you make benchmarks? I suspect that embed and split may be rather slow and could be optimized but I haven't tested that.

joonazan commented 1 year ago

It's a shame the compiler isn't smart enough to see that idrop isn't actually doing anything in monomorphised cases (if that's what you're saying)

I'm more surprised that the the compiler doesn't throw away a symbol that does absolutely nothing.

.section ".text.core::ptr::drop_in_place<coproduct::coproduct::Coproduct<coproduct::union::Union<effing_mad::injection::Begin,coproduct::union::EmptyUnion>>>","ax",@progbits
    .p2align    4, 0x90
    .type   core::ptr::drop_in_place<coproduct::coproduct::Coproduct<coproduct::union::Union<effing_mad::injection::Begin,coproduct::union::EmptyUnion>>>,@function
core::ptr::drop_in_place<coproduct::coproduct::Coproduct<coproduct::union::Union<effing_mad::injection::Begin,coproduct::union::EmptyUnion>>>:

    .cfi_startproc
    ret

    .size   core::ptr::drop_in_place<coproduct::coproduct::Coproduct<coproduct::union::Union<effing_mad::injection::Begin,coproduct::union::EmptyUnion>>>, .Lfunc_end5-core::ptr::drop_in_place<coproduct::coproduct::Coproduct<coproduct::union::Union<effing_mad::injection::Begin,coproduct::union::EmptyUnion>>>

But fortunately there's only a few bytes of this trash.

joonazan commented 1 year ago

BTW I looked at the assembly emitted for uninject, embed and split compared to the assembly for analogous operations on enums. The difference is pretty small but embed and split are big operations even on enums.

I did figure out a way to beat enum on embed and split. If we managed to assign a unique id to every effect at runtime, those ids could be used as the tag of the tagged union. The benefit being that in that representation, embed is just an identity function with a fancy type. Split would require checking which group the active variant belongs in, which is a lot less work than enum .

But I think that optimization can wait for another time. I haven't been able to figure out a good way of assigning those ids.

rosefromthedead commented 1 year ago

I made some small benchmarks to try to stress effect handling, and injections as well in _bidi benches. Here are the results:

test tests::coproduct_sum_bench      ... bench:      23,101 ns/iter (+/- 56)
test tests::coproduct_sum_bidi_bench ... bench:      23,750 ns/iter (+/- 6)
test tests::frunk_sum_bench          ... bench:      23,873 ns/iter (+/- 7)
test tests::frunk_sum_bidi_bench     ... bench:      11,946 ns/iter (+/- 29)

I'm not sure I can conclude anything from this, because it seems like nonsense, but I definitely wouldn't conclude that coproduct is faster :(

The benchmark code is messy and I can't publish it as-is but I'll try to do that next.

joonazan commented 1 year ago

I'm not sure I can conclude anything from this, because it seems like nonsense, but I definitely wouldn't conclude that coproduct is faster :(

I'm currently investigating an issue with the code generated from the drop implementation of Coproduct. I'm almost certain that CopyableCoproduct is always better than frunk. Coproduct on the other hand is a bit flaky in how good code it generates, unfortunately.

I've only tested one function at a time. I'm worried that drop implementation's codegen gets worse in more complicated code.

For example pub fn drop_coprod(c: TestCoprod) {} compiles to just ret when TestCoprod only has Copy types, as expected. But

pub fn uninject_coprod(c: TestCoprod) -> Option<u32> {
    c.uninject().ok()
}

generates code that constains tag comparisons from idrop :cry: Fortunately it manages to elide them when the coproduct has lots of variants.

Note that the example above is using the new version described below. The old version has the same problem but it isn't quite as severe. I'm very interested in seeing how it does in a more complicated situation like your benchmark.

Global tags instead of indices

I realized that the code for embed and split using ordinary enums is very big and that my current approach can only at best match enums. So I figured that I really have to try the more janky solution that I initially discarded.

https://github.com/joonazan/coproduct/tree/global-tags is a version of the library that uses any::TypeId as tags. It works great on embed, which is now almost a no-op and results in the same code for all numbers of variants. But dropping is not great as described above and I'm not sure if I can fix it. I tried removing all unnecessary indirection but it didn't help (https://gist.github.com/joonazan/32fe6627e320325e7182c072ff8461d0).

Then there is the jank that comes with TypeId. TypeId is a hash, so because of the birthday paradox, very large coproducts are at risk of hash collision. Also, TypeId is only implemented for T: 'static. This can be fixed. Either at runtime, via link-time tricks or by requiring the user to annotate types with a proc macro. I'm currently trying to figure out how bad the first two options are.

Offtopic: Another potential performance/safety improvement

Currently, effing-mad has one coproduct for effects and another for their injections. But the same variant should always be active in both. There is a .split().unwrap() that wouldn't be necessary if this was known statically.

In my effect system implementation, I was thinking of solving that by storing a pointer to space for a response in every variant. A simple version of the pattern is seen in genawaiter https://github.com/whatisaphone/genawaiter/blob/45c10c223b92da215e182bc3eff0d5e09bf813f4/src/core.rs#L8

In effing-mad one could replace the effect and injection coproducts with a single coproduct where every variant is something like &mut Result<Effect, Injection>.

Nadrieril commented 11 months ago

I haven't followed everything so ignore me if I'm not making sense, but have you considered using dyn SomeTrait as the runtime representation? It kinda functions like a big enum with variant tags determined magically. If you know the maximum size of your data I think you don't need allocations (at the cost of unsafe shenanigans).

Nadrieril commented 11 months ago

Oh wow there's in fact a crate that does exactly what I had in mind: https://docs.rs/inline_dyn/latest/inline_dyn/struct.InlineDyn.html

joonazan commented 11 months ago

@Nadrieril I already did that in a branch of my repo. The reason I haven't developed the library further is that there is a longstanding compiler bug in Rust that makes complex traits infinitely recurse even though there is a finite solution.

I even looked into fixing it but it would be a pretty large job, as trait resolution is depth-first with some special cases right now. It would have to be changed to breadth-first, iterative depth-first or something and the performance of normal programs would probably suffer. So I'm hoping that the trait solver is revamped entirely at some point.

Nadrieril commented 11 months ago

Oh nice! Well you're in luck, they're in the process of rewriting the trait solver. No idea if it would solve this problem however