lloydmeta / frunk

Funktional generic type-level programming in Rust: HList, Coproduct, Generic, LabelledGeneric, Validated, Monoid and friends.
https://beachape.com/frunk/
MIT License
1.29k stars 58 forks

Introduce a Poly wrapper struct and Func trait for order-free mapping #106

Closed · lloydmeta closed this 6 years ago

lloydmeta commented 6 years ago

Overview

This is just a PoC for discussion to garner feedback, thoughts, etc :)

Motivation

We already have a .map on HList that works by passing in another HList containing one transformation function per element.
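Roughly, that existing usage looks something like this (just a sketch; the closures have to be listed in the same order as the elements):

let h = hlist![9000, "joe", 41f32];
let mapped = h.map(hlist![
    |n: i32| n > 100,
    |s: &str| s.len(),
    |f: f32| format!("{}", f)
]);
assert_eq!(mapped, hlist![true, 3, "41".to_string()]);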

It might also be convenient to map by simply having a transformation function for every type in the HList. That way:

  1. Order of declaration of the transformation functions doesn't matter
  2. The same transformation doesn't need to be specified more than once.

This PoC PR attempts to make this happen, introducing the following:

let h = hlist![9000, "joe", 41f32, "schmoe", 50];

struct P;

impl Func<i32> for P {
    type Output = bool;
    fn call(args: i32) -> Self::Output {
        args > 100
    }
}
impl <'a> Func<&'a str> for P {
    type Output = usize;
    fn call(args: &'a str) -> Self::Output {
        args.len()
    }
}
impl Func<f32> for P {
    type Output = String;
    fn call(args: f32) -> Self::Output {
        format!("{}", args)
    }
}

assert_eq!(h.map(Poly(P)), hlist![true, 3, "41".to_string(), 6, false]);

The Poly wrapper is there so that we don't clash with the existing impl for mapping over a single-element HList with a plain function.

Changes

ExpHP commented 6 years ago

Nice! I was working on something very similar to this in my own test repo (and actually use stuff like this in my personal projects) and was planning to eventually propose adding it to frunk.

You actually just caught me in the middle of working on a crate that will make it easier to derive impls for all three Fn traits at once (i.e. derive FnOnce and FnMut from an Fn impl) if we want to support the full Fn hierarchy. (Basically, I'm upgrading this old piece of trash into a more reliable proc macro, and the trait/method/assoc-ty names are configurable so that it can impl anything that resembles the std Fn hierarchy.)

In my own projects I haven't needed a newtype wrapper like Poly; I'll need to look more closely at the differences between our designs to figure out why.

ExpHP commented 6 years ago

Ah, actually, the reason I didn't need Poly is simple; I always give my traits blanket impls for std closures.

impl<F, A, R> FunOnce<HList![A]> for F
where F: FnOnce(A) -> R { ... }

impl<F, A, B, R> FunOnce<HList![A, B]> for F
where F: FnOnce(A, B) -> R { ... }

// and so on, all generated using a macro

and then I would use FunOnce, FunMut, and Fun exclusively in my where bounds, rather than the std Fn traits.
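For concreteness, here is a minimal sketch of the kind of trait such blanket impls could target; the names and exact shape are assumptions rather than ExpHP's actual code:

use frunk::hlist::{HCons, HNil};

// Sketch only: an FnOnce-alike whose argument list is an HList.
trait FunOnce<Args> {
    type Output;
    fn call_once(self, args: Args) -> Self::Output;
}

// Blanket impl for ordinary one-argument closures; Args is a one-element HList.
impl<F, A, R> FunOnce<HCons<A, HNil>> for F
where
    F: FnOnce(A) -> R,
{
    type Output = R;
    fn call_once(self, args: HCons<A, HNil>) -> R {
        self(args.head)
    }
}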


Note that mine took HLists, which is interesting because it makes it trivial to write functions that support variadic closures (up to a fixed number of arguments), but it's also a PITA to keep having to remember that Args always needs to be wrapped in an HList.


Also note that a disadvantage of my design is that you lose the ability to do things like:

// NOTE: incompatible with the blanket impls for F: Fn
impl<'a, F, Args> Fun<Args> for &'a F
where F: Fun<Args> { ... }

meaning a generic function that takes F: Fun can't call another generic function that takes G: Fun without moving its F, since it can't simply hand over &F. So I had to work around this by adding methods to Fun that return a newtype wrapper around &Self:

trait Fun<Args>: FunMut<Args> {
    fn call(&self, args: Args) -> Self::Output;

    // use this to reborrow a Fun to give it to a function that
    // takes some generic F: Fun by value.
    fn by_ref(&self) -> ByRef<Self>;
}

struct ByRef<'a, F>(&'a F);

impl<'a, F, Args> Fun<Args> for ByRef<'a, F>
where F: Fun<Args> { ... }
Centril commented 6 years ago

So I'm a bit skeptical about the use cases this will facilitate; but if there are some, then this seems like a neat idea.

We could simplify things a bit like so:

pub trait Func<In> {
    type Output;

    fn call(i: In) -> Self::Output;
}

pub struct Poly<T>(pub T); 

macro_rules! poly_fn {
    ($([$($pars: tt)*] |$args: ident : $arg_typ: ty|
        -> $ret_typ: ty { $body: expr }),*)
    => {{
        struct F;
        $(
            impl<$($pars)*> Func<$arg_typ> for F {
                type Output = $ret_typ;
                fn call($args: $arg_typ) -> Self::Output { $body }
            }
        )*
        Poly(F {})
    }};
}

fn main() {
    let hlist2 = hlist1.map(poly_fn!(
        // We'd like to omit [] here.
        [] |x: i32| -> bool { x > 100 },
        // And here.
        [] |x: f32| -> String { format!("{}", x) },
        // We'd like to infer the need for <'a> quantification from &'a str alone.
        ['a] |x: &'a str| -> usize { x.len() },
        // We'd like to omit '-> u8' when the input and output types align.
        [] |x: u8| -> u8 { x + 1 }
    ));
}

The macro could be improved but it might require proc macros to do it really well.
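For reference, the invocation above would expand to roughly the following (a hypothetical expansion, modulo macro hygiene, assuming the Func and Poly machinery from this PR):

let hlist1 = hlist![150, 0.5f32, "abc", 1u8];
let hlist2 = hlist1.map({
    // A fresh unit struct with one Func impl per macro arm, wrapped in Poly.
    struct F;
    impl Func<i32> for F {
        type Output = bool;
        fn call(x: i32) -> bool { x > 100 }
    }
    impl Func<f32> for F {
        type Output = String;
        fn call(x: f32) -> String { format!("{}", x) }
    }
    impl<'a> Func<&'a str> for F {
        type Output = usize;
        fn call(x: &'a str) -> usize { x.len() }
    }
    impl Func<u8> for F {
        type Output = u8;
        fn call(x: u8) -> u8 { x + 1 }
    }
    Poly(F {})
});
assert_eq!(hlist2, hlist![true, "0.5".to_string(), 3, 2u8]);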

lloydmeta commented 6 years ago

@ExpHP That's an interesting approach; I'll have to think a bit more about the tradeoffs between that and having to wrap a given unit struct in Poly once.


@Centril

So I'm a bit skeptical around use cases that this will facilitate; but if there are, then this seems like a neat idea.

Yeah, it does look a bit weird, doesn't it?

For .map, having something like this seems to make sense for a few reasons:

I'm a bit skeptical about whether this makes sense for folding methods, like .foldl and .foldr, because those are inherently order-dependent, and I've found it fairly hard to wrap my head around the Poly-style folding whenever I've used it in Shapeless.

We could simplify things a bit like so: /snip The macro could be improved but it might require proc macros to do it really well.

I really like your macro proposal. It does a great job of eliminating a bunch of boilerplate :)

It seems possible to get rid of the [] when there are no type params, by using some clever accumulation scheme and then emitting the implementations later by dumping the accumulated pieces into a macro (or pattern).

ExpHP commented 6 years ago

@Centril something like this is a key building block for simulating higher-order abstractions in Rust. It is a solution to the problem that Rust has no generic closures. That said, the amount of value it provides depends on the number of container types it can be used with.

In my code base, I have the concept of a "structure-of-arrays"-style type. For instance, the type HList![Vec<A>, Vec<B>, Option<Vec<C>>]. The types Vec<T> and [T] are considered to be the base-case "single array" types, and anything else like HCons or Option is treated as some sort of combinator type. More combinators are always appearing as a result of defining newtypes and whatnot, and not all of the vectors they store are necessarily part of the SoA (so Generic is no good here).

In my code, I had garbage like the following for defining a permute operation that would apply a permutation to all of the vectors in lockstep (each of these impls was 4-5 lines of the most boring boilerplate imaginable):

impl<X>                      Permute for Vec<X> { ... }
impl                         Permute for HNil { ... }
impl<A: Permute, B: Permute> Permute for HCons<A, B> { ... }
impl                         Permute for CNil { ... }
impl<A: Permute, B: Permute> Permute for Coproduct<A, B> { ... } 
impl<A: Permute>             Permute for Option<A> { ... }
impl<A: Permute, E>          Permute for Result<A, E> { ... }
impl<A: Permute, B: Permute> Permute for Either<A, B> { ... }
impl<V: Permute>             Permute for CoordsKind<V> { ... } 
impl<V: Permute, Z: Permute> Permute for StructureOrCoords<V, Z> { ... } 
impl                         Permute for CoordsItIs { ... } 
impl<M: Permute>             Permute for StructureItIs<M> { ... } 
impl                         Permute for Basis { ... }
impl                         Permute for Basis3 { ... }
impl                         Permute for Ket3 { ... }

And I had similar nonsense for defining an as_refs operation (for turning all the Vecs into slices) and a to_vecs() operation (for copying everything into new Vecs), and various other operations that fit the mold of "do the same thing to every vector".

Using unboxed generic closures like the following:

struct PermuteFunc;
struct AsRefsFunc;
struct ToVecsFunc;

impl<T: Permute> Fn<T> for PermuteFunc { ... }
impl<T: AsRefs> Fn<T> for AsRefsFunc { ... }
impl<T: ToVecs> Fn<T> for ToVecsFunc { ... }

and writing two traits for "mapping" SoA structures (one by value, one by ref), I was able to turn this problem of M x N impls into one of M + N impls.
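To make the M + N structure concrete, here is a minimal sketch with invented names (MapSoa, LenF), written against the static Func trait from this PR rather than my Fn-like traits; because Func::call has no self, this variant only covers operations that don't carry runtime data such as a permutation:

use frunk::hlist;
use frunk::hlist::{HCons, HNil};
use frunk::Func; // path assumed: the Func trait proposed in this PR

// One impl per combinator type (the "N" side).
trait MapSoa<F> {
    type Output;
    fn map_soa(self, f: F) -> Self::Output;
}

impl<F> MapSoa<F> for HNil {
    type Output = HNil;
    fn map_soa(self, _f: F) -> HNil { HNil }
}

// Base case: hand each Vec to the function.
impl<T, F: Func<Vec<T>>> MapSoa<F> for Vec<T> {
    type Output = F::Output;
    fn map_soa(self, _f: F) -> Self::Output { F::call(self) }
}

impl<H, Tail, F> MapSoa<F> for HCons<H, Tail>
where
    F: Copy,
    H: MapSoa<F>,
    Tail: MapSoa<F>,
{
    type Output = HCons<H::Output, Tail::Output>;
    fn map_soa(self, f: F) -> Self::Output {
        HCons { head: self.head.map_soa(f), tail: self.tail.map_soa(f) }
    }
}

impl<A: MapSoa<F>, F> MapSoa<F> for Option<A> {
    type Output = Option<A::Output>;
    fn map_soa(self, f: F) -> Self::Output {
        self.map(|a| a.map_soa(f))
    }
}

// One impl per operation (the "M" side): here, "length of every vector".
#[derive(Clone, Copy)]
struct LenF;
impl<T> Func<Vec<T>> for LenF {
    type Output = usize;
    fn call(v: Vec<T>) -> usize { v.len() }
}

let soa = hlist![vec![1, 2, 3], Some(vec!["a", "b"])];
assert_eq!(soa.map_soa(LenF), hlist![3, Some(2)]);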


So that's how things are in the world of applications, where the problems to be solved grow without bound. How about for a library like frunk?

Well, in this case, the only work you are saving (compared to manually implementing a separate trait on HLists for each operation) is the impls on HNil and HCons. Truth be told, that's not much. We could also conceivably have impls on CNil and Coproduct, which would save a bit more.

To be honest, I think what would be more useful would simply be to have (a) a common abstraction for this sort of thing (if only the traits in std were it!), and (b) a library of useful implementors available for anyone to use, like a Clone function or the Id function or a function composition operator. It doesn't even need to live in frunk, though frunk could depend on the traits.
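For example, a sketch of such implementors against the Func trait proposed in this PR (hypothetical names; none of these currently exist in frunk):

use std::marker::PhantomData;
use frunk::Func; // path assumed

// Identity: maps every element to itself.
struct Id;
impl<T> Func<T> for Id {
    type Output = T;
    fn call(t: T) -> T { t }
}

// Clone: turns a reference into an owned value.
struct CloneF;
impl<'a, T: Clone> Func<&'a T> for CloneF {
    type Output = T;
    fn call(t: &'a T) -> T { t.clone() }
}

// Composition: apply F first, then G.
struct Compose<F, G>(PhantomData<(F, G)>);
impl<T, F, G> Func<T> for Compose<F, G>
where
    F: Func<T>,
    G: Func<F::Output>,
{
    type Output = G::Output;
    fn call(t: T) -> Self::Output { G::call(F::call(t)) }
}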

ExpHP commented 6 years ago

Re: fold. I had a fold for polymorphic F on HLists at some point. I thought to myself, "awww yeah this is going to save me so much work now that I don't need to write a helper trait whenever I need an accumulator."

It then took me something like 45 minutes to actually successfully write an implementation of Reverse using it. Then I spent hours on a function that tried to and-reduce an HList of type-level Options (Some<T>s and None<U>s) into a single type-level Option containing an HList.

...then I threw it all out and rewrote the functions using handcoded traits so that I could actually begin breathing again. The human brain has not yet evolved to handle polymorphic folds.

lloydmeta commented 6 years ago

@Centril I managed to make a macro that can handle omitting the empty brackets (the macro itself is quite ugly though) :) I'm not sure we can do type-annotation elision w/o procedural macros though :D

let h = hlist![9000, "joe", 41f32, "schmoe", 50];
let h2 = h.map(poly_fn!(
    |x: i32| -> bool { x > 100 },
    |x: f32| -> String { format!("{}", x) },
    ['a] |x: &'a str| -> usize { x.len() }
));
assert_eq!(h2, hlist![true, 3, "41".to_string(), 6, false]);

@ExpHP

Truth be told, that's not much. We could also conceivably have impls on CNil and Coproduct, which would save us a bit more.

I added a CoproductFoldable implementation that uses Poly, and the usage, without macros, looks like this:

type I32F32StrBool = Coprod!(i32, f32, bool);

struct P;

impl Func<i32> for P {
    type Output = bool;
    fn call(args: i32) -> Self::Output {
        args > 100
    }
}
impl Func<bool> for P {
    type Output = bool;
    fn call(args: bool) -> Self::Output {
        args
    }
}
impl Func<f32> for P {
    type Output = bool;
    fn call(args: f32) -> Self::Output {
        args > 9000f32
    }
}

let co1 = I32F32StrBool::inject(3);
let folded = co1.fold(Poly(P));

assert_eq!(folded, false);
ExpHP commented 6 years ago

I've been thinking a bit more about this, and I think now that our goals here actually differ quite a bit. I think that initially when I saw you adding a custom Func trait, I jumped to a lot of conclusions!

To be clear, the specific thing that I see a use case for is the ability to use HMappable with a Fn-like trait that users are capable of manually implementing.

I am not confident about the use cases for order-free mapping specifically, and personally, I probably wouldn't use poly_fn! very much, since pretty much all uses of custom Func traits in my own code are wrappers around generic functions, which is simply not the use case it is optimized for. (This is why I was talking about things like where-bound support, etc.)


Edit: That was too negative. Here's the positive: I do think it is an innovative solution to overcoming the order-based requirements of map, which successfully brings it in line with many other features of frunk that perform type-directed lookup. It's just that I see Func as The Bigger Feature.™

lloydmeta commented 6 years ago

@ExpHP

No worries at all. I think we're in agreement here that Func is the main, more general piece and that the poly_fn! macro sits on top of it. The latter was added primarily to make the ergonomics easier for simple cases, which I think is important as well from a user-friendliness perspective.

In fact, I would summarise this PR as almost entirely an exercise in improving ergonomics, in particular around .map on HList and .fold on Coproducts #101. After all, I haven't actually changed any of the existing traits and implementations, and any user could have introduced something like Func and added their own impls of HMappable based on it :)

In other words, this PR is an attempt to make life easier by:

  1. Providing a generalised Func trait
  2. Providing implementations of HMappable and CoproductFoldable based on Func out of the box, so users can use them with their own types that implement Func
  3. Providing a macro that uses 1 and 2 to cut down on boilerplate in the simple cases.
lloydmeta commented 6 years ago

Crumbs; I'm trying to implement mapping over a reference HList but ran into an overflow error:


impl<'a, P, H, Tail> HMappable<Poly<P>> for &'a HCons<H, Tail>
where
    P: Func<&'a H>,
    &'a Tail: HMappable<Poly<P>>,
{
    type Output = HCons<<P as Func<&'a H>>::Output, <&'a Tail as HMappable<Poly<P>>>::Output>;
    fn map(self, poly: Poly<P>) -> Self::Output {
        let HCons { ref head, ref tail } = *self;
        HCons {
            head: P::call(head),
            tail: tail.map(poly)
        }
    }
}

Any ideas?

ExpHP commented 6 years ago

Ugh. That bug always strikes where you least expect it. And as always, I can't even begin to comprehend why it occurs for this impl specifically and not others...

To be honest, my strategy in personal projects has typically been to define as_ref (or a similar name) as a function that turns Type<A, B, ..., Z> into Type<&'a A, &'a B, ..., &'a Z> (clarification: I am not suggesting producing HCons<&A, &B>, but rather producing HList![&A, &B, ..., &Z]), so that it composes more easily/orthogonally with other functionality. The advantage here would be that you would no longer need any where bounds of the form &'a Tail: Trait, which are what seem to have the greatest chance of triggering the bug.
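A sketch of that approach for HLists, with assumed names (not necessarily the shape frunk would want):

use frunk::hlist::{HCons, HNil};

// Turn &HList![A, B, ...] into HList![&A, &B, ...]; the result can then be
// mapped with the ordinary by-value HMappable impls, so no bounds of the
// form `&'a Tail: HMappable<...>` are needed.
trait AsRefs<'a> {
    type Output;
    fn as_refs(&'a self) -> Self::Output;
}

impl<'a> AsRefs<'a> for HNil {
    type Output = HNil;
    fn as_refs(&'a self) -> HNil { HNil }
}

impl<'a, H: 'a, Tail: AsRefs<'a>> AsRefs<'a> for HCons<H, Tail> {
    type Output = HCons<&'a H, Tail::Output>;
    fn as_refs(&'a self) -> Self::Output {
        HCons { head: &self.head, tail: self.tail.as_refs() }
    }
}

// Usage sketch: h.as_refs().map(Poly(P)) instead of (&h).map(Poly(P)).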

That said, I'm not sure how much trouble such a definition of as_ref causes for optimization. (I usually make it #[inline(always)] to help keep it from blocking optimizations, but this probably produces a lot of work for LLVM. I don't know/haven't measured it)

lloydmeta commented 6 years ago

That said, I'm not sure how much trouble such a definition of as_ref causes for optimization. (I usually make it #[inline(always)] to help keep it from blocking optimizations, but this probably produces a lot of work for LLVM. I don't know/haven't measured it)

I think it should be fine in terms of optimisation; in my experience, the Rust compiler is fairly smart. I added a benchmark in #110 and got the following results:

test hlist_mapping_consuming     ... bench:           0 ns/iter (+/- 0)
test hlist_mapping_non_consuming ... bench:           0 ns/iter (+/- 0)
ExpHP commented 6 years ago

Ah, I missed that you posted those results. That benchmark is very suspicious! Clearly the expression was simply optimized away as dead code (since its result is discarded with the trailing semicolon).

lloydmeta commented 6 years ago

Ugh!

You're right; it was suspicious, but I couldn't figure out why. I removed the semicolon and now we've got:

test hlist_mapping_consuming     ... bench:           1 ns/iter (+/- 0)
test hlist_mapping_non_consuming ... bench:           1 ns/iter (+/- 0)
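For illustration, a hypothetical sketch of the pitfall (not the actual benchmark from #110):

#![feature(test)]
extern crate test;
#[macro_use]
extern crate frunk;

use test::Bencher;

// With the trailing semicolon the closure returns (), so the mapped HList is
// dead code and can be optimized away entirely.
#[bench]
fn hlist_mapping_dead_code(b: &mut Bencher) {
    b.iter(|| {
        hlist![1, 2.5f32, "three"].map(hlist![
            |x: i32| x + 1,
            |x: f32| x * 2.0,
            |s: &str| s.len()
        ]);
    });
}

// Returning the mapped HList from the closure keeps the work observable.
#[bench]
fn hlist_mapping_live(b: &mut Bencher) {
    b.iter(|| {
        hlist![1, 2.5f32, "three"].map(hlist![
            |x: i32| x + 1,
            |x: f32| x * 2.0,
            |s: &str| s.len()
        ])
    });
}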
lloydmeta commented 6 years ago

@ExpHP @Centril Are you comfortable with the current state of the implementation? Is anything missing (e.g. docs, tests)?

Centril commented 6 years ago

It LGTM 👍 :shipit: