explore 'recursion' crate, or at least copy its algorithms

apoelstra commented 2 years ago

See this blog post series: https://recursion.wtf/posts/rust_schemes_3/ which describes a scheme for managing recursive data structures which is likely to be (much) faster than our existing code and also avoid stack overflows.

sanket1729 commented 2 years ago

I will explore this weekend. At the very least, we should be able to copy the algorithms.

apoelstra commented 2 years ago

A couple other related ideas bumping around in my brain:

@JeremyRubin has suggested we have a MiniscriptView/PolicyView etc type which is a tree that we users can use to pattern-match with etc, but which can't be directly constructed (so that you can't construct Miniscript programs that don't typecheck)
Relatedly this HN comment describes a way using arenas that we can use &-pointers rather than Box-pointers which makes matching much much easier.

So on a high level here's what I'm imagining ... basically Policy becomes backed by a Vec of PolicyNodes (the blog post uses "Layer" for what we call "Node"); each PolicyNode is an enum where the variants contain &-refs to other PolicyNodes. These &-refs will point into the same vector; we will use Pin somehow to achieve this. Then we have a .view(&self) -> &PolicyNode method which returns a ref to the topmost PolicyNode which the user can easily match on.

Obviously the devil is in the details :)

apoelstra commented 2 years ago

I think the Layer terminology comes from https://blog.sumtypeofway.com/posts/introduction-to-recursion-schemes.html which the recursion crate is based on, and is probably worth reading in its entirety (even though it's a 5-part series, written in Haskell, and uses Haskell features that don't nicely map to Rust)

apoelstra commented 2 years ago

cc @sipa in case you are interested in any of this (if not feel free to unsubscribe :P)

sipa commented 2 years ago

The C++ miniscript implementation has generic "run a recursive algorithm on a miniscript tree" code (specified in the form of "here is a function to compute an associated piece of data for each node based on its parent's associated data", and a "here is a function to compute the result value of a parent given the associated data and result value of its children"), which seems to work very well (pretty much all recursive functions are implemented using those; including ToString, ToScript, the duplicate key check, finding the smallest badly-typed subtree for error reporting, signing logic, checking satisfiability).

See https://github.com/bitcoin/bitcoin/blob/v24.0rc2/src/script/miniscript.h#L324L408 for generic tree-walking code if you're interested. More code using it in https://github.com/sipa/miniscript/blob/master/bitcoin/script/miniscript.h.

Not sure how relevant this is for the issue here, but it may be an inspiration for algorithms that are free of unbounded stack space usage, especially with tapscript-miniscript potentially increasing maximum recursion depth significantly.

apoelstra commented 2 years ago

Sounds good, @sipa. I think if we continue to use naive recursive data structures this is the way we'll wind up going. Though there are a few limitations -- one is about efficiency, since it's slow to make a zillion tiny allocations/deallocations and also bad for cache locality -- but another, more Rust-specific, is about the ergonomics of matching on recursive structures.

Then there is the issue that we'd like to expose library code to allow construction and pattern-matching on arbitrary miniscripts without allowing people to construct things that don't typecheck (since these will eventually trigger assertions if you try to poke them too much).

apoelstra commented 2 years ago

Adding one more link to this mess of an issue: this reddit post about generalizing over Rc and Arc has an interesting "double-trait" trick which lets you get something like an infinite type which may allow us to create a fixpoint type analogous to the one in the Haskell blog post.

apoelstra commented 2 years ago

Probably relevant: https://smallcultfollowing.com/babysteps/blog/2022/06/27/many-modes-a-gats-pattern/

(Sorry, now I'm just using this issue as a general "how can we better leverage our type system" notepad.)

apoelstra commented 1 year ago

After spending a few hours on this, I think

Abstracting over recursion is cool, but the proposal in the linked blog post still depends on using boxes for recursive structures, which is (a) inefficient and (b) won't let us do things like "view types" that can be easily matched on. What it will do is take a ton of engineering time and make it really hard to later change the backing store to be something flat...
...so, I think we want to do backing store efficiency improvements first
Essentially, what this would involve is allocating a slab for nodes and then storing a raw pointer to this inside of the API-exposed Miniscript structure (and Policy etc would work the same). Annoyingly we have to use a raw pointer ... using a Vec<T>, Box<[T]> or even a *mut [T] all run into limitations related to the "stacked borrows" aliasing rules in Rust, either because things are forbidden or they're simply undefined-as-of-now.
Adding insult to injury, it's hard to do a lot of stuff with raw pointers into slabs with 1.41. In particular the offset_from method on pointers (basically, pointer subtraction) wasn't added til Rust 1.46. So if you want to mess around with pointers -- e.g. to map over all nodes in a tree and reconstituting the pointers in the map output -- you have to cast to integers, do math there, and cast back ... which again runs into poorly-defined areas of stacked borrows.

Basically let's just revisit this after the next MSRV bump, and if we're lucky we'll have a better-defined language by then. Then we should start by defining a generic slab-backed tree type, use these Haskell blog posts as inspiration for what APIs we want to define on that, and then try porting our various trees into it. It wouldn't surprise me if we can eventually 10x the performance of this library while making it easier to add functionality.

As an aside, one way I considered to bypass Rust's memory model is to drop into C :}. But because we're trying to construct complicated Rust enums it would be very difficult to pass these things across the FFI barrier, not to mention the heavy optimization cost we'd face.

apoelstra commented 1 year ago

Relevant (about ergonomically bundling a slab with a recursive type that points into it) https://old.reddit.com/r/rust/comments/zc8pq0/selfreferential_types_for_fun_and_profit/izp774p/

apoelstra commented 2 months ago

This one is also highly relevant to rust-miniscript and its context parameter (and most of my attempts to de-recursive-ify the data structures):

http://ais523.me.uk/blog/scoped-generics.html

rust-bitcoin / rust-miniscript

explore 'recursion' crate, or at least copy its algorithms #484