Reuse environment definitions in `ToJSON` and `FromJSON`

xsebek commented 3 months ago

Is your feature request related to a problem? Please describe.

TLDR: the problem is that _envVars inside _envVars cause exponential JSON size.

Take this simple example:

def m2 = m; m end
def m4 = m2; m2 end
def m8 = m4; m4 end
def m16 = m8; m8 end
def m32 = m16; m16 end

And observe the growth of JSON output:	empty	m2	m4	m8	m16	m32	m1024
5	27	78	180	384	792	52200

This makes it impossible to get to other useful parts of robot JSON, like the log, if the program has enough definitions:

2106

Describe the solution you'd like

The definitions should be reused - the inner references should link ({"link": "m8"}) to outer definition.

If we can check that the definitions are the same, we could prune them from inner scope:

- deriving instance FromJSON Env
- deriving instance ToJSON Env
+ instance FromJSONE Env Env
+ instance ToJSON (Env, Env)  -- or some newtype WithEnv

Describe alternatives you've considered

The current derived instance is broken, so maybe we could only keep the top environment, or none at all.

byorgey commented 3 months ago

This may actually be a good application for the (heretofore mythical) ToJSONE. I just want to make sure we avoid doing $O(n^2)$ work to repeatedly check the same definitions for equality.

xsebek commented 3 months ago

I think even $O(n^2)$ would be OK, since it would remove exponentially many inner references.

Outputting JSON is not done on main thread, so it would not freeze the game.

byorgey commented 3 months ago

Good point. Yes, we should not succumb to premature optimization here. Let's start with the simplest thing that works and we can optimize it later if necessary.

byorgey commented 3 weeks ago

After some discussion with @xsebek on Discord, here's a sketch of an idea. Env is just made up of a bunch of Ctx (which almost, but not quite always, have all the same variables). So we can focus first on making this sort of sharing/recovery work for Ctx.

Currently Ctx t is just defined as a newtype for Map Var t. The proposal is to keep this, but add some extra structure that allows us to remember the structure of how the Ctx was built, and quickly identify when two Ctx values are equal, without having to actually compare them:

data CtxStruct t = CtxEmpty | CtxSingle Var | CtxDelete Var (Ctx t) | CtxUnion (Ctx t) (Ctx t)
data Ctx t = Ctx { ctxMap :: Map Var t, ctxName :: CtxName, ctxStruct :: CtxStruct t }

The ctxMap would only be used for looking up variables efficiently, but never for serializing. The ctxName (assuming it is unique enough) can be used to quickly test Ctx values for equality. The ctxStruct explains the structure of how the Ctx was built (either empty, or as a singleton, or by deleting a variable from a Ctx, or as a union of two other Ctx) so that we can disassemble/serialize it effectively.

To generate unique ctxNames, we could either (1) hash the contents of of the context, or (2) require operations to take place in a monad that has a unique-symbol-generation effect.

I had been thinking in terms of storing each tree node indexed by name in a map, or something horrendous like that. It was @xsebek's idea that all we need to do is store some unique names alongside just so we can use them to compare for equality.

When serializing, we can keep track of a set of CtxNames we've seen so far, and essentially output a map from CtxName to one level of CtxStruct - i.e. each name maps to either a single binding, or a pair of CtxNames. To reconstruct a Ctx we read in a map from CtxName to Ctx and build the actual trees + Maps lazily as we go.

xsebek commented 3 weeks ago

@byorgey this sounds great! 👍 Will this also work for type context, or will it use the old version? 🤔

byorgey commented 3 weeks ago

Type contexts also use Ctx, so yes, it will work for those too.

byorgey commented 2 weeks ago

Working on a proof of concept, watch for a PR soon... :smiley:

swarm-game / swarm

Reuse environment definitions in `ToJSON` and `FromJSON` #2107

2106