Open: xsebek opened this issue 3 months ago.
This may actually be a good application for the (heretofore mythical) `ToJSONE`. I just want to make sure we avoid doing $O(n^2)$ work to repeatedly check the same definitions for equality.
I think even $O(n^2)$ would be OK, since it would remove exponentially many inner references.
Outputting JSON is not done on the main thread, so it would not freeze the game.
Good point. Yes, we should not succumb to premature optimization here. Let's start with the simplest thing that works and we can optimize it later if necessary.
After some discussion with @xsebek on Discord, here's a sketch of an idea. `Env` is just made up of a bunch of `Ctx` values (which almost, but not quite always, have all the same variables). So we can focus first on making this sort of sharing/recovery work for `Ctx`.
Currently `Ctx t` is just defined as a `newtype` for `Map Var t`. The proposal is to keep this, but add some extra structure that allows us to remember how the `Ctx` was built, and to quickly identify when two `Ctx` values are equal without having to actually compare them:
```haskell
data CtxStruct t = CtxEmpty | CtxSingle Var t | CtxDelete Var (Ctx t) | CtxUnion (Ctx t) (Ctx t)
data Ctx t = Ctx { ctxMap :: Map Var t, ctxName :: CtxName, ctxStruct :: CtxStruct t }
```
The `ctxMap` would only be used for looking up variables efficiently, but never for serializing. The `ctxName` (assuming it is unique enough) can be used to quickly test `Ctx` values for equality. The `ctxStruct` explains how the `Ctx` was built (either empty, or as a singleton, or by deleting a variable from a `Ctx`, or as a union of two other `Ctx` values) so that we can disassemble/serialize it effectively.
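A minimal Haskell sketch of the proposed structure, assuming `Var = String` and `CtxName = Int` (the real types may differ). The smart constructors `emptyCtx`, `singletonCtx`, `deleteCtx`, and `unionCtx` are hypothetical: each takes an already-chosen name, keeps `ctxMap` in sync for lookups, and records the build step in `ctxStruct`. Equality compares only names. The sketch also assumes union is right-biased, so bindings in the second context shadow those in the first.

```haskell
import qualified Data.Map as M
import Data.Map (Map)

-- Assumptions for illustration: variables are Strings, names are Ints.
type Var = String
type CtxName = Int

-- How a context was built. CtxSingle carries the bound value, so that
-- serialization never needs to consult ctxMap.
data CtxStruct t
  = CtxEmpty
  | CtxSingle Var t
  | CtxDelete Var (Ctx t)
  | CtxUnion (Ctx t) (Ctx t)

data Ctx t = Ctx
  { ctxMap :: Map Var t       -- used only for fast lookup
  , ctxName :: CtxName        -- assumed unique per distinct context
  , ctxStruct :: CtxStruct t  -- build history, used for serialization
  }

-- Equality is a single name comparison, never a walk over the map.
instance Eq (Ctx t) where
  a == b = ctxName a == ctxName b

-- Hypothetical smart constructors: the caller supplies a fresh name.
emptyCtx :: CtxName -> Ctx t
emptyCtx n = Ctx M.empty n CtxEmpty

singletonCtx :: CtxName -> Var -> t -> Ctx t
singletonCtx n x v = Ctx (M.singleton x v) n (CtxSingle x v)

deleteCtx :: CtxName -> Var -> Ctx t -> Ctx t
deleteCtx n x c = Ctx (M.delete x (ctxMap c)) n (CtxDelete x c)

-- Assumed right-biased: M.union prefers its first argument, so passing
-- the second context's map first makes its bindings win.
unionCtx :: CtxName -> Ctx t -> Ctx t -> Ctx t
unionCtx n a b = Ctx (M.union (ctxMap b) (ctxMap a)) n (CtxUnion a b)

lookupCtx :: Var -> Ctx t -> Maybe t
lookupCtx x = M.lookup x . ctxMap
```

Note that the name is only as trustworthy as its uniqueness: two contexts with different contents must never share a name, or the cheap `==` above gives wrong answers.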
To generate unique `ctxName`s, we could either (1) hash the contents of the context, or (2) require operations to take place in a monad that has a unique-symbol-generation effect.
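For option (2), a hand-rolled state monad over an `Int` counter is enough to hand out names. `Fresh`, `freshName`, and `freshNames` are hypothetical; a real implementation might use an existing effect or state library instead.

```haskell
import Control.Monad (replicateM)

-- Assumption for illustration: names are Ints.
type CtxName = Int

-- A tiny state monad whose only state is the next unused name.
newtype Fresh a = Fresh { runFresh :: CtxName -> (a, CtxName) }

instance Functor Fresh where
  fmap f (Fresh g) = Fresh $ \n -> let (a, n') = g n in (f a, n')

instance Applicative Fresh where
  pure a = Fresh $ \n -> (a, n)
  Fresh f <*> Fresh g = Fresh $ \n ->
    let (h, n1) = f n
        (a, n2) = g n1
     in (h a, n2)

instance Monad Fresh where
  Fresh g >>= k = Fresh $ \n ->
    let (a, n1) = g n
     in runFresh (k a) n1

-- Return the current counter value and advance it, so no two calls
-- within one run of the monad return the same name.
freshName :: Fresh CtxName
freshName = Fresh $ \n -> (n, n + 1)

-- Convenience: draw k names in sequence.
freshNames :: Int -> Fresh [CtxName]
freshNames k = replicateM k freshName
```

A smart constructor would then draw `n <- freshName` before building each node. One thing to watch with a counter: after loading a saved game, newly generated names must not collide with loaded ones, so the counter would have to resume past the largest loaded name. Hashing (option 1) avoids that bookkeeping and also gives the same name to structurally equal contexts built independently, at the cost of having to rule out hash collisions.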
I had been thinking in terms of storing each tree node indexed by name in a map, or something horrendous like that. It was @xsebek's idea that all we need to do is store some unique names alongside just so we can use them to compare for equality.
When serializing, we can keep track of a set of `CtxName`s we've seen so far, and essentially output a map from `CtxName` to one level of `CtxStruct`, i.e. each name maps to either a single binding, or a pair of `CtxName`s. To reconstruct a `Ctx`, we read in a map from `CtxName` to `Ctx` and build the actual trees + `Map`s lazily as we go.
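The serialization scheme described above can be sketched as follows, again assuming `String` variables and `Int` names. `CtxLayer`, `serializeCtx`, and `deserializeCtx` are hypothetical: `CtxLayer` is one level of `CtxStruct` with child contexts replaced by their names. The output map doubles as the set of names already seen, and reconstruction ties the knot through a lazy `Data.Map`, so each shared sub-context is rebuilt once and its `Map` is only computed when demanded. Union is assumed right-biased, as in the earlier sketch; turning the layer map into JSON would be a separate step.

```haskell
import qualified Data.Map as M
import Data.Map (Map)

-- Assumptions for illustration: String variables, Int names.
type Var = String
type CtxName = Int

data CtxStruct t
  = CtxEmpty
  | CtxSingle Var t
  | CtxDelete Var (Ctx t)
  | CtxUnion (Ctx t) (Ctx t)

data Ctx t = Ctx
  { ctxMap :: Map Var t
  , ctxName :: CtxName
  , ctxStruct :: CtxStruct t
  }

-- One level of CtxStruct, with child contexts replaced by their names.
data CtxLayer t
  = LayerEmpty
  | LayerSingle Var t
  | LayerDelete Var CtxName
  | LayerUnion CtxName CtxName
  deriving (Eq, Show)

-- Flatten a context into (root name, name -> layer). A name already in
-- the accumulator is skipped, so a shared sub-context is emitted once.
serializeCtx :: Ctx t -> (CtxName, Map CtxName (CtxLayer t))
serializeCtx root = (ctxName root, go root M.empty)
  where
    go c acc
      | M.member n acc = acc
      | otherwise = case ctxStruct c of
          CtxEmpty -> M.insert n LayerEmpty acc
          CtxSingle x v -> M.insert n (LayerSingle x v) acc
          CtxDelete x c' -> go c' (M.insert n (LayerDelete x (ctxName c')) acc)
          CtxUnion a b -> go b (go a (M.insert n (LayerUnion (ctxName a) (ctxName b)) acc))
      where
        n = ctxName c

-- Rebuild contexts by tying the knot: the values of the lazy map `built`
-- refer back to `built`, so each layer is constructed at most once.
deserializeCtx :: CtxName -> Map CtxName (CtxLayer t) -> Maybe (Ctx t)
deserializeCtx rootName layers = M.lookup rootName built
  where
    built = M.mapWithKey build layers
    get m = M.findWithDefault (error ("unknown context name " ++ show m)) m built
    build n layer = case layer of
      LayerEmpty -> Ctx M.empty n CtxEmpty
      LayerSingle x v -> Ctx (M.singleton x v) n (CtxSingle x v)
      LayerDelete x m -> let c = get m in Ctx (M.delete x (ctxMap c)) n (CtxDelete x c)
      LayerUnion a b ->
        let ca = get a
            cb = get b
         in Ctx (M.union (ctxMap cb) (ctxMap ca)) n (CtxUnion ca cb)
```

For a context built as the union of some context `u` with itself, the serialized map contains `u` (and its children) once, and the top entry simply refers to `u` by name twice. The sketch relies on the serialized layers being acyclic, which `serializeCtx` guarantees for values built from the constructors.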
@byorgey this sounds great! 👍 Will this also work for type contexts, or will they use the old version? 🤔
Type contexts also use `Ctx`, so yes, it will work for those too.
Working on a proof of concept, watch for a PR soon... :smiley:
Is your feature request related to a problem? Please describe.
TL;DR: the problem is that nesting `_envVars` inside `_envVars` causes exponential JSON size. Take this simple example:
This makes it impossible to get to other useful parts of robot JSON, like the log, if the program has enough definitions:
2106
Describe the solution you'd like
The definitions should be reused: the inner references should link (`{"link": "m8"}`) to the outer definition. If we can check that the definitions are the same, we could prune them from the inner scope:
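A minimal sketch of that pruning on plain `Data.Map` environments, assuming values have an `Eq` instance. `pruneInner` is hypothetical: it keeps only the inner bindings that are new or that differ from the outer binding of the same name. Note that this still compares whole definitions for equality, which is the cost discussed in the comments; the `ctxName` proposal replaces that comparison with a name check.

```haskell
import qualified Data.Map as M
import Data.Map (Map)

-- Drop every inner binding whose value equals the outer binding with the
-- same key; keep inner bindings that are new or shadow with a new value.
pruneInner :: (Ord k, Eq v) => Map k v -> Map k v -> Map k v
pruneInner outer inner = M.differenceWith keepIfChanged inner outer
  where
    -- Called only for keys present in both maps; Nothing discards the key.
    keepIfChanged innerVal outerVal
      | innerVal == outerVal = Nothing
      | otherwise = Just innerVal
```

For example, with outer `{a = 1, b = 2}` and inner `{a = 1, b = 3, c = 4}`, the result is `{b = 3, c = 4}`.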
Describe alternatives you've considered
The current derived instance is broken, so maybe we could only keep the top environment, or none at all.