Open greghendershott opened 3 years ago
In my experience, to do anything useful with fully-expanded code you need the namespace in which it was originally expanded. Which means the only "cache" that works is in-memory -- the namespace and the expanded syntax, both. But that chews through memory very quickly. (e.g. https://github.com/greghendershott/racket-mode/issues/512)
As a result I've shifted to thinking about ways to run all the analyses up-front, eagerly, and save the interesting results in an on-disk database. Which is what I started to experiment with in https://github.com/greghendershott/pdb (saving definitions and references).
If it is indeed possible (or could become possible) to fully serialize the fully-expanded code and other information from the namespace and/or module registry, that would be great!
If not, then we could also explore a way for tools to "register" a "hook" to be called when expansion of a file is complete. The hook would get the syntax object for the fully-expanded code, plus the namespace. Each tool could then do whatever analysis it wants to do, and serialize the results however it wants.
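A minimal sketch of what such a hook mechanism might look like. Nothing here is an existing Racket or DrRacket API: register-expansion-hook! and expand-and-notify are hypothetical names, and the sketch assumes the caller already has the module's syntax and a namespace to expand it in.

```racket
#lang racket/base

;; Hypothetical registry of tool callbacks (not an existing Racket API).
(define hooks '())

;; Each hook is a procedure of two arguments:
;; the fully-expanded syntax object and the namespace used for expansion.
(define (register-expansion-hook! proc)
  (set! hooks (cons proc hooks)))

;; Expand `stx` in `ns`, then hand the result to every registered hook
;; in registration order.
(define (expand-and-notify stx ns)
  (define expanded
    (parameterize ([current-namespace ns])
      (expand stx)))
  (for ([hook (in-list (reverse hooks))])
    (hook expanded ns))
  expanded)

;; Example tool: record the head symbol of each expanded form.
(define seen-heads '())
(register-expansion-hook!
 (lambda (expanded ns)
   (set! seen-heads
         (cons (syntax-e (car (syntax-e expanded))) seen-heads))))

(define expanded
  (expand-and-notify
   (datum->syntax #f '(module m racket/base (define x 1)))
   (make-base-namespace)))
```

Each tool would do its own analysis inside its hook and decide for itself how to persist the results, so the namespace never has to outlive the call.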
It would be possible to serialize syntax objects in a way that preserves bulk bindings, at the expense of not sharing the exporting module's information when the syntax objects are deserialized, but non-sharing may be what you want here.
I imagine that non-preserved syntax properties like 'origin are also an issue. That seems a little trickier, since some non-preserved syntax properties are probably non-serializable. If the serialization function took a set of non-preserved keys to treat as preserved, would that work?
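For readers unfamiliar with the distinction, here is a small sketch of preserved versus non-preserved properties using syntax-property and syntax-property-preserved? from racket/base. The key 'note and its values are made up for illustration.

```racket
#lang racket/base

;; A bare identifier to hang properties on.
(define stx (datum->syntax #f 'x))

;; Default: the property is attached but not marked as preserved,
;; so it is not kept when the syntax object is marshaled in compiled code.
(define plain (syntax-property stx 'note "not preserved"))

;; Passing #t as the fourth argument marks the property as preserved.
;; A preserved key must be an interned symbol.
(define kept (syntax-property stx 'note "preserved" #t))

(define plain-preserved? (syntax-property-preserved? plain 'note))
(define kept-preserved? (syntax-property-preserved? kept 'note))
```

Properties that the expander attaches itself, such as 'origin, fall into the non-preserved group, which is why they come up as a concern here.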
It would be possible to serialize syntax objects in a way that preserves bulk bindings, at the expense of not sharing the exporting module's information when the syntax objects are deserialized, but non-sharing may be what you want here.
That sounds good.
I don't understand what "the exporting module's information" means, so I don't know whether to think that's good or bad.
I imagine that non-preserved syntax properties like 'origin are also an issue. That seems a little trickier, since some non-preserved syntax properties are probably non-serializable.
A quick glance at traversals.rkt shows it uses a half dozen or so syntax properties. I don't know how many of those are serializable.
If the serialization function took a set of non-preserved keys to treat as preserved, would that work?
I think so?
I'm not sure how to handle ones that turn out to be non-serializable. Maybe there needs to be required-keys, which raises an exception if a value is non-serializable, and optional-keys, where it just skips --- or something like that?
I'm just guessing there might be uses where it's acceptable to proceed with some missing. (I'm not sure if that applies to drracket/check-syntax; @rfindler knows better if an "incomplete" analysis is better than nothing -- or worse than nothing. But I think Robby's idea was to build something that could also support other uses.)
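A rough sketch of the required-keys / optional-keys idea. The function collect-properties and the predicate plain-data? are hypothetical, written only for illustration; plain-data? is a deliberately narrow stand-in for whatever "serializable" would mean in a real serializer.

```racket
#lang racket/base

;; Stand-in for a real serializability check: only plain data counts.
(define (plain-data? v)
  (or (string? v) (symbol? v) (number? v) (boolean? v) (null? v)
      (and (pair? v) (plain-data? (car v)) (plain-data? (cdr v)))))

;; Hypothetical helper: gather property values from `stx` as an
;; association list. A required key with a non-serializable value raises
;; an exception; an optional key with a non-serializable value is skipped.
;; Note that an absent property reads as #f, which counts as plain data.
(define (collect-properties stx
                            #:required-keys [required-keys '()]
                            #:optional-keys [optional-keys '()])
  (append
   (for/list ([key (in-list required-keys)])
     (define v (syntax-property stx key))
     (unless (plain-data? v)
       (raise-arguments-error 'collect-properties
                              "required property is not serializable"
                              "key" key))
     (cons key v))
   (for/list ([key (in-list optional-keys)]
              #:when (plain-data? (syntax-property stx key)))
     (cons key (syntax-property stx key)))))

;; Example: 'a holds a string, 'b holds a procedure.
(define example
  (syntax-property
   (syntax-property (datum->syntax #f 'x) 'a "ok")
   'b (lambda () 1)))

(define collected
  (collect-properties example
                      #:required-keys '(a)
                      #:optional-keys '(b)))
```

Asking for 'b as a required key instead would raise an exception, which is the behavior a tool wants when an incomplete analysis is worse than none.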
I'm not sure how to handle ones that turn out to be non-serializable. Maybe there needs to be required-keys, which raises an exception if a value is non-serializable, and optional-keys, where it just skips --- or something like that?
Not to over-think this, but I can imagine values that aren't serializable -- but a function could transform them into a value that is. The substitute value might be "impoverished", but it might be better than nothing, and enough to support some use case.
So maybe the "ideal" would be something in the spirit of the not-found argument to hash-ref. But in this case, not-serializable.
Like I said, maybe over-thinking it.
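To make the hash-ref analogy concrete, here is a sketch of a not-serializable fallback. The function property-value, the pos struct, and plain-data? are all hypothetical. One difference from hash-ref is assumed on purpose: hash-ref calls a failure procedure with no arguments, whereas here the fallback procedure receives the offending value so it can produce a substitute.

```racket
#lang racket/base

;; Stand-in for a real serializability check: only plain data counts.
(define (plain-data? v)
  (or (string? v) (symbol? v) (number? v) (boolean? v) (null? v)
      (and (pair? v) (plain-data? (car v)) (plain-data? (cdr v)))))

;; Hypothetical accessor. If the property value is not serializable,
;; `not-serializable` is either applied to the value (when it is a
;; procedure) or returned as-is.
(define (property-value stx key not-serializable)
  (define v (syntax-property stx key))
  (cond
    [(plain-data? v) v]
    [(procedure? not-serializable) (not-serializable v)]
    [else not-serializable]))

;; An opaque struct value that the stand-in predicate rejects.
(struct pos (line col))

(define example
  (syntax-property (datum->syntax #f 'x) 'where (pos 3 7)))

;; Substitute an "impoverished" but serializable list for the struct.
(define as-list
  (property-value example 'where
                  (lambda (p) (list (pos-line p) (pos-col p)))))

;; Or fall back to a constant.
(define as-constant
  (property-value example 'where 'unavailable))
```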
It surely seems like this library could publish which properties it serializes and we could add more over time, as they became needed/useful. That's at least a minimum choice that sounds workable. There may be better choices tho.
I think all of the syntax properties that check syntax currently uses are serializable.
@rfindler Some of the syntax property values are identifiers. A piece of syntax that is identifier? can be serialized. But do we know "how much" of an identifier (in a syntax property value) is serialized by compile, and is recovered by the (eval (read __)) deserialization?
I'm wondering about information about an identifier beyond its symbol datum and srcloc --- things like scope, and operations like comparing identifiers for equality or giving them to identifier-binding.
(I'm not claiming it won't or can't work. I have a fuzzy understanding of what's involved. I'm genuinely asking, to double-check.)
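To illustrate the kind of information in question, this sketch compares an identifier that carries binding information with one that has only a symbol datum. It uses identifier-binding and free-identifier=? from racket/base and says nothing about what any particular serializer keeps; it only shows what would be lost if scopes did not survive a round trip.

```racket
#lang racket/base

;; An identifier with this module's lexical context: bound via racket/base.
(define imported #'car)

;; The same symbol with no lexical context at all.
(define bare (datum->syntax #f 'car))

;; For a module-level binding, identifier-binding returns a list describing
;; the source module, source name, and phases.
(define imported-binding (identifier-binding imported))

;; With no scopes there is no binding, so the result is #f.
(define bare-binding (identifier-binding bare))

;; Same binding: equal. Bound versus unbound: not equal,
;; even though both print as `car`.
(define same? (free-identifier=? imported #'car))
(define bare-same? (free-identifier=? imported bare))
```

An identifier that came back from deserialization looking like `bare` would still have the right symbol and srcloc, but binding lookups and identifier comparisons on it would give different answers.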
Serialization currently preserves all of that, except for "bulk bindings", which are included only by reference to a providing module. So, a key piece is keeping bulk bindings with the serialized object instead of just a reference to the module.
What's a rough idea of what's inside a "bulk binding"? Is it that when I call identifier-binding on an identifier, the answer might be in a "bulk binding" if it is an imported identifier, and so serialization (without us doing something special) would lose that?
If so, given what the rest of the code in this repo is doing, it may make sense for us to maintain a similar kind of reference (since anytime we have a fully expanded thing of some module we also have all its imports too). Not sure how this would work exactly tho :)
Yes, require bindings can be applied in "bulk" form, which not only binds N exports at a time, but shares binding-representation information among syntax objects in contexts that require from the same module. Sharing is just a constant-factor improvement in practice, though, and the resolution of shared information is deeply tied to the module-declaration machinery. So, it's probably better to avoid it for your purposes.
Okay!
I've added syntax-serialize and syntax-deserialize.
The #:provides-namespace argument gives you control over the use of bulk bindings. Set it to #f (or an empty namespace) to make a serialized form independent of bulk bindings. Or, if you decide to track dependencies and take advantage of bulk-binding sharing by loading module declarations into a namespace, you have relatively fine-grained control through #:provides-namespace.
The #:preserve-property-keys argument lets you specify extra property keys to treat as preserved.
I don't think you'll need #:base-module-path-index.
Maybe the proper terminology here is "de-marshaling" (from byte code)?
Anyway this is what I'm seeing (possibly doing something wrong):
experiment.rkt:
This prints 2 values plus an error: