rfindler / fully-expanded-store

Some things apparently don't survive serialization ("de-marshaling"?) #1

Open greghendershott opened 3 years ago

greghendershott commented 3 years ago

Maybe the proper terminology here is "de-marshaling" (from byte code)?

Anyway this is what I'm seeing (possibly doing something wrong):

experiment.rkt:

#lang racket/base

(require racket/path
         racket/runtime-path
         syntax/modread
         drracket/check-syntax)

;; Let's fully expand a source file in a fresh namespace, and use the
;; current-compile / quote-syntax approach to serialize that to `ob`.
(define ob (open-output-bytes))
(define-runtime-path src-path "experiment.rkt") ;e.g. this file
(define dir (path-only src-path))
(parameterize ([current-load-relative-directory dir]
               [current-directory               dir])
  (with-module-reading-parameterization
    (λ ()
      (with-input-from-file src-path
        (λ ()
          (port-count-lines! (current-input-port))
          (define stx (read-syntax))
          (parameterize ([current-namespace (make-base-namespace)])
            (define exp-stx (expand stx))
            (println (syntax-property exp-stx 'module-body-context)) ;; print this to compare below
            (define compiled ((current-compile) `(,#'quote-syntax ,exp-stx) #f))
            (write compiled ob)))))))

;; Now, in a fresh namespace -- not the one in which we originally
;; expanded -- let's read and eval the bytes, and see if some things
;; were preserved.
(parameterize ([current-namespace (make-base-namespace)])
  ;; This indeed works, as far as loading a syntax object.
  (define exp-stx (eval (parameterize ([read-accept-compiled #t])
                          (read (open-input-bytes (get-output-bytes ob))))))
  ;; [Problem 1]: Are all of the syntax properties preserved? Alas, no, this
  ;; returns #f:
  (println (syntax-property exp-stx 'module-body-context))
  ;; [Problem 2]: Can we use the loaded expanded syntax to, for
  ;; example, give to drracket's show-content? Alas, no, this errors:
  ;;
  ;; namespace mismatch: bulk bindings not found in registry for module: #<resolved-module-path:"/home/greg/racket/8.0.0.12/share/pkgs/drracket-tool-lib/drracket/check-syntax.rkt">
  (show-content exp-stx
                #:fully-expanded? #t
                #:namespace       (current-namespace)))

This prints 2 values plus an error:

#<syntax:/home/greg/src/racket/pdb/experiment.rkt:1:6 racket/base>
#f
; namespace mismatch: bulk bindings not found in registry for module: #<resolved-module-path:"/home/greg/racket/8.0.0.12/share/pkgs/drracket-tool-lib/drracket/check-syntax.rkt">
; Context (plain; to see better errortrace context, re-run with C-u prefix):
;   /home/greg/racket/8.0.0.12/share/pkgs/drracket-tool-lib/drracket/private/syncheck/traversals.rkt:184:2 level+tail+mod-loop
;   /home/greg/racket/8.0.0.12/share/pkgs/drracket-tool-lib/drracket/private/syncheck/traversals.rkt:184:2 level+tail+mod-loop
;   /home/greg/racket/8.0.0.12/share/pkgs/drracket-tool-lib/drracket/private/syncheck/traversals.rkt:48:10 expanded-expression
;   /home/greg/racket/8.0.0.12/share/pkgs/drracket-tool-lib/drracket/check-syntax.rkt:56:0 show-content
;   /home/greg/racket/8.0.0.12/collects/racket/contract/private/arrow-val-first.rkt:555:3

greghendershott commented 3 years ago

In my experience, to do anything useful with fully-expanded code you need the namespace in which it was originally expanded. That means the only "cache" that works is in-memory: you have to keep both the namespace and the expanded syntax. But that chews through memory very quickly (e.g. https://github.com/greghendershott/racket-mode/issues/512).

As a result I've shifted to thinking about ways to run all the analyses up-front, eagerly, and save the interesting results in an on-disk database. That is what I started to experiment with in https://github.com/greghendershott/pdb (saving definitions and references).

mflatt commented 3 years ago

It would be possible to serialize syntax objects in a way that preserves bulk bindings, at the expense of not sharing the exporting module's information when the syntax objects are deserialized, but non-sharing may be what you want here.

I imagine that non-preserved syntax properties like 'origin are also an issue. That seems a little trickier, since some non-preserved syntax properties are probably non-serializable. If the serialization function took a set of non-preserved keys to treat as preserved, would that work?
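
For context, here is a minimal sketch of the preserved versus non-preserved distinction, using only `syntax-property` and `syntax-property-preserved?` from racket/base. The key `note` and the identifier `x` are made up for illustration; they are not taken from check-syntax.

```racket
#lang racket/base

;; Attach the same key twice: once in the default (non-preserved) mode,
;; once with the optional fourth argument set to #t.
(define plain (syntax-property #'x 'note "not preserved"))
(define kept  (syntax-property #'x 'note "preserved" #t))

;; Only the second one is marked to survive in compiled form.
(syntax-property-preserved? plain 'note) ; => #f
(syntax-property-preserved? kept 'note)  ; => #t
```

A property attached in the default mode, as in `plain`, is the kind that would be dropped when a syntax object goes through the compile-and-write path.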

greghendershott commented 3 years ago

> It would be possible to serialize syntax objects in a way that preserves bulk bindings, at the expense of not sharing the exporting module's information when the syntax objects are deserialized, but non-sharing may be what you want here.

That sounds good.

I don't understand what "the exporting module's information" means, so I don't know whether to think that's good or bad.

> I imagine that non-preserved syntax properties like 'origin are also an issue. That seems a little trickier, since some non-preserved syntax properties are probably non-serializable.

A quick glance at traversals.rkt shows it uses a half dozen or so syntax properties. I don't know how many of those are serializable.

> If the serialization function took a set of non-preserved keys to treat as preserved, would that work?

I think so?

I'm not sure how to handle ones that turn out to be non-serializable. Maybe there needs to be required-keys which if non-serializable raises an exception, and optional-keys where it just skips --- or something like that?

I'm just guessing there might be uses where it's acceptable to proceed with some missing. (I'm not sure if that applies to drracket/check-syntax; @rfindler knows better if an "incomplete" analysis is better than nothing -- or worse than nothing. But I think Robby's idea was to build something that could also support other uses.)

greghendershott commented 3 years ago

> I'm not sure how to handle ones that turn out to be non-serializable. Maybe there needs to be required-keys which if non-serializable raises an exception, and optional-keys where it just skips --- or something like that?

Not to over-think this, but I can imagine values that aren't serializable -- but a function could transform them into a value that is. The substitute value might be "impoverished", but it might be better than nothing, and enough to support some use case.

So maybe the "ideal" would be something in the spirit of the not-found argument to hash-ref. But in this case, not-serializable.

Like I said, maybe over-thinking it.
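
To make the required-keys / optional-keys / not-serializable idea concrete, here is a rough sketch. Everything in it is hypothetical: `collect-properties` and its keyword arguments are invented names, not an existing API, and it uses the shallow `serializable?` predicate from racket/serialize as a stand-in for whatever "serializable" would really mean for compiled syntax.

```racket
#lang racket/base
(require racket/serialize)

;; Hypothetical helper: gather property values from `stx` into an
;; association list that could then be written out.
;; - A required key with a non-serializable value raises an exception.
;; - An optional key with a non-serializable value is handed to
;;   `not-serializable` (in the spirit of the failure argument to
;;   `hash-ref`); a #f result means "skip this key".
(define (collect-properties stx
                            #:required-keys [required-keys '()]
                            #:optional-keys [optional-keys '()]
                            #:not-serializable [not-serializable (λ (key val) #f)])
  (append
   (for/list ([key (in-list required-keys)])
     (define val (syntax-property stx key))
     (unless (serializable? val) ; shallow check only, an assumption
       (raise-arguments-error 'collect-properties
                              "required property value is not serializable"
                              "key" key
                              "value" val))
     (cons key val))
   (for*/list ([key (in-list optional-keys)]
               [val (in-value (syntax-property stx key))]
               [sub (in-value (if (serializable? val)
                                  val
                                  (not-serializable key val)))]
               #:when sub)
     (cons key sub))))

;; Usage: a string property (serializable) and a procedure (not).
(define stx
  (syntax-property (syntax-property #'x 'doc "a string")
                   'thunk (λ () 42)))

(define result
  (collect-properties stx
                      #:required-keys '(doc)
                      #:optional-keys '(thunk)
                      #:not-serializable (λ (key val) 'opaque)))
result ; => '((doc . "a string") (thunk . opaque))
```

The substitute value `'opaque` is the "impoverished" stand-in; a real fallback could try to extract something more useful from the original value.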

rfindler commented 3 years ago

It surely seems like this library could publish which properties it serializes and we could add more over time, as they became needed/useful. That's at least a minimum choice that sounds workable. There may be better choices tho.

I think all of the syntax properties that check syntax currently uses are serializable.

greghendershott commented 3 years ago

@rfindler Some of the syntax property values are identifiers. A piece of syntax that is identifier? can be serialized. But do we know "how much" of an identifier (in a syntax property value) is serialized by compile, and is recovered by the (eval (read __)) deserialization?

I'm wondering about information about an identifier beyond its symbol datum and srcloc --- things like scope, and operations like comparing identifiers for equality or giving them to identifier-binding.

(I'm not claiming it won't or can't work. I have a fuzzy understanding of what's involved. I'm genuinely asking, to double-check.)

mflatt commented 3 years ago

Serialization currently preserves all of that, except for "bulk bindings", which are included only by reference to a providing module. So, a key piece is keeping bulk bindings with the serialized object instead of just a reference to the module.

rfindler commented 3 years ago

What's a rough idea of what's inside a "bulk binding"? Is it that when I call identifier-binding on an imported identifier, the answer might live in a "bulk binding", so serialization (without us doing something special) would lose it?

If so, given what the rest of the code in this repo is doing, it may make sense for us to maintain a similar kind of reference (since anytime we have a fully expanded thing of some module we also have all its imports too). Not sure how this would work exactly tho :)

mflatt commented 3 years ago

Yes, require bindings can be applied in "bulk" form, which not only binds N exports at a time, but shares binding-representation information among syntax objects in contexts that require from the same module. Sharing is just a constant-factor improvement in practice, though, and the resolution of shared information is deeply tied to the module-declaration machinery. So, it's probably better to avoid it for your purposes.
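
As a small illustration of the information at stake, the following sketch asks for the binding of an identifier that a module gets through a `require`. It only shows what `identifier-binding` reports; whether the expander stores this particular binding in bulk form is internal and not observable here.

```racket
#lang racket/base

;; `car` reaches this module through the `racket/base` import, so its
;; binding comes from a `require` rather than a local definition.
(define b (identifier-binding #'car))

;; For a module-level binding the result is a list describing the
;; defining module, the name there, the nominal import, and phases.
(list? b)
(cadr b) ; the name of the binding in its defining module

;; Comparisons like this one depend on the same binding information.
(free-identifier=? #'car #'car)
```

If a deserialized identifier cannot resolve its binding, calls like these are where the "namespace mismatch" error from the original report would surface.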

rfindler commented 3 years ago

Okay!

mflatt commented 3 years ago

I've added syntax-serialize and syntax-deserialize.

The #:provides-namespace argument gives you control over the use of bulk bindings. Set it to #f (or an empty namespace) to make a serialized form independent of bulk bindings. Or, if you decide to track dependencies and take advantage of bulk-binding sharing by loading module declarations into a namespace, you have relatively fine-grained control through #:provides-namespace.

The #:preserve-property-keys argument lets you specify extra property keys to treat as preserved.

I don't think you'll need #:base-module-path-index.
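
A minimal round-trip sketch using the two new functions and the keyword arguments named above. It assumes a Racket version recent enough to provide `syntax-serialize` and `syntax-deserialize` from racket/base. Writing the serialized form with racket/fasl is a choice made for this example, not something prescribed in this thread, and the `note` property is invented for illustration.

```racket
#lang racket/base
(require racket/fasl)

;; A small syntax object with a property that is NOT marked as preserved.
(define stx (syntax-property #'(car '(1 2)) 'note "kept"))

;; Serialize without relying on bulk bindings, and ask for 'note to be
;; treated as if it were a preserved property.
(define bstr
  (s-exp->fasl
   (syntax-serialize stx
                     #:provides-namespace #f
                     #:preserve-property-keys '(note))))

;; Later (possibly in another process): turn the bytes back into syntax.
(define stx2 (syntax-deserialize (fasl->s-exp bstr)))

(syntax->datum stx2)                         ; the original datum
(syntax-property stx2 'note)                 ; the explicitly preserved property
(identifier-binding (car (syntax-e stx2)))   ; binding info for `car`
```

For the original use case, `stx` would be the result of `expand`, and the key list would name whichever properties the analysis needs, such as 'origin.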