Open sorawee opened 4 years ago
Another idea that I want to throw out here.
I tried the following snippet in pollen-tfl after making Pollen reuse namespace:
#lang racket
(require (for-syntax racket/string
                     racket/format))
(define-syntax (gen-all stx)
  (define reqs
    (for/list ([f (directory-list)] #:when (string-suffix? (~a f) ".pm"))
      (with-syntax ([f (~a f)]
                    [x (gensym)])
        #'(begin (module x racket
                   (require f)
                   (println doc))
                 (require 'x)))))
  #`(begin #,@reqs))
(gen-all)
It takes 65s to run, without caching. This suggests that if we have a better dependency manager, then we can require a bunch of files at once, saving the cost of the dynamic-require calls.
(The running time reduces even further to 28s after excluding toc.html.pm, because that file loads other files, which results in more dynamic-require calls. The dynamic-require cost would be eliminated with a good dependency manager and caching.)
If you can make Pollen go faster, great. Making it go faster while preserving its features is the hard part, I have found.
I think this is a performance boost for free?
This is what I thought when I added parallel rendering. It was less true than I hoped. 🤯
Removing steps from an expensive computation is a great way to save time. But it’s only “free” if you know for sure that skipping those steps never leads to incorrect results. Attaching permanent caveats — “it works, if you know that X Y and Z are true” — leads to despair, which is not free.
IIRC the reason fresh namespaces were necessary was to support dynamic re-evaluation during an interactive project server session. Otherwise, dynamic-require caches its results, and you have to restart the project server to see changes.
You may be right, however, that certain simplifications are possible during a non-interactive session (say, when using raco pollen render), because we can safely assume that the source is not changing from start to finish.
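The caching behavior described above can be seen outside Pollen with a small sketch. It is only an illustration, not Pollen's actual rendering code: the temporary stamp.rkt module and the names load-stamp, shared-ns, cached? and fresh-differs? are made up for the example. The module creates a fresh gensym each time it is instantiated, so getting the same symbol back twice means the module was instantiated only once.

```racket
#lang racket/base
(require racket/file)

;; Write a tiny throwaway module whose body produces a new value
;; every time the module is instantiated.
(define dir (make-temporary-file "ns-demo-~a" 'directory))
(define mod-path (build-path dir "stamp.rkt"))
(display-to-file
 "#lang racket/base\n(provide stamp)\n(define stamp (gensym))\n"
 mod-path)

(define (load-stamp) (dynamic-require mod-path 'stamp))

;; Two loads in the same namespace: the second reuses the first instance.
(define shared-ns (make-base-empty-namespace))
(define first-stamp
  (parameterize ([current-namespace shared-ns]) (load-stamp)))
(define second-stamp
  (parameterize ([current-namespace shared-ns]) (load-stamp)))

;; A load in a fresh namespace instantiates the module again.
(define fresh-stamp
  (parameterize ([current-namespace (make-base-empty-namespace)])
    (load-stamp)))

(define cached? (eq? first-stamp second-stamp))
(define fresh-differs? (not (eq? first-stamp fresh-stamp)))
```

Under these assumptions, cached? and fresh-differs? should both be true: reuse within one namespace is what makes re-rendering cheap, and it is also why a changed source is not picked up without a fresh namespace.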
For instance. The reason, say, Scribble can be faster on large documents is that all the component source files are pulled into one master source — this one source is compiled & evaluated — and then multiple pages are emitted as output. Pollen, by contrast, has one source per output file, each separately evaluated.
OTOH Scribble can do this because it exerts more control over how the document is structured. You can import your own functions to a Scribble source. But it doesn’t permit the granularity of control that Pollen does. Costs vs. benefits.
I’ve considered, at least, whether Pollen could similarly “gang” files together and consolidate evaluations. For instance, by packing a number of source files into another as submodules. But I don’t see why this would change anything, aside from repositioning the pieces on the board. A module has the same evaluation costs regardless if it’s a submodule or standalone source file.
As a middle approach, I’ve also considered whether Pollen could introduce a concept of a one-to-many page. This would be faster to evaluate, because it would be a single evaluation (like a Scribble source). But it would be a distinct concept within Pollen from the current preprocessor / Markdown / markup files.
The problem with a one-to-many file type is that it makes dynamic refresh annoying, because now you have to refresh a possibly huge source in order to refresh one small part.
The other issue with one-to-many page generation is that I have never wanted this once for my own work. For me, the value of Pollen is exactly that it is so luxuriously indulgent. Every page triggers a full program evaluation! Where else can you get this? Nowhere.
By contrast, the one-to-many publishing model is well covered by other tools — Scribble, or Frog, or a zillion other static-site generators beyond.
So, though I am always interested in making Pollen faster, it only makes sense if the new technique supports the core theory of operation. Which is why, so far, I have focused more on file-based caching and more recently parallel processing. I’m sure there are other good ideas yet to be discovered.
if files like pollen.rkt have a side-effect (say, mutate a global variable), then the side-effect would persist across rendering multiple files.
What would be a test case that demonstrates this behavior? The fix in #49 doesn’t break any existing Pollen tests, nor any of my own projects. Moreover, Pollen doesn’t guarantee a clean namespace for rendering — like I say, it’s more of a necessity to support dynamic refresh during an interactive session.
My hunch is that the situation doesn’t arise much in the wild, because Racket naturally deters use of global variables and mutation.
Consider:
;; a.html.pm
#lang pollen

;; b.html.pm
#lang pollen

;; pollen.rkt
#lang racket
(provide root)
(define x 0)
(define (root . xs)
  (set! x (add1 x))
  (number->string x))
Prior to the namespace reuse, raco pollen render . will create the following files:
<html><head><meta charset="UTF-8"/></head><body>1</body></html>
<html><head><meta charset="UTF-8"/></head><body>1</body></html>
After the namespace reuse, raco pollen render . will create the following files:
<html><head><meta charset="UTF-8"/></head><body>1</body></html>
<html><head><meta charset="UTF-8"/></head><body>2</body></html>
I think I would call this a case of nondeterministic compilation, in which case Pollen’s guarantees needn’t be any stronger than Racket’s. For instance, if we convert these files to Racket modules, we’d get the same weird behavior:
;; a.rkt
#lang racket
(require "base.rkt")
(provide x)
(define x (f))
(println x)

;; b.rkt
#lang racket
(require "base.rkt")
(provide x)
(define x (f))
(println x)

;; base.rkt
#lang racket
(provide f)
(define x 0)
(define (f)
  (set! x (add1 x))
  (number->string x))
Suppose these all live in collection foo. Running racket -l foo/a or racket -l foo/b will print 1. But running racket and then doing (require foo/a) and (require foo/b) (or vice versa) will produce 1 then 2.
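The same effect can be reproduced in a single file, without setting up a collection, by using submodules. This is a sketch only: the submodule names base, a and b mirror the files above, and results is a name introduced for the example. Because both a and b require the same instance of base, the counter is shared between them.

```racket
#lang racket/base

;; Plays the role of base.rkt: holds the mutable counter.
(module base racket/base
  (provide f)
  (define x 0)
  (define (f)
    (set! x (add1 x))
    (number->string x)))

;; Plays the role of a.rkt.
(module a racket/base
  (require (submod ".." base))
  (provide x)
  (define x (f)))

;; Plays the role of b.rkt.
(module b racket/base
  (require (submod ".." base))
  (provide x)
  (define x (f)))

;; Both submodules are instantiated in one namespace, so they
;; share a single instance of base and therefore a single counter.
(require (prefix-in a: 'a))
(require (prefix-in b: 'b))

(define results (list a:x b:x))
```

Here results holds one "1" and one "2" rather than two copies of "1", which is the shared-instance behavior that namespace reuse exposes in Pollen.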
I think mutation like this is quite common when one wants to communicate across tags, e.g., when making footnotes. There's a way to make it work by dealing with things in the root function instead, but that requires restructuring the whole program. As a concrete example of how this mutation is useful:
;; a.html.pm
#lang pollen
◊inc-x[] or ◊inc-x[]

;; b.html.pm
#lang pollen
◊inc-x[] and ◊inc-x[]

;; pollen.rkt
#lang racket
(provide inc-x)
(define x 0)
(define (inc-x)
  (set! x (add1 x))
  (number->string x))
And this would work prior to namespace reuse, with a.html having content "1 or 2" and b.html having content "1 and 2". However, after namespace reuse, it would be "1 or 2" and "3 and 4".
Note that I'm not saying that producing "1 or 2" and "3 and 4" is wrong. It's an acceptable behavior, but there should be a way to make it possible to produce "1 or 2" and "1 and 2".
One easy way to fix this problem is to create a tag named reset that does (set! x 0) and put reset at the beginning of every Pollen file, but there's an alternative approach that I like more.
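A minimal sketch of the reset approach, written as plain Racket so it runs without Pollen. The functions page-a and page-b are hypothetical stand-ins for evaluating a.html.pm and b.html.pm in one shared namespace; reset returns an empty string on the assumption that the tag should contribute nothing to the page.

```racket
#lang racket/base

;; Stand-in for pollen.rkt: the counter, plus a reset tag.
(define x 0)
(define (inc-x)
  (set! x (add1 x))
  (number->string x))
(define (reset)
  (set! x 0)
  "") ; empty string, so the tag adds no visible output

;; Stand-ins for two pages evaluated one after the other in the
;; same namespace. Each page calls reset first.
(define (page-a) (string-append (reset) (inc-x) " or " (inc-x)))
(define (page-b) (string-append (reset) (inc-x) " and " (inc-x)))

(define page-a-result (page-a)) ; "1 or 2"
(define page-b-result (page-b)) ; "1 and 2"
```

The cost of this approach is that every source file has to remember to call reset, which is the tedium the alternative is meant to avoid.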
One feature that I think will be very useful is some sort of #%module-begin macro for Pollen programs (it actually doesn't need to be a macro, see details below). Right now, the topmost level root effectively must be a function because of how it's used: (apply root-proc xs) at https://github.com/mbutterick/pollen/blob/master/pollen/private/main-base.rkt#L34.
However, this means root will be called as the last function in Pollen program evaluation. Sometimes, though, what I want is the ability to have root set things up. So my workaround is the following:
;; pollen.rkt
#lang racket
(provide (all-defined-out))
(require racket/splicing)
(define current-x (make-parameter 0))
(define-syntax-rule (my-root xs ...)
  (splicing-parameterize ([current-x 0])
    xs ...))
(define (inc-x)
  (current-x (add1 (current-x)))
  (number->string (current-x)))
Then:
;; a.html.pm
#lang pollen
◊my-root{
◊inc-x[] or ◊inc-x[]
}

;; b.html.pm
#lang pollen
◊my-root{
◊inc-x[] and ◊inc-x[]
}
will deterministically produce:
<html><head><meta charset="UTF-8"/></head><body><root>1 or 2</root></body></html>
<html><head><meta charset="UTF-8"/></head><body><root>1 and 2</root></body></html>
But as you can see, I need to wrap everything in my-root to make this work. It would be nice if Pollen had a special symbol like root whose dynamic extent covers the entire Pollen program evaluation.
But OK, perhaps a macro is too demanding; another possibility is thunking. That is, my-root will consume an argument f which, when invoked, will evaluate the Pollen program. To make it consistent with the current behavior, my-root by default would be:
(define (my-root f) (f))
But users are allowed to override my-root to something like:
(define (my-root f)
  (parameterize ([current-x 0]) (f)))
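To make the thunking idea concrete, here is a small sketch in plain Racket. It assumes a hypothetical render-page function standing in for whatever Pollen would do to evaluate a page; Pollen has no such hook today, so this only illustrates the proposed shape.

```racket
#lang racket/base

(define current-x (make-parameter 0))
(define (inc-x)
  (current-x (add1 (current-x)))
  (number->string (current-x)))

;; User-supplied override: run the page body with a fresh counter.
(define (my-root f)
  (parameterize ([current-x 0]) (f)))

;; Hypothetical renderer: wraps the whole page evaluation in my-root.
(define (render-page body-thunk)
  (my-root body-thunk))

(define page-a
  (render-page (λ () (string-append (inc-x) " or " (inc-x)))))
(define page-b
  (render-page (λ () (string-append (inc-x) " and " (inc-x)))))
```

Each page sees its own counter ("1 or 2" and "1 and 2"), and the value of current-x outside the pages stays at 0, because the mutation happens inside the dynamic extent of parameterize.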
Note: I edited the above comment a lot. You might want to read it from GitHub instead of email.
One easy way is to fix this problem is to create a tag named reset that (set! x 0) and put reset at the beginning of every Pollen file
Yes — moreover, this is the Rackety way to go about it, and using fresh namespaces would be both perverse and slow.
However, this means root will be called as a last function in Pollen program evaluation. Sometimes, though, what I want is an ability to have root set things up
The idea of a function named, say, init that can be used for setup tasks at the start of a page render is interesting. But something like #%module-begin is a little different. Can’t you already do that, by making your own Pollen-derived #lang?
The idea of a function named, say, init that can be used for setup tasks at the start of a page render is interesting. But something like #%module-begin is a little different. Can’t you already do that, by making your own Pollen-derived #lang?
init would suffice for the (set! x 0) solution, but would not suffice for the parameterize solution. When I thought about this, I wanted to find the most general solution that can be used in various settings. That being said, if you think init would be more suitable, I would welcome it. It's better than nothing.
I’m not averse to something like #%root-begin; I just try to avoid macro solutions where possible. I’ll think about how it could be done (unless you want to prototype it into a PR).
But OK, perhaps macro is too demanding, then another possibility is thunking. That is, my-root will consume an argument f which, when invoked, will evaluate Pollen program. To make it consistent with the current behavior, my-root by default would be:
(define (my-root f) (f))
But users are allowed to override my-root to something like:
(define (my-root f)
  (parameterize ([current-x 0]) (f)))
Would this be acceptable?
Why not try moving root to a position where it can be either a function or a macro? That was your first suggestion. That seems more flexible than the thunking idea.
Just chiming in to say I do use mutable hash tables in my pollen.rkt for footnotes and link references, and when doing parallel renders many of my pages now have footnotes from other pages.
However, I’m not complaining or asking to revert. I am persuaded that the new way has benefits. I just want to understand what the implications are right now for state that I want preserved between tag function calls but not across pages when doing parallel renders. Are parameters no longer sufficient for this purpose?
I understand that refactoring so that everything is dealt with inside root is one way to do this; I could also prefix my hash keys with some unique per-page value (like here-path) to isolate each page’s values from each other (specifically in the case of hash tables).
Right — you’ll need to manage the state for each page explicitly, rather than relying on that behavior as a side effect of fresh namespaces.
In general, using here-path to key this data is a good idea, since that's guaranteed to be unique for each source file.
Concatenating the keys would work, though it makes per-page queries a little messy. One could also convert a footnote hash into a hash with subhashes: the top level is indexed by here-path, and then the subhashes are indexed by footnote number.
#lang racket
(require pollen/core)
(define fn-hash (make-hash))
(define (fn txt)
  (define page-path (hash-ref (current-metas) 'here-path))
  (define fn-hash-page (hash-ref! fn-hash page-path make-hasheq))
  (define fn-count (add1 (length (hash-keys fn-hash-page))))
  (hash-set! fn-hash-page fn-count txt)
  (format "~a is fn ~a" txt fn-count))
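For readers who want to try the subhash idea without a Pollen project, the following sketch swaps the (current-metas) lookup for a hypothetical current-page parameter. That parameter, and the names a1, b1 and a2, are assumptions made for the example; the rest follows the structure of the code above.

```racket
#lang racket/base

;; Hypothetical stand-in for (hash-ref (current-metas) 'here-path).
(define current-page (make-parameter "a.html.pm"))

;; Top level: page path -> subhash. Subhash: footnote number -> text.
(define fn-hash (make-hash))

(define (fn txt)
  (define fn-hash-page (hash-ref! fn-hash (current-page) make-hasheqv))
  (define fn-count (add1 (hash-count fn-hash-page)))
  (hash-set! fn-hash-page fn-count txt)
  (format "~a is fn ~a" txt fn-count))

;; Interleave footnotes from two pages sharing one fn-hash.
(define a1 (parameterize ([current-page "a.html.pm"]) (fn "first")))
(define b1 (parameterize ([current-page "b.html.pm"]) (fn "second")))
(define a2 (parameterize ([current-page "a.html.pm"]) (fn "third")))
```

Numbering restarts on each page (a1 is footnote 1, b1 is footnote 1, a2 is footnote 2) even though all pages write into the same top-level table, and a per-page query is a single hash-ref on the page path.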
Why not try moving root to a position where it can be either a function or a macro. That was your first suggestion. That seems more flexible than the thunking idea.
I think it's the same reason why root in the current Pollen exists. One hypothetical design of Pollen is to require people to wrap the whole content up in the top-level tag explicitly instead of relying on the implicit root, but that would be very tedious, and that's why I think you chose to use the implicit root tag instead.
I’ve been testing and working to ensure I fully understand the implications of this change. Tell me if I have this correct:
When a #lang pollen program is evaluated (i.e., to produce its doc), if being run from inside the Pollen web server, it gets its own namespace guaranteed.
When a #lang pollen program is evaluated in any other context, it shares namespace/state with all other #lang pollen programs from the same Pollen project being evaluated in the same process.
raco pollen command, they share a namespace.
And finally:
As to this last bit, consider this MVE. I could not find a sequence of raco pollen commands that would get the (template) line of the rendered output to say anything other than Result: 1.
The fresh namespace is only necessary in the project-server context, because that’s the only way to make sure that all updated source files (incl. "pollen.rkt") are properly incorporated in a render (originally it was the fix for https://github.com/mbutterick/pollen/issues/64).
Your description of the behavior seems right except for the last point. It would be more accurate to say that after this change, a Pollen source may or may not be evaluated in its own namespace, just as currently, it may or may not be evaluated in parallel. In both cases, the programming should not depend on any side effects of these environments.
That said, one can still avoid parallel processing, which is possibly useful for projects that want a guaranteed evaluation order. Likewise, I could add a command-line switch or setup value to restore the fresh-namespace behavior for those who prefer the consistency.
Your code example depends on mutation of a global variable, which is always going to be troublesome.
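I did an experiment by modifying Pollen to use the same namespace instead of creating a new one for every file. For pollen-tfl with one thread, the rendering time after raco pollen reset reduces from 332s to 121s.
Of course, the behavior would be different. In particular, if files like pollen.rkt have a side-effect (say, mutate a global variable), then the side-effect would persist across rendering multiple files. However, for projects that don't have side-effects (which are probably the majority?), I think this is a performance boost for free?
Perhaps there should be an option to allow using the same namespace?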
have a side-effect (say, mutate a global variable), then the side-effect would persist across rendering multiple files. However, for projects that don't have side-effects (which are probably the majority?), I think this is a performance boost for free?Perhaps there should be an option to allow using the same namespace?