Araq closed this issue 3 years ago.
The implementation of the compilation cache is tricky: There are lots of issues to be solved for the front- and backend.
We store a module's AST after a successful semantic check in an SQLite database. Plenty of features require certain processing steps to be re-applied when the module is loaded from the cache, for example:
```nim
{.compile: "foo.c".} # even if the module is loaded from the DB,
                     # "foo.c" needs to be compiled/linked.
```
The solution is to replay the module's top-level statements. This solves the problem without special-casing the logic that fills the internal seqs affected by the pragmas.
In fact, this describes how the AST should be stored in the database: as a "shallow" tree. Let's assume we compile module m with the following contents:
```nim
import strutils

var x*: int = 90
{.compile: "foo.c".}
proc p = echo "p"
proc q = echo "q"

static:
  echo "static"
```
Conceptually this is the AST we store for the module:
```nim
import strutils

var x*
{.compile: "foo.c".}
proc p
proc q

static:
  echo "static"
```
The symbol's body is loaded lazily, on demand. This is where most savings come from: only the shallow outer AST is reconstructed immediately. It is also important that the replay includes the import statement, so that the dependencies are resolved properly.
Nim allows .global, compileTime variables that can be filled by macro invocations across different modules. This feature breaks modularity in a severe way. Plenty of different solutions have been proposed, such as restricting these variables to Set[T] or similar unordered, only-growable collections, so that we can track the module's write effects to these variables and reapply the changes in a different order. (These solutions are not mutually exclusive.)
Since we adopt the "replay the top-level statements" idea, the natural solution to this problem is to emit pseudo top-level statements that reflect the mutations done to the global variable. However, this is MUCH harder than it sounds; for example, squeaknim uses this snippet:
```nim
apicall.add(") module: '" & dllName & "'>\C" &
            "\t^self externalCallFailed\C!\C\C")
stCode.add(st & "\C\t\"Generated by NimSqueak\"\C\t" & apicall)
```
We can "replay" stCode.add only if the values of st and apicall are known. And even then, a hash table's add with its hashing mechanism is too hard to replay. In practice, things are worse still; consider someGlobal[i][j].add arg.
We only know the root is someGlobal, but the concrete path to the data is unknown, as is the value that is added. We could compute a "diff" between the global states and use that to compute a symbol patchset, but this is quite some work, expensive to do at runtime (it would need to run after every module has been compiled), and it would also break for hash tables.
We need an API that hides the complex aliasing problems by not relying on Nim's global variables. The obvious solution is to use string keys instead of global variables:
```nim
proc cachePut*(key: string; value: string)
proc cacheGet*(key: string): string
```
However, the values being strings or JSON is quite problematic: many lookup tables that are built at compile time embed proc vars and types which have no obvious string representation... It seems like AST diffing is still the best idea, as it does not require an alien API and works with at least some existing Nimble packages.
On the other hand, in Nim's future I would like to replace the VM by native code. A diff algorithm wouldn't work for that. Instead the native code would work with an API like put and get:
```nim
proc cachePut*(key: string; value: NimNode)
proc cacheGet*(key: string): NimNode
```
The API should embrace the AST diffing notion: see the module macrocache for the final details.
In the following sections, global means "shared between modules" or "a property of the whole program".
Nim contains language features that are global. The best example for that are multi methods: Introducing a new method with the same name and some compatible object parameter means that the method's dispatcher needs to take the new method into account. So the dispatching logic is only completely known after the whole program has been translated!
Other features that are implicitly triggered cause problems for modularity too. Type converters fall into this category:
```nim
# module A
converter toBool(x: int): bool =
  result = x != 0

# module B
import A

if 1:
  echo "ugly, but should work"
```
If in the above example module B is re-compiled but A is not, then B needs to be aware of toBool even though toBool is not referenced in B explicitly.
Both the multi method and the type converter problems are solved by the AST replay implementation.
We cache generic instantiations and need to ensure this caching works well with the incremental compilation feature. Since the cache is attached to the PSym data structure, it should work without any special logic.
dlsym is global. However, the biggest problem is that dead code elimination breaks modularity!
To see why, consider this scenario: the module G (for example the huge Gtk2 module...) is compiled with dead code elimination turned on, so none of G's procs is generated at all.
Then module B is compiled, which requires G.P1. Ok, no problem: G.P1 is loaded from the symbol file and G.c now contains G.P1.
Then module A (which depends on B and G) is compiled, and B and G are left unchanged. A requires G.P2.
So now G.c MUST contain both P1 and P2, but we haven't even loaded P1 from the symbol file, nor do we want to, because then we would quickly restore large parts of the whole program.
The backend must have some logic so that if the currently processed module comes from the compilation cache, the body is not accessed. Instead, the generated C(++) code for the symbol's body needs to be cached too and inserted back into the produced C file. This approach seems to deal with all the problems outlined above.
```nim
## This module provides an API for macros that need to collect compile
## time information across module boundaries in global variables.
## Starting with version 0.19 of Nim this is not directly supported anymore
## as it breaks incremental compilations.
## Instead the API here needs to be used. See XXX (wikipedia page) for a
## theoretical foundation behind this.

type
  CacheSeq* = distinct string
  CacheTable* = distinct string
  CacheCounter* = distinct string

proc value*(c: CacheCounter): int {.magic: "NccValue".}
proc inc*(c: CacheCounter; by = 1) {.magic: "NccInc".}

proc add*(s: CacheSeq; value: NimNode) {.magic: "NcsAdd".}
proc incl*(s: CacheSeq; value: NimNode) {.magic: "NcsIncl".}
proc len*(s: CacheSeq): int {.magic: "NcsLen".}
proc `[]`*(s: CacheSeq; i: int): NimNode {.magic: "NcsAt".}

iterator items*(s: CacheSeq): NimNode =
  for i in 0 ..< len(s): yield s[i]

proc `[]=`*(t: CacheTable; key: string, value: NimNode) {.magic: "NctPut".}
  ## 'key' has to be unique!

proc len*(t: CacheTable): int {.magic: "NctLen".}
proc `[]`*(t: CacheTable; key: string): NimNode {.magic: "NctGet".}

proc hasNext(t: CacheTable; iter: int): bool {.magic: "NctHasNext".}
proc next(t: CacheTable; iter: int): (string, NimNode, int) {.magic: "NctNext".}

iterator pairs*(t: CacheTable): (string, NimNode) =
  var h = 0
  while hasNext(t, h):
    let (a, b, h2) = next(t, h)
    yield (a, b)
    h = h2
```
This is the upcoming API that replaces the .global, compileTime vars that are currently available for Nim's macro system but need to be deprecated, since they cannot work with incremental compilation.
Please try to imagine and tell me if your current macros could be based on this API. :-) Note again that only global compileTime variables that are used across module boundaries are affected. The vast majority of macro code out there is not affected.
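As a rough sketch of how the "gather" and "generate" halves might look on top of this API (the key string and macro names here are invented for illustration, not part of the proposal):

```nim
import macros, macrocache

# Invented key; the string is the program-wide identity of the sequence.
const registeredTypes = CacheSeq"myPkg.registeredTypes"

macro register(t: typed): untyped =
  ## Gather: record the type's AST for a later sealing step.
  registeredTypes.add t
  result = newEmptyNode()

macro emitRegistrations(): untyped =
  ## Generate: walk the gathered types and produce code from them.
  result = newStmtList()
  for t in registeredTypes:
    result.add newCall(ident"echo", newLit(t.repr))
```

Because the cache key is a plain string, modules that never import each other can still contribute to the same sequence.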
Araq and I have discussed this issue many times in private conversations, so I'll give a bit more background information.
The main problem with global compile-time variables is that they should be deterministic in the presence of arbitrary module recompilations. For this to work reliably, every time you recompile a module you must first undo all of its previous effects on the global variables and then apply the effects produced from the changed code. Undoing and redoing changes to a data structure is not trivial because it may depend on the order of operations (imagine a simple set type with incl and excl being used from multiple modules that can be reordered in an arbitrary way).
Luckily, there is a category of data structures which support manipulation from multiple participants in arbitrary order while always arriving at the same result: the so-called Conflict-free Replicated Data Types (CRDTs). The CRDT structures include counters, sets and other useful primitives that can serve as the basis for implementing everything else (from set theory, we know that sets are enough to represent any concept).
Thus, to increase modularity, the proposed solution is to limit the interaction with global compile-time variables to a set of operations over CRDT types which will be recorded by the compiler. This could work by blessing certain types for use in compile-time vars, or by introducing a low-level API like the one described above, allowing you to build abstractions on top of it.
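To make the CRDT idea concrete, here is a minimal grow-only counter sketch (illustrative only, not part of the proposal; all names are invented). Each module keeps its own contribution, so merging the results of recompilations is order-independent:

```nim
import tables

type GCounter = object
  perModule: Table[string, int]  # module name -> that module's contribution

proc inc(c: var GCounter; module: string; by = 1) =
  c.perModule.mgetOrPut(module, 0) += by

proc merge(a: var GCounter; b: GCounter) =
  # Taking the per-module maximum makes merge commutative and idempotent,
  # so replaying modules in any order yields the same state.
  for m, v in b.perModule:
    a.perModule[m] = max(a.perModule.getOrDefault(m), v)

proc value(c: GCounter): int =
  for v in c.perModule.values: result += v
```

Recompiling one module simply replaces that module's entry, which is exactly the "undo then redo" behavior described above.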
It should be noted that just manipulating the compile-time variables correctly is not enough for implementing any generative programming idea. The user must also be able to generate code that is derived from the final "program-wide" value of the variable. I usually call such generative programming approaches the "Gather and Generate" pattern. Here are some examples:
A library like nimpy or NimBorg may gather all Nim types that are being passed to Python and Lua functions (by adding them to a global compile-time set as a side effect of instantiating a generic dot operator). It will then produce a registerNimTypes(foreignVM) proc that will register all the run-time type information required by the foreign VM for working with the Nim values.
Initially, registerNimTypes exists as an empty placeholder proc (it can be marked in the code with the partial pragma). Once the final value of the global compile-time variable is determined, a "sealing" compile-time proc is executed that generates the body of registerNimTypes.
An RPC library may allow you to create RPC end-points spread across several modules. A rpc macro is used to register them in a global compile-time set. In the sealing proc, the set of RPC procs is sorted alphabetically or by time of introduction (this can be achieved with a staticWrite API that will automatically update a file stored in version control). The protocol is versioned by hashing the signatures of all RPCs, and each of them is assigned a numeric ID for maximum efficiency (from the sorted order). The numeric ID is written to a placeholder symbol referenced in the generated code of each RPC call.
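A hedged sketch of that sealing step (the proc name and return shape are invented): sort the gathered signatures, fold them into a protocol hash, and derive each RPC's numeric ID from its position in the sorted order:

```nim
import algorithm, hashes

proc sealRpcs(signatures: seq[string]): (Hash, seq[(string, int)]) =
  ## Returns the protocol version hash and a stable (signature, id) assignment.
  var ordered = signatures
  ordered.sort()
  var h: Hash = 0
  for i, sig in ordered:
    h = h !& hash(sig)           # fold every signature into the version hash
    result[1].add((sig, i))      # numeric ID = position in the sorted order
  result[0] = !$h
```

Because the IDs come from the sorted order, adding a new RPC changes the protocol hash but keeps the assignment deterministic across rebuilds.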
These are quite similar to the RPC library. The sealing proc can traverse the hierarchy of types to generate the required run-time dispatching logic, it can gather all HTTP routes and compile a single Ragel state machine for the most efficient routing possible and so on.
This Gather and Generate technique is one of the most common generative programming patterns and I have a lot of experience simulating it with global run-time variables and type erasure tricks in both C++ and Nim with the currently available functionality.
So, from the description above and Araq's current proposal, we need to address the following:
How does the user introduce a new instance of a shared data structure (CacheSeq, CacheTable, CacheCounter, etc.)? Is this done by introducing a variable?
Are these types CRDT data structures? How are they implemented?
How is the "sealing" stage exposed to the user?
Modifications to global variables can be introduced from static: code executed once per generic instantiation. For this to work, the instantiations need to be reference counted (each client module increases the count). Once a side effect is introduced, it should be removed only when the count drops back to zero. Does this play well with the rest of the proposal?
If the lower-level API is based on NimNodes, can we have higher-level APIs that are strongly typed?
To complete the feature, we also need the staticWrite magic for creating files that will go into version control. This is important because it allows you to produce "cached" artifacts that will affect your co-workers as well.
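The staticWrite magic does not exist yet; a hypothetical declaration (purely illustrative, the magic name is invented) might look like:

```nim
# Hypothetical, not implemented: would write `content` to `filename` at
# compile time, with the compiler tracking the write so that the artifact
# can be checked into version control.
proc staticWrite*(filename, content: string) {.magic: "StaticWrite".}
```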
```nim
const
  key = CacheCounter"package.module.myCounter" # obviously we can hide that better later.

static:
  inc(key, 2)
  inc(key)
```
As far as I understand from this thread, we're going to see a huge speed increase for this in the future.
It seems like most of the slowdown is being caused by the top level statements being replayed on each compilation.
Does this mean that by reducing the number or complexity of top-level statements, we can reduce the processing time required for each module?
Example of current implementation method:
```nim
## ------------
## b.nim
import some_expensive_library

proc add*(a: int, b: int): int =
  return a + b

## ------------
## a.nim
import b

echo add(1, 2)
```
Would any performance improvement be gained by, for example, first compiling b.nim to a dylib or similar and then writing a small wrapper which interfaces with the compiled library?
```nim
## ------------
## b.nim
import some_expensive_library

proc add*(a: int, b: int): int {.exportc: "add".} =
  return a + b
```
b.nim is then precompiled into ./lib/b.so.
```nim
## ------------
## b_wrapper.nim
proc add*(a: int, b: int): int {.importc, dynlib: "./lib/b.so".}

## ------------
## a.nim
import b_wrapper

echo add(1, 2)
```
Would this perhaps speed up development processing cycles?
@tavurth This describes the "simplest thing that can possibly work" and later on we have more ideas of how to make the compiler even more incremental and avoid the replaying of top level statements. However, I doubt DLLs are a viable solution to this problem as the code/AST must be available so that the compiler can perform the many compiletime evaluation features that Nim offers.
That is my use-case: the NESM library I'm working on uses a global variable ctx which contains closures for generating [de]serialization code for all previously encountered types. The closures are used to substitute the object name in the generated code like:
```nim
proc serialize(source: NimNode): NimNode = ... # the closure
# it generates the following code
# for source = IdentNode("a"):
a.writeData(...)
# for source = DotExpr(IdentNode("a"), IdentNode("b")):
a.b.writeData(...)
# and so on...
```
It can be rewritten to use only NimNode (with some effort though), but I need a contains proc for the CacheTable.
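Until a magic-based version exists, a contains could be sketched on top of the pairs iterator shown earlier (this is an assumption on my part, not part of the proposed API; it is a linear scan and compile-time only):

```nim
import macros, macrocache

proc contains*(t: CacheTable; key: string): bool {.compileTime.} =
  # Linear scan over the table's pairs; O(n), but likely adequate for macro use.
  for k, v in t:
    if k == key: return true
```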
And I want to know if a CacheTable can be used as an argument to a function located in another module.
```nim
# moduleA.nim
proc useCache*(a: CacheTable) {.compileTime.} =
  ...

# moduleB.nim
from moduleA import useCache

const x = CacheTable("mycachetable")
...
useCache(x)
```
PS: it would be nice to have the ability to overwrite a cached value in the CacheTable at compile time without a 'key already exists' error.
I have put a small bounty ($50) on Bountysource on this issue. (https://www.bountysource.com/issues/68171609-incremental-compilation)
3 questions for clarification:
Are macrocache items supposed to be mutable? Because the current implementation allows this, like so:
```nim
import macros, macrocache

static:
  let cache = CacheSeq"foo"
  cache.add(newTree(nnkPar))
  cache[0].add(ident"abc")
```
Will using compileTime globals instead of macrocache still be allowed at the cost of breaking incremental compilation? The macrocache documentation says it "is not directly supported anymore" and I'm only guessing that means "problems that compiletime globals have that macrocache can fix will not be addressed" rather than "compiletime globals should never be used". I don't support the latter because macrocache is a restriction on the power that macros have and there are some things that it might not be able to do or won't support that you might want to do.
> the seq is a sequence with "set" semantics, but some order is exposed as you need to be able to iterate over it.
This means that they have an order, but they won't store identical references twice, or something like that; is that correct? I don't know, because the current implementation allows you to use both "equal" and identical nodes.
`=hooks` are production ready. The first milestone is complete: IC works for 4 carefully selected test cases. There will be a follow-up milestone.
Currently in larger Nim projects, the Nim compiler is the bottleneck, not the C/C++ compiler. This is caused by the fact that the Nim compiler always recompiles the full program including all used libraries. This is not required, Nim should use a "compilation cache" so that only the modules that were changed (plus maybe the modules that depended on these modules) are recompiled. The current idea is to base this compilation cache on an SQLite database. This feature is crucial for Nim's scalability to bigger projects.