Canonicalized filepaths has a large performance impact on compilation speed

nim-works / nimskull

An in development statically typed systems programming language; with sustainability at its core. We, the community of users, maintain it.

Currently the compiler supports emitting "canonical" paths for modules, as opposed to, say, absolute or project-file-relative paths; the full list of possibilities can be seen in compiler/front/in_options.FilenameOption. Most pressingly, foCanonical causes massive performance problems, because each attempt at canonicalizing a path results in a lot of filesystem IO.

The goal is fairly simple:

- get the baseline performance of compiling with and without --filenames=canonical
- track down the various use cases and classify them by purpose/intention
- with the benchmark data and use cases in hand, eliminate or reduce the filesystem access (i.e. cache the data)

To measure the impact, search the /tests directory for tests that specify the --filenames=canonical option, then run those with and without the option and compare. The largest impact was observed with ./koch.py temp --stacktrace --stacktracemsgs -d:nimCompilerStacktraceHints --lib:lib -d:debug -d:usenodeids c --filenames=canonical --msgFormat=sexp tests/compilerfeatures/tstructured_parse_fail.nim. This run results in setMsgFrame calls that convert TLineInfo to strings, which triggers canonicalization in many hot compiler procedures.
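The with/without comparison described above could be scripted roughly as follows. This is only a sketch: the command is copied from this issue, while the helper names (without_flag, time_command, compare) are hypothetical. Note that ./koch.py temp may also rebuild the compiler, so the measured time is not purely the compile of the test file.

```python
import subprocess
import time

# Command copied from the issue text; everything else here is illustrative.
CANONICAL_FLAG = "--filenames=canonical"
BASE_CMD = [
    "./koch.py", "temp", "--stacktrace", "--stacktracemsgs",
    "-d:nimCompilerStacktraceHints", "--lib:lib", "-d:debug", "-d:usenodeids",
    "c", CANONICAL_FLAG, "--msgFormat=sexp",
    "tests/compilerfeatures/tstructured_parse_fail.nim",
]


def without_flag(cmd, flag=CANONICAL_FLAG):
    """Return a copy of cmd with every occurrence of flag removed."""
    return [arg for arg in cmd if arg != flag]


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Only the timing matters here, so the exit code is ignored.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        best = min(best, time.perf_counter() - start)
    return best


def compare(cmd=BASE_CMD, runs=3):
    """Time cmd with and without the canonical flag; returns (with, without)."""
    return time_command(cmd, runs), time_command(without_flag(cmd), runs)
```

Calling compare() from the repository root returns the two timings; taking the best of several runs is a choice made here to reduce noise from filesystem caching, not something the issue prescribes.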

If stuck at any point, ask @saem.

assert and doAssert use assertImpl, which in turn uses {.line: loc.}. That pragma changes the node's file index through n.info.fileIndex = fileInfoIdx(c.config, AbsoluteFile(x.strVal)). Inside fileInfoIdx, expandFilename is called to ensure the file exists; otherwise it raises an OS error. The filename obtained from instantiationInfo is sometimes just the base file name, e.g. iterators.nim, so it always tries to resolve it.

So here are some bad parts:

- instantiationInfo converts a file index to a file name, and fileInfoIdx then converts the file name back to a file index, even though the file index is already known.
- The resolved file index is not stored in a cache, so the same input is resolved to the same output multiple times.
- It uses exception handling for the exceptional case and then returns a file index, so performance depends on which exceptions mode is currently active.
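The second and third points can be illustrated with a small language-neutral sketch in Python. It is not the compiler's code: FileIndexTable and its methods are hypothetical stand-ins for the filename/index table behind fileInfoIdx, and os.path.realpath stands in for expandFilename. Each raw spelling is resolved against the filesystem at most once, and a missing file is reported as None instead of through an exception.

```python
import os
from typing import Dict, List, Optional


class FileIndexTable:
    """Hypothetical stand-in for a compiler's filename <-> index table."""

    def __init__(self) -> None:
        self._paths: List[str] = []                   # index -> canonical path
        self._by_canonical: Dict[str, int] = {}       # canonical path -> index
        self._by_raw: Dict[str, Optional[int]] = {}   # raw spelling -> index or None
        self.fs_lookups = 0                           # counts filesystem resolutions

    def _canonicalize(self, raw: str) -> Optional[str]:
        """Resolve raw against the filesystem; None if the file does not exist."""
        self.fs_lookups += 1
        path = os.path.realpath(raw)
        return path if os.path.exists(path) else None

    def index_of(self, raw: str) -> Optional[int]:
        """Return the index for raw, touching the filesystem once per spelling."""
        if raw in self._by_raw:
            return self._by_raw[raw]  # cache hit: no filesystem IO
        canonical = self._canonicalize(raw)
        if canonical is None:
            idx = None  # missing file: cached as None, no exception raised
        else:
            idx = self._by_canonical.get(canonical)
            if idx is None:
                idx = len(self._paths)
                self._paths.append(canonical)
                self._by_canonical[canonical] = idx
        self._by_raw[raw] = idx
        return idx

    def path_of(self, idx: int) -> str:
        """Return the canonical path registered under idx."""
        return self._paths[idx]
```

Caching negative results assumes files do not appear mid-compilation, and relative spellings assume a fixed working directory. For the first point, no cache is needed at all: a caller that already holds the file index can pass it along directly rather than round-tripping through the file name.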

Canonicalized filepaths has a large performance impact on compilation speed #546