homonoidian closed this 2 years ago
Ran the tests with valgrind. In general, the slowest parts are Crystal's GC (which is expected), `Block#add` (`RealSubstrate#insert?` -> allocation), `Block#at?(form)` (the topic of this issue), and `Block#instance` (`Array#dup` -> allocation).
Another idea on this:
Maybe we can make use of some sort of superglobal table like `{Block, Form} => Entry`, which we'd use to look up stuff in any block. This would require hashing blocks differently though, based on their object id or something like that.

For this, we'd also need a cache of the block parent tree, or at least some way to figure out which blocks we can look up stuff in (to emulate the linked list of blocks we have right now), and in which order. Rebuilding the parent tree on reparent/initialize into another superglobal hash table `Block => Array(Block)`, where the array holds the parents of the key block from nearest to farthest, would also be useful. We then wouldn't need the `parent` property in blocks, because all it's used for is `at?` and lookup (maybe we can emulate it): `parent` can just inspect the hash table, `parent=` can do `Hash#[]=`, etc.
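The two-table idea above could be sketched roughly like this (Ruby standing in for Crystal; `LOOKUP`, `PARENTS`, and the `Block` methods are all hypothetical names for illustration, not Novika's actual API):

```ruby
# Sketch of the idea above, not Novika's implementation. A flat
# superglobal table maps [block, form-name] keys to values, and a
# second table maps each block to its parents, nearest to farthest.
LOOKUP  = {}
PARENTS = Hash.new { |h, k| h[k] = [] }

class Block
  # Blocks hash by identity (Ruby objects already do this by default;
  # in Crystal it would have to be done explicitly, e.g. via object_id).
  def parent
    PARENTS[self].first
  end

  # Reparenting rebuilds this block's chain from the new parent's chain.
  def parent=(new_parent)
    PARENTS[self] = [new_parent, *PARENTS[new_parent]]
  end

  def at(name, value)
    LOOKUP[[self, name]] = value
  end

  # Emulates the linked-list walk: self first, then parents in order.
  def at?(name)
    [self, *PARENTS[self]].each do |block|
      value = LOOKUP[[block, name]]
      return value if value
    end
    nil
  end
end

root  = Block.new
child = Block.new
child.parent = root
root.at("greet", "hello")
child.at?("greet") # found through the superglobal tables alone
```

Note that `parent` and `parent=` here never touch an instance variable: the parent relationship lives entirely in the superglobal `PARENTS` table, which is the point being made above.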
This should speed up the language immensely. The hash table should never be GC'd either (if it is possible to somehow disable GC on a specific object; otherwise we'd also need to use our own table data structure), because its lifetime is always the lifetime of the program. It would be logical for Crystal to disable GC on things stored in constants, because they never go out of scope, but I may not be aware of something.
Blocks also wouldn't need to store `table` then. This has its disadvantages, because you can include `ITable` and have custom lookup mechanisms. We can mitigate that by delegating lookup to `table`, which would be a Value that holds a pointer to the block it's part of. It can then look up in the flat superglobal table however it wishes.
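A sketch of that delegation (Ruby again as a stand-in; `FLAT`, `BlockTable`, and `DefaultingTable` are made-up names): the table value keeps a pointer back to its block and decides for itself how to consult the flat superglobal store, so `ITable`-style custom lookup survives the flattening.

```ruby
# Hypothetical sketch: lookup is delegated to a table value that holds
# a pointer to the block it belongs to. The default strategy reads the
# flat superglobal store; subclasses may look up however they wish.
FLAT = {}

class BlockTable
  def initialize(block)
    @block = block # pointer back to the owning block
  end

  def at?(name)
    FLAT[[@block, name]]
  end
end

# Example of a custom mechanism layered on top: fall back to a default
# value when the flat store has no entry for this block.
class DefaultingTable < BlockTable
  def initialize(block, default)
    super(block)
    @default = default
  end

  def at?(name)
    super || @default
  end
end

block = Object.new # stands in for a Novika block
FLAT[[block, "x"]] = 1
BlockTable.new(block).at?("x")
DefaultingTable.new(block, 0).at?("y")
```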
Unfortunately, it doesn't seem that caching will help. There are just too many blocks created, and even without caching they put a lot of strain on Crystal's GC.
So, adding a cache slows down Novika (or I'm dumb and didn't implement the caching right). That feels a bit unnatural, but really, I suppose it's because lookup hierarchies are fairly short except for recursive calls (and those can be optimized by just comparing prototypes and instructing `instance` to reparent to the prototype instead of `self`; maybe that's why `as?` takes so much time).
There's a lot of "maybe"s though. I believe it's going to be better if I write some profiling tools for Novika, and use them to optimize the Novika code (which is pretty badly written btw). I do not have any ideas on how to optimize the Crystal code, other than rewriting it in C. But I don't want to rewrite it in C, and with very high probability it won't be too much faster. The set of ideas I chose for core Novika isn't super fast even in isolation.
Implementing a read-only `eject` should help too, although currently it's not used widely enough to impact performance.
I'm closing this for now, because again, I don't want to rewrite the code in C and I have no idea on how to optimize it right now. Maybe I shouldn't. Premature optimization ... You know how it goes.
I don't know how much I should trust flamegraph (or whatever the tool is called), but this sounds like a huge `at?` problem to me. This particular flamegraph is of the novika core tests. This one is of the playground, which `map`s and `times` a lot as far as I remember; almost 20% of the samples are in `at?`. This one is of snake: almost 1 GB of perf data, and the sucker is `at?` again, almost 21% of the samples.

Now, `Entry` classes exist on the heap and are not replaced with anything, if I remember correctly. They are the actual things that hold the value forms, not hashes; hashes are used as an O(1) way to look up an `Entry`. So `at?` is optimizable by caching, I believe. Of course there are some edge cases, but I think they are solvable, and the performance benefit would be great. Benchmarking... leads to these results on a Ryzen 3 2200G:
Note that the repeated execution of `repeat` blocks is expected; we're measuring the effect that caching has.
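For reference, the kind of caching being measured could look roughly like this (a Ruby sketch with hypothetical names, not Novika's code). The key point from above is that hashes map names to heap-allocated `Entry` objects rather than to values directly, so a cache can memoize the `Entry` itself and still observe later changes to the value behind it:

```ruby
# Entries live on the heap and hold the value form; hashes only provide
# O(1) access to them, as described above.
class Entry
  attr_accessor :form

  def initialize(form)
    @form = form
  end
end

class Block
  def initialize(parent = nil)
    @parent  = parent
    @entries = {} # name => Entry, the per-block O(1) lookup
    @cache   = {} # name => Entry found somewhere up the parent chain
  end

  def at(name, form)
    (@entries[name] ||= Entry.new(form)).form = form
  end

  # Uncached walk up the parent chain.
  def entry?(name)
    @entries[name] || @parent&.entry?(name)
  end

  # Cached variant: memoize the Entry (not its value), so updates made
  # through the original Entry stay visible. The edge cases mentioned
  # above (reparenting, a later definition shadowing a cached Entry)
  # would need cache invalidation, which this sketch omits.
  def at?(name)
    entry = (@cache[name] ||= entry?(name))
    entry && entry.form
  end
end
```

On a second lookup of the same name, `at?` skips the parent walk entirely, which is exactly the cost the benchmark tries to isolate with repeated `repeat` blocks.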