maralorn / nix-output-monitor

Pipe your nix-build output through the nix-output-monitor a.k.a nom to get additional information while building.
GNU Affero General Public License v3.0
835 stars 24 forks source link

refactoring: replace Map with HashMap (#98) #117

Closed blackheaven closed 9 months ago

blackheaven commented 11 months ago

I have tried to have a look at IntMap, but it seems to be dynamic.

If you have any hints.

maralorn commented 11 months ago

Thank you very much! However, my issue was meant more as exploring an idea. So, we will need to look at this carefully in benchmarks to decide if this is the right way to go. I can’t promise you, that I will merge this.

I am sadly super busy right now, so it might take a few weeks until I can evaluate this.

I am not sure what you mean with IntMap being dynamic. The interesting thing about the IntMaps happening in nom is that they are densely packed. Which means that they could be represented very well by a vector. I have no clue however if the worse insert performance on that will compensate the better lookup performance. Anyway, the changes are orthogonal and we can evaluate them separately.

blackheaven commented 11 months ago

Don't worry, how can I run the benchmark? Is it documented somewhere?

By "dynamic", I mean, we don't have the size at the initialization (it changes as we iterate).

maralorn commented 11 months ago

Ah, yeah it is dynamic in that sense. Although maybe we can somehow collect the entries for a while and then insert them in bunches, but that will be a larger refactoring.

maralorn commented 11 months ago

The benchmark is in the ./bench folder but running it and trying different parameters is a rather manual and undocumented process.

maralorn commented 9 months ago

I took the liberty of rebasing onto current main.

Actually running the benchmark or the profile is not as complicated as I remember. Just run bench/profile.sh before and after the change and compare the resulting eventlogs. Or run bench/bench.sh before and after the change and compare the run times and memory footprints.

Only difficult part is that for that a few store paths need to exist, which I took care of with a recent change to the benchmark scripts.

Anyhow, this change sadly does not show any performance improvements. It might even have gotten a little bit worse in runtime, but the noise is too large to observe anything, sorry.

blackheaven commented 9 months ago

No worries, that's good to know, at least, it objectively confirm that this change alone is not enough to create a clear speedup.

Does the memory consumption changed?

maralorn commented 9 months ago

The memory profile is basically the same except that some constructors are switched, which is to be expected.

The hope for this change was to profit from O(1) lookups. But the lookups in the specific Maps you changed actually don’t happen that often, because I tried to optimize string lookups away as much as possible. The workhorse is really the CachedIdMap (a slightly type-safe IntMap), that is used in most of the calculations.

I wonder if just using the HashMap from the start and not using any CacheIds internally would actually be slower. It would at least be significantly simpler and remove a hard-to-enforce invariant from the code. But since the data structures contain quite a few cross-references it could mean a significant memory increase.

blackheaven commented 9 months ago

Makes sense, thanks for your time and feedback.