Closed danieledapo closed 4 years ago
Thank you very much for working on this!
The speed improvements are impressive. Unfortunately, the behavior isn't quite the same, and I cannot reproduce the same results. This is a bit difficult to measure since there is randomness involved, but you can try the following: run `pastel distinct -v 6`
and take a look at the optimized `D_min` score. For the old version, it is consistently around 103.2. With your version, it also reaches 103.2 sometimes, but it also yields results which are much worse.
10 runs of the old version:
103.23
103.23
103.22
103.22
103.23
103.22
103.23
103.23
103.19
103.22
10 runs of the new version:
98.90
103.23
103.22
98.90
103.22
103.22
98.90
94.28
98.31
93.48
What I can also observe from running `pastel distinct` with the `-v`/`--verbose` option is that the `D_mean` score is excessively high in your version. It looks like some `scalar::MAX` values accidentally made it into the computation of the mean.
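For illustration, here is a toy sketch (made-up numbers, not pastel's actual code) of how a single `f64::MAX` sentinel, e.g. from a cache slot that was initialized but never updated, completely dominates a mean unless it is filtered out:

```rust
fn main() {
    // Hypothetical distance values, with one leftover f64::MAX sentinel.
    let distances = vec![10.0_f64, 12.0, f64::MAX, 11.0];

    // Naively including the sentinel makes the mean astronomically large.
    let naive_mean = distances.iter().sum::<f64>() / distances.len() as f64;
    assert!(naive_mean > 1e300); // the sentinel dominates everything else

    // Filtering out the sentinel restores a sensible mean.
    let real: Vec<f64> = distances.iter().copied().filter(|&d| d < f64::MAX).collect();
    let mean = real.iter().sum::<f64>() / real.len() as f64;
    assert_eq!(mean, 11.0);
    println!("naive: {naive_mean}, filtered: {mean}");
}
```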
I will add some inline comments and see if I can find any mistakes.
Thanks!
> After having optimized that, `cargo flamegraph` showed that a bunch of time was spent inside the `rearrange_sequence` function. A first optimization was to simply switch to using `sort_by_cached_key` because calculating the key at each comparison is really too expensive. Then, I went ahead and removed the sorting altogether because I think there was no need for it and introduced a cache for the minimum distances.
Using `sort_by_cached_key` sounds very reasonable, but I don't understand the second part. What do you mean by "there was no need for it"? I didn't really understand your new implementation when taking a first look, but I can try to take a closer look again.
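As a side note on the `sort_by_cached_key` point, a small self-contained sketch (with a trivial stand-in key, not pastel's actual distance computation) shows why it matters when the key is expensive: the cached variant evaluates the key function exactly once per element, while `sort_unstable_by_key` re-evaluates it at every comparison:

```rust
use std::cell::Cell;

fn main() {
    let count = Cell::new(0usize);
    let mut v: Vec<u32> = (0..100).rev().collect();

    // `sort_by_cached_key` computes each key once up front, then sorts by the
    // precomputed keys: exactly n key evaluations.
    v.sort_by_cached_key(|&x| {
        count.set(count.get() + 1);
        x // stand-in for an expensive key, e.g. a distance computation
    });
    assert_eq!(count.get(), 100);

    // `sort_unstable_by_key` recomputes the key at every comparison, so an
    // O(d) key turns an O(n log n) sort into O(d * n log n).
    count.set(0);
    v.reverse();
    v.sort_unstable_by_key(|&x| {
        count.set(count.get() + 1);
        x
    });
    assert!(count.get() > 100); // many more evaluations than elements
    println!("cached: 100 evaluations, per-comparison: {}", count.get());
}
```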
> Using `sort_by_cached_key` sounds very reasonable, but I don't understand the second part. What do you mean by "there was no need for it"? I didn't really understand your new implementation when taking a first look, but I can try to take a closer look again.
I'm sorry it wasn't clear, I could have explained better.
My understanding of the algorithm is that it has to rearrange the sequence of colors in such a way that the distance between each color and its predecessors is maximized.
My idea was to do something similar to insertion sort, where you find the element to place at position i by scanning the elements from i+1 to the end of the input. I said there is no need to sort the "right" part of the sequence because I think you really only need to find the minimum, which is O(n) vs O(n log(n)).
Moreover, at each step we can be clever and avoid recomputing the minimal distance from each element in the "right" part to each element in the "left" one: we know which element we inserted in the previous run of the loop, so we can update the previously computed distances with only that new value.
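If it helps, here is a toy sketch of that idea on plain numbers with absolute difference as the "distance" (the names and the 1-D distance are made up for illustration; pastel of course works on colors with a perceptual distance): place the prefix greedily, keep a cache of each remaining element's minimum distance to the prefix, update the cache with only the last-placed element, and pick the best candidate with an O(n) scan instead of a sort.

```rust
// Illustrative sketch of the cached rearrangement idea (names are hypothetical,
// not pastel's actual API). Each step picks the remaining element whose minimum
// distance to the already-placed prefix is largest.
fn rearrange(values: &mut [f64]) {
    let n = values.len();
    if n < 2 {
        return;
    }
    // min_dist[k] = minimum distance from values[k] to the placed prefix.
    let mut min_dist = vec![f64::MAX; n];
    for i in 1..n {
        let placed = values[i - 1];
        // O(n): refresh the cache with only the element placed last step.
        for k in i..n {
            let d = (values[k] - placed).abs();
            if d < min_dist[k] {
                min_dist[k] = d;
            }
        }
        // O(n) scan for the best candidate instead of an O(n log n) sort.
        let (best, _) = (i..n)
            .map(|k| (k, min_dist[k]))
            .fold((i, f64::MIN), |acc, (k, d)| if d > acc.1 { (k, d) } else { acc });
        values.swap(i, best);
        min_dist.swap(i, best);
    }
}

fn main() {
    let mut v = vec![0.0, 1.0, 2.0, 10.0];
    rearrange(&mut v);
    // The first element stays; the farthest remaining point comes next.
    assert_eq!(v, vec![0.0, 10.0, 2.0, 1.0]);
    println!("{:?}", v);
}
```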
Thank you for the clarification. I somehow missed that the outer loop was still there and got confused as to how that still works :smile:. Yes, that makes sense!
If I understand it correctly, your changes bring the complexity from O(N² log N) down to O(N²).
> Moreover at each step we can be clever and avoid recomputing the minimal distance from each element in the "right" part to each element in the "left" one because we know which element we inserted in the previous run of the loop and so we can update the previous differences with only the new value.

~~This part I still don't quite understand. We did that before as well, right? I only called `sort_unstable_by_key` on the right part. In your implementation, wouldn't it be the same if we just used `min_by_key` on the right part? All `min_distances` in the right part are recomputed anyway.~~
I think I get it now :smile:
> If I understand it correctly, your changes bring the complexity from O(N² log N) down to O(N²).

This is correct if we ignore how the calculation of the keys was done. Previously, for each element in "right", the code iterated over each element in "left" to find the maximum, which I believe is O(N²) (but I never really understood big O :) ), so the final complexity of the previous algorithm was, I think, O(N² · N² log N). Now it's O(N²), because this calculation is cached and then updated at each step of the outer loop.
Ah, right. That explains why it was so slow for larger N :smile:
Thank you!
Thank you for merging so quickly! :rocket: :tada:
In this PR I've tried to speed up the `distinct` command by optimizing the simulated annealing process and the rearranging of colors.
Non-scientific benchmarks show that the proposed changes do in fact greatly improve performance. For example, I'm now able to run `pastel distinct 200 | cat` in <1 sec, while without these changes it would have taken ~25 secs on my machine. However, on small inputs the boost is not as big; in fact, `pastel distinct 20 | cat` now runs in ~0.2 secs vs ~0.4 secs, which is better, but still a bit slow imho.

The simulated annealing was optimized mainly by avoiding recalculating the distances from scratch every time, and instead only recalculating the distances to the color that changed. This is the main contributing factor to the speedup.
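To illustrate the idea (with hypothetical names and a toy 1-D "distance", not pastel's actual code): when an annealing step mutates element `i`, only the N-1 distances involving `i` need recomputing, instead of rebuilding the full pairwise matrix.

```rust
fn dist(a: f64, b: f64) -> f64 {
    (a - b).abs() // stand-in for a perceptual color distance
}

// Full O(n^2) rebuild of the pairwise distance matrix.
fn all_distances(xs: &[f64]) -> Vec<Vec<f64>> {
    xs.iter()
        .map(|&a| xs.iter().map(|&b| dist(a, b)).collect())
        .collect()
}

// O(n) update after xs[i] changed: only row i and column i are stale.
fn update_distances(d: &mut [Vec<f64>], xs: &[f64], i: usize) {
    for j in 0..xs.len() {
        let v = dist(xs[i], xs[j]);
        d[i][j] = v;
        d[j][i] = v;
    }
}

fn main() {
    let mut xs = vec![0.0, 3.0, 7.0];
    let mut d = all_distances(&xs);

    xs[1] = 5.0; // the annealing step mutated one "color"
    update_distances(&mut d, &xs, 1);

    // The incremental update matches a full rebuild.
    assert_eq!(d, all_distances(&xs));
}
```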
After having optimized that, `cargo flamegraph` showed that a bunch of time was spent inside the `rearrange_sequence` function. A first optimization was to simply switch to using `sort_by_cached_key`, because calculating the key at each comparison is really too expensive. Then, I went ahead and removed the sorting altogether, because I think there was no need for it, and introduced a cache for the minimum distances.

At this point I don't see any other low-hanging fruit, and I'm likely to stop here as I'm OK with the results.
Let me know what you think.
(also, your implementation of simulated annealing was really cool! simulated annealing is one of my favourite algorithms :smile: )