theseion / Fuel

Fuel, the Smalltalk object serializer
https://theseion.github.io/Fuel
MIT License
26 stars 12 forks source link

Substitute test checks for set inclusion on O(n), could be O(1) #252

Closed maenu closed 2 years ago

maenu commented 2 years ago

I wanted to serialize non-trivial object graphs when I noticed a long run time of the serialization process. Profiling it revealed that about half of the time was spent in FLSubstitutionCluster >> #isSubstitute:. This method uses #identityIncludes: on an IdentitySet. For some reason unbeknownst to me, this method performs a linear scan over the identity set (in Pharo 9), whereas #includes: would be of constant cost.

I tried to reproduce the performance benefit of using #includes: instead, but the benchmark tests of Fuel gave only some percents of improvement. I guess my object graph had a very different topology or other properties, but I cannot easily add it as a Fuel benchmark, as it uses a lot of third-party objects.

Now I am unsure on how to proceed. Would it be possible to replace #identityIncludes: with #includes: in Fuel? Or should Pharo be fixed to implement #identityIncludes: in a sane manner? Why is it #identityIncludes: anyway, due to other Smalltalks?

In any case, there seems to be room for more benchmarks, as they did not cover my issue. I am just not sure how the get those models from within an image, maybe ASTs, explicitly created recursive graph models, ...?

theseion commented 2 years ago

Hey, thanks for the report. I didn't see it until today, sorry.

I'd love to have more benchmarks. If we can figure out what makes you graph special we should be able to build it ourselves. Maybe we'll know more, once I've looked at this issue.

theseion commented 2 years ago

Well, a very cursory look just leads to a 🤦 . #identityIncludes: is implemented on HashedCollection, which is very abstract and, hence, uses a loop. The solution should really be to implement #identityIncludes: on IdentitySet but for our purposes it's enough to use #includes: instead.

theseion commented 2 years ago

I've fixed this for the upcoming new major version. If you like, I'd be very happy for you to give it a try. Is this something you need backported?

tinchodias commented 2 years ago

Great. Probably my mistake. But I makes perfect sense that the includes: of a IdentitySet is based on == and not in =.

maenu commented 2 years ago

@theseion I think I can live without a backport. I will test it and try to provide a test once I can reduce my example. I will be touching Fuel in the coming days anyways, as I noticed some errors with materialized FullBlockClosures. @tinchodias I agree, it makes sense that includes: uses object identity in an IdentitySet. But it does not make sense that identityIncludes: is slower.

I think too this should be addressed by either implementing that method in hashed collections, or just drop the method, in Pharo. But as there was an easy workaround, I just dropped it here first.

theseion commented 2 years ago

The new release, which I hope to complete this weekend, includes support for FullBlockClosure and also anonymous classes (and a couple of other things). If your setup isn't too heavily customized, I suggest you try the new version.

I will open a PR with a fix for IdentitySet>>identityIncludes:.

theseion commented 2 years ago

Added issue with PR: #10292.