usethesource / vallang

Generic immutable recursive data representation API targeted at source code models and more.

Reduce allocations during WeakWriteLockingHashConsingMap::get #130

Closed. DavyLandman closed this issue 2 years ago.

DavyLandman commented 2 years ago

The special class io.usethesource.vallang.util.WeakWriteLockingHashConsingMap$LookupWrapper is constructed a lot.

Running Java Flight Recorder, it's the top allocation by far when running a Rascal compiler. It also shows up in quite a few places in the trace.
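For context, the pattern in question looks roughly like this (a simplified sketch with hypothetical shapes, not the exact vallang source): stored keys are weak wrappers, so every get() must allocate a throwaway LookupWrapper with a compatible hashCode/equals just to probe the map.

import java.lang.ref.WeakReference;

// Simplified sketch of the lookup-wrapper pattern (hypothetical shapes,
// not the actual vallang classes). Stored keys are weak wrappers; each
// lookup allocates a short-lived LookupWrapper, which is the allocation
// dominating the flight-recorder profile.
final class WeakReferenceWrap<T> extends WeakReference<T> {
    private final int hash;
    WeakReferenceWrap(T referent) {
        super(referent);
        this.hash = referent.hashCode();
    }
    @Override public int hashCode() { return hash; }
}

final class LookupWrapper<T> {
    private final T ref;
    LookupWrapper(T ref) { this.ref = ref; }
    @Override public int hashCode() { return ref.hashCode(); }
    @Override public boolean equals(Object obj) {
        // unwraps the stored weak wrapper so the probe can be compared
        // against the (possibly already collected) stored key
        return obj instanceof WeakReferenceWrap<?>
            && ref.equals(((WeakReferenceWrap<?>) obj).get());
    }
}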

DavyLandman commented 2 years ago

I've run some JMH benchmarks on ways to avoid this allocation; they are all slower than just allocating A LOT.

I've tried:

All are slower, so I'm closing this issue until someone proposes something else.

jurgenvinju commented 2 years ago
/**
 * Special wrapper that is used for lookup, so that we don't have to create a weak-reference when we don't need it.
 * It would have been nicer if we could use the normal equals method of the target object, but that won't be able to work together with the WeakReferenceWrap.
 */

Since, in the meantime, the IValue.equals() method no longer ignores annotations (or anything else), have the constraints against using WeakReferenceWrap directly disappeared?

DavyLandman commented 2 years ago

This is about weak-reference equals, not about IValue equals.

I checked Caffeine; they are doing something similar.

DavyLandman commented 2 years ago

It would only work if we extend Type equals to deal with this unwrapping, since this class is made to avoid memory leaks in the TypeFactory.

jurgenvinju commented 2 years ago

This is also a reference: https://github.com/cwi-swat/aterms/blob/master/shared-objects/src/shared/SharedObjectFactory.java

jurgenvinju commented 2 years ago

It solves the entire problem in a different way; perhaps good to compare to.

DavyLandman commented 2 years ago

I'm thinking about extending the base equals for types, to see if maybe that could solve it.

Otherwise, we could most likely replace this class with one of the Caffeine caches.

jurgenvinju commented 2 years ago

Type.equals is overridden in many different places, so changing its behavior is possible but certainly would have impact (on performance and possibly functionality) beyond what we can currently imagine.

If we can replace this code with someone else's optimized code, then I'm all for it. It's a pretty hairy domain, these hash-consing factories on the JVM.

jurgenvinju commented 2 years ago

BTW: it's not the case that people always use == on Type, even though they should. I could write a simple analysis to detect this :-) and that would simplify the situation, since the only caller of Type.equals would be the sharing factory.

DavyLandman commented 2 years ago

It's the equals used by the map that this concern is about. So yes, we could write our own map... but I'd rather not.

jurgenvinju commented 2 years ago

Let's not, then. We're already depending on Caffeine, so that's alright. Which configuration of Caffeine caches are we thinking about?

DavyLandman commented 2 years ago

Well, the thing is, if you make a custom map implementation, you can inline the unwrapping of the weak reference and call equals on the right unwrapped object.

This map might be replaced by a Caffeine LoadingCache with weak-reference keys, but I would have to check whether it then only does referential equality, and whether it guarantees an atomic loader action.
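For illustration, that configuration would look roughly like this; note the two caveats in the comments, which correspond to exactly the checks mentioned above (per Caffeine's documentation, weakKeys() switches key comparison to identity):

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

// Hypothetical configuration, not a verified drop-in replacement.
// Caveat 1: weakKeys() makes Caffeine compare keys with == instead of
// equals(), so structurally equal but distinct instances would not be
// deduplicated (the referential-equality question raised above).
// Caveat 2: a loader that returns the key itself makes the value
// strongly reference the key, keeping the weak key from ever being
// collected.
LoadingCache<Object, Object> cache = Caffeine.newBuilder()
        .weakKeys()
        .build(key -> key);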

DavyLandman commented 2 years ago

Actually, a WeakHashMap might be just right, with some locking around it. I'll try something on Monday.

DavyLandman commented 2 years ago

Thanks for the discussion @jurgenvinju. After reading how the JDK WeakHashMap works internally, it was exactly what we were searching for, so I switched to that. When I implemented this class two years back I wrote some quite extensive tests, so I feel confident that it's working as intended.

I'm very curious to see a benchmark of this; I might make a backport for the pre-Java-11 release so we can see the impact without the jump to Java 11 muddying the picture.

DavyLandman commented 2 years ago

I fear I'll end up implementing a small map after all, since a WeakHashMap get might end up changing the map as it clears out old entries.

ben-manes commented 2 years ago

At one point I was able to coerce escape analysis to avoid the runtime allocation. It is finicky and can't be relied upon, so I don't know if that still helps.

Avoiding allocations via a ThreadLocal was an approach that I considered for Caffeine, after suggesting it to https://github.com/elastic/apm-agent-java/issues/357 to resolve their problems. It won't be a good idea with Loom (virtual threads) and can be a concern in environments that play games with classloaders.
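A minimal sketch of that ThreadLocal idea (hypothetical names, for illustration only): each thread reuses a single mutable lookup key instead of allocating a wrapper per get, at the cost of the Loom and classloader issues just mentioned.

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: reuse one mutable lookup key per thread so the
// read path does not allocate. The strong reference must be cleared in
// a finally block so the wrapper never pins its referent between calls.
final class ThreadLocalLookup {
    static final class LookupKey {
        Object ref;
        int hash;
        @Override public int hashCode() { return hash; }
        @Override public boolean equals(Object o) {
            // a real interner would unwrap a stored WeakReference here
            return o instanceof LookupKey && Objects.equals(ref, ((LookupKey) o).ref);
        }
    }

    private static final ThreadLocal<LookupKey> LOOKUP =
        ThreadLocal.withInitial(LookupKey::new);

    private final Map<Object, Object> map = new HashMap<>();

    Object get(Object key) {
        LookupKey lookup = LOOKUP.get();
        lookup.ref = key;
        lookup.hash = key.hashCode();
        try {
            return map.get(lookup); // allocation-free probe
        } finally {
            lookup.ref = null;
        }
    }
}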

If you want a custom map then Guava's cache (originally MapMaker), Spring's ConcurrentReferenceHashMap, or Hibernate's ConcurrentReferenceHashMap may be worth considering. All were written in Java 5 when the caching libraries had poor concurrency. They are optimized for reference-based caching, though I helped refocus Guava's towards size-based as best that I could. All three are mature but not maintained, because their sin was an emphasis on soft references as the ideal caching strategy. While this looked great in a microbenchmark, in reality it caused stop-the-world death spirals and many production fires (including Adwords).

Maybe instead a discussion on what this class is used for might inspire an alternative approach that avoids your allocation pains?

ben-manes commented 2 years ago

P.S. WeakHashMap is not thread-safe. Unfortunately it is not good enough to perform writes inside a lock while reading outside of it, as your commit does. This can cause visibility problems that might lead to an infinite loop. WeakHashMap is similar to Java's older HashMap in this regard, and that race is described nicely in this article.

DavyLandman commented 2 years ago

Hi @ben-manes, thanks for checking in. I just committed daf00de5, which at least should protect against the cleanup happening during the get lock.

But yeah, WeakHashMap sure seems like an outdated map; HashMap & ConcurrentHashMap have a lot of strategies to switch to trees in case of many collisions.

To help with some context: our runtime type system depends on hash-consing/maximal sharing for the type values, so in the TypeFactory we want to always return the same exact instance for the same type. A ConcurrentHashMap would already be good enough, except that it can cause memory leaks: in an IDE setup you might be constantly changing your types, and after a few days/weeks of an IDE being open, our type store is taking up a few GB that nobody else has a reference to anymore. So we want to store the keys (and values) in a WeakReference.

Is that the discussion you were suggesting?
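To make that requirement concrete, here is a minimal interning sketch (hypothetical StrongInterner name): with a plain ConcurrentHashMap every type ever created stays strongly reachable, which is exactly the IDE leak described above.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Minimal hash-consing sketch: always return the same canonical
// instance for structurally equal values. Because keys and values are
// strong references, nothing is ever collected: this is the multi-GB
// leak after days of changing types in an IDE.
final class StrongInterner<T> {
    private final ConcurrentMap<T, T> canonical = new ConcurrentHashMap<>();

    T intern(T value) {
        T existing = canonical.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }
}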

DavyLandman commented 2 years ago

P.S. WeakHashMap is not thread-safe. Unfortunately it is not good enough to perform writes inside a lock while reading outside of it, as your commit does. This can cause visibility problems that might lead to an infinite loop. WeakHashMap is similar to Java's older HashMap in this regard, and that race is described nicely in this article.

Damn, I was thinking about that; I should have checked it more carefully. Okay, I'll have to revert this and think of a new scheme on a branch.

Interestingly, our old approach with the HashMap never triggered this infinite loop. But it makes sense.

ben-manes commented 2 years ago

It looks like your HashConsingMap is more commonly referred to as an Interner (à la String.intern(s)). Guava's Interners.weak() delegates to MapMaker.weakKeys() with a dummy value (your WeakReference value is not needed).

You depend on object identity rather than equality throughout the runtime? That does make it harder if you cannot safely discard and recreate an equivalent object and have the system work correctly. If you cannot use an off-the-shelf class like Guava's, then a quick fallback might be a two-level cache: a standard size-based cache to capture recency and frequency, which evicts/loads through a victim cache to maintain identity. That might absorb enough allocations of the lookup wrappers that the remainder falls within the tolerance of the young-gen collector.
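For comparison, the off-the-shelf Guava route mentioned above is essentially a one-liner (sketch; Interners.newWeakInterner() is Guava's public factory for a weak interner):

import com.google.common.collect.Interner;
import com.google.common.collect.Interners;

// Guava's weak interner: returns one canonical instance per equivalence
// class and lets unreferenced canonical instances be garbage collected.
Interner<String> interner = Interners.newWeakInterner();
String a = interner.intern(new String("int"));
String b = interner.intern(new String("int"));
assert a == b; // same canonical instance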

DavyLandman commented 2 years ago

True, it's like interning.

I don't think we want to pull in Guava for this class.

But your suggestion about the two-level cache sounds good: we use the old one with weak references for the 'older/less used' generation, and regular keys for the hot map. Thanks for the idea; I think that might be a good trade-off.

DavyLandman commented 2 years ago

I've developed a first version of the two-maps approach. I like the fact that for types used in the recent period we avoid the WeakReference and its lookup, but for new and old values we will be going through two or three maps.

I'm currently not a fan of how I handled the promotion/demotion logic, as it ends up looping through the collection once every second.

ben-manes commented 2 years ago

Typically instead of a periodic poll one would use ReferenceQueue#remove() to block the thread on, e.g. see java.lang.ref.Cleaner. To avoid this thread, Caffeine and Guava check periodically as part of their maintenance routines which are guarded by a tryLock, so like WeakHashMap it is cleaned up in response to activity on the map.
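As a sketch of that ReferenceQueue pattern (hypothetical class, for illustration only): a daemon thread blocks on remove() and prunes exactly the entries whose referents were collected, instead of periodically scanning the whole map.

import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: weak references remember their map key; a daemon thread
// blocks on the queue and removes entries as the collector clears them.
final class QueueCleanedInterner<K, V> {
    private static final class KeyedRef<K, V> extends WeakReference<V> {
        final K key;
        KeyedRef(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, KeyedRef<K, V>> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    QueueCleanedInterner() {
        Thread cleaner = new Thread(() -> {
            try {
                while (true) {
                    @SuppressWarnings("unchecked")
                    KeyedRef<K, V> ref = (KeyedRef<K, V>) queue.remove(); // blocks until GC clears a referent
                    map.remove(ref.key, ref); // remove only if still mapped to this reference
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "interner-cleaner");
        cleaner.setDaemon(true);
        cleaner.start();
    }

    void put(K key, V value) { map.put(key, new KeyedRef<>(key, value, queue)); }

    V get(K key) {
        KeyedRef<K, V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }
}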

I don't think you need to worry about demotion and can use an inclusive second-level cache. If we can assume that the reads from the weak cache are rarer due to the strong cache then the synchronization cost is negligible if simplified to a WeakHashMap. What do you think about something like below?

private final Cache<E, E> strong = Caffeine.newBuilder().maximumSize(10_000).build();
private final Map<E, WeakReference<E>> weak = Collections.synchronizedMap(new WeakHashMap<>());

public E get(E e) {
  return strong.get(e, obj -> {
    // miss in the strong cache: consult the weak interning map
    var interned = weak.get(obj);
    E value = (interned == null) ? null : interned.get();
    if (value == null) {
      // not interned yet (or already collected): this instance becomes canonical
      weak.put(obj, new WeakReference<>(obj));
      value = obj;
    }
    return value;
  });
}

DavyLandman commented 2 years ago

Typically instead of a periodic poll one would use ReferenceQueue#remove() to block the thread on, e.g. see java.lang.ref.Cleaner. To avoid this thread, Caffeine and Guava check periodically as part of their maintenance routines which are guarded by a tryLock, so like WeakHashMap it is cleaned up in response to activity on the map.

True, I didn't want to put the cleanup in the get logic, and I also didn't want to create a thread per instance. But I'll reconsider moving it to the get logic.

I don't think you need to worry about demotion and can use an inclusive second-level cache. If we can assume that the reads from the weak cache are rarer due to the strong cache then the synchronization cost is negligible if simplified to a WeakHashMap. What do you think about something like below?

Interesting idea: so it's always in the weak cache, and gets promoted to the strong cache on a get?

jurgenvinju commented 2 years ago

Great discussion! Thanks for this.

Some thoughts:

DavyLandman commented 2 years ago

Agreed; we are trying to find a way that has minimal impact on performance. The only cost is possibly increased memory pressure in the compact scheme that @ben-manes proposed.

For every version of the Rascal program as it is compiled/interpreted, types of the previous version will become dead, but they are already in the strong cache for the previous reason;

The goal is that they should, after a while, be demoted to the weak cache. It's also why I don't want to put a fixed size limit on the cache (although in most cases that is the preferred way to do a cache): we cannot safely estimate the right size limit, and we don't want to constantly be demoting & promoting a set of types, but we also do not want to keep types around longer than needed.

I agree that in a compiled run they are rarely restored after initial class loading. But currently we are still in an interpreter world, and the TypeFactory is hit a lot during execution. So some of the concerns will go away in time, but they will remain active for an interactive REPL or IDE.

jurgenvinju commented 2 years ago

I agree that in a compiled run they are rarely restored after initial class loading. But currently we are still in an interpreter world, and the TypeFactory is hit a lot during execution. So some of the concerns will go away in time, but they will remain active for an interactive REPL or IDE. (@DavyLandman)

That's not what I meant. In a compiled run the types never die. By the way, even now in the compiled code there are still many dynamically created types (for example for instantiating polymorphic functions and data types). However, with a closed world of programs it is highly unlikely that enough of those dynamically created types would die during a program run to motivate a garbage collect. We could probably know all of the types in advance using some kind of abstract interpretation, but that is not worth the effort.

However, between versions of these compiled programs in the IDE, we have dead types as usual. In this way the compiled code will behave as the interpreted code does, and especially in a compiled (and dynamically loaded) context against an existing TypeFactory instance we will always have the same garbage-collection issues as the current interpreter has.

I don't think you need to worry about demotion (@ben-manes)

Probably I read this wrong. This does not mean there will be no demotion from the strong cache to the weak one, but rather that it would not take too much time to demote older references?

How would demotion work? We expect most types to go into the second level almost immediately, so I am worried about the demotion :-)

DavyLandman commented 2 years ago

I don't think you need to worry about demotion (@ben-manes)

Probably I read this wrong. This does not mean there will be no demotion from the strong cache to the weak one, but rather that it would not take too much time to demote older references?

How would demotion work? We expect most types to go into the second level almost immediately, so I am worried about the demotion :-)

In @ben-manes's approach, demotion works by the strong cache dropping entries once the 10_000-entry limit is reached. In the proposed configuration the oldest entries are dropped (but that can be changed). Since the weak map contains all entries, you can always get the reference back from there (and thus promote it back).

I like the simplicity, but it does mean you can get a very lock-intensive startup phase where all threads are putting stuff in the weak cache every time.

Based on these discussions I will update my branch a bit and make it a bit simpler.

DavyLandman commented 2 years ago

I've changed the logic a bit (#131), but kept most of it the same. Here are my trade-offs:

  • I've reduced the cost of a cache miss caused by an entry in cold storage: it gets promoted right away, so the next calls will get it back from the hot/strong cache. I've kept the cleanup of these on a separate thread to reduce the cost of that cache miss.
  • The maintenance thread's interval is tied to the demoteAfter period. This means an entry can stay almost twice as long in the hot cache as configured, but the trade-off is that the cleaner runs less often. Alternatively we could run it once every 5 minutes and be happy as well.
  • I've opted not to go for a synchronized version of the WeakHashMap, as that would cause many locks during cache misses. Especially at start-up we have a lot of cache misses, and in an LSP context they are spread over different threads, so we do not want a global lock there.

The only downside I can see (apart from the 290 lines of code) with the current strategy is that a cache miss on both levels means two gets and a put, so three hashCode calculations and a bunch of equals calls in case of hash-postfix collisions. It will be slower at startup, but gain speed once the hot cache is filled with entries.

jurgenvinju commented 2 years ago

That's very cool. Thank you both @DavyLandman and @ben-manes! Let's measure a bit and see how it improves on what we had before. Might it help to refer to this discussion in a source-code comment for future reference and future collaborators?

ben-manes commented 2 years ago

The reason I didn't bother with demotion is that the strong cache's reference keeps the entry from being garbage collectable; therefore we can avoid shuffling it back to the weak cache on eviction, as that work doesn't seem to offer any benefit.

You might consider pulling the custom hashmap code in from Spring (Apache) or Hibernate (LGPL). They are self-contained and could be copied in directly to avoid the dependency. Neither depends on a background thread, and both support weak references without allocating on read.

DavyLandman commented 2 years ago

The reason I didn't bother with demotion is that the strong cache's reference keeps the entry from being garbage collectable; therefore we can avoid shuffling it back to the weak cache on eviction, as that work doesn't seem to offer any benefit.

Just to be clear, this is at the cost of more entries in the weak cache. Right?

You might consider pulling the custom hashmap code in from Spring (Apache) or Hibernate (LGPL). They are self-contained and could be copied in directly to avoid the dependency. Neither depends on a background thread, and both support weak references without allocating on read.

True, I've thought about that, but I'm a bit hesitant, as we ideally want a concurrent version to avoid locks. I'll look into it a bit more, since I really would like to get rid of the allocation during the get.

ben-manes commented 2 years ago

Just to be clear, this is at the cost of more entries in the weak cache. Right?

Yes, but since you are already keeping those entries in memory, the cost is minimal (the allocation to store into a hashmap).

am a bit hesitant, as we ideally would want a concurrent version to avoid locks

Those are concurrent, using a segmented hash map inspired by Java 5's ConcurrentHashMap. The main concern is probably the amount of code to copy over, but they are at least well proven by now. Otherwise you would need a dependency (e.g. on Guava), which is generally okay, but I agree it is not ideal for some projects.

DavyLandman commented 2 years ago

Thanks to this discussion, I've simplified the logic in the pull request (#131). It's ready for review now.