reduxjs / reselect

Selector library for Redux
MIT License
19.03k stars 672 forks source link

Investigate potential memory leak issues with `weakmapMemoize` #635

Open markerikson opened 10 months ago

markerikson commented 10 months ago

Copying notes from @phryneas :

phryneas: Thoughts on combining that with a max cache size like the weak cache I showed you? eskimojo: is having a maximum cache size desirable? phryneas: If your cache approaches 5000 elements that will never be read from again, very much so phryneas: I've been doing a lot drilling into memory leaks recently and a lot of them are weakMaps with keys that hang around for too long: phryneas: Eskimojo - so you know what PR we're talking about: https://github.com/benjamn/optimism/pull/599 phryneas: : Yeah.. looking at WeakMapMemoize it's very close to https://www.npmjs.com/package/@wry/trie and a lot of the reasons for Apollo Client memory leaks. It's funny, because that way reselect essentially becomes https://www.npmjs.com/package/optimism 😅

So here's what I'd do: use the current weakMapMemoize implementation as a way of generating unique cache keys, but never store any actual data in there. Instead, use those keys as cache keys to a "flat" cache with a maximum size - if possible weakly indexed, like the implementation I linked above. phryneas: Right now, the saving grace is that your root selector will always be an object (the state), and hopefully at some point all of it's references will disappear, which will prune a full Trie. But that's not a guarantee anymore for child selectors (or selectors that are just used with something different than a state). Imagine they are called with (primitive, primitive, primitive). Every result for every possible primitive combination that was ever input into one of those selectors will be kept around for the full application lifetime. phryneas: And at least for the second plan (or the first plan, assuming something keeps state values around for longer) you need some kind of max cache size - at least if you make it the default and people don't have to carfully think about their choices

markerikson commented 9 months ago

More notes from Lenz and me:

acemarke : ok, questions:

  • is this just for the "primitive args" case?
  • how would we test this?
  • what would an appropriate max size be?

Lenz Weber-Tronic : Not only.

acemarke : where's that toBeGarbageCollected coming from? Lenz Weber-Tronic : https://github.com/apollographql/apollo-client/pull/11369/files#diff-83faa655149377a65d05a36458355e0a84925b36d1ebe017f68196c27a369670 acemarke : ahhhh, okay, that explains why it wasn't turning up in a GH search :) eskimojo : love a custom matcher Lenz Weber-Tronic : Need to run jest with a node --expose-gc for that to work

acemarke : I feel like there's still something I'm missing about the memory leak possibility. Any chance you could walk me through the possible sequence that leads to an actual leak? Lenz Weber-Tronic : memoizedFunction(someObj,3,8) will end you up with this data structure:

WeakMap([[someObj, Map([[3, Map([[8, result]])]])]])

and memoizedFunction(5,3,8) ends you up with

Map([[5, Map([[3, Map([[8, result]])]])]])

The first case is fine, assuming that someObj is your Redux state and it leaves memory soon. The secondcase will hold onto result forever. Lenz Weber-Tronic : Now, if in the first case, someObj is already a derived value that just does never change, but 3,8 change to 5,8 you end up with

WeakMap([[someObj, Map([[3, Map([[8, firstResult]])], [4, Map([[8, nextResult]])]])]])

Lenz Weber-Tronic : and as long as someObj will not leave memory, those deeply nested strong maps will just grow infintely as the primitive arguments change. acemarke : (I think I like Ruby-ish / fat-arrow notation better for expressing key/value pairs :) but okay, mostly following) Lenz Weber-Tronic : So, worst case you have a Redux store that never changes (or a slice that is derived as the first argument of a sub-selector), but a component iterates in 10000 renders over 10000 items in a list, and creates a very complex and memory-heavy object every time

image

Lenz Weber-Tronic : Something that would get really bad with this kind of selector design would be a selectShopItems(state, from, to) { return state.items.slice(from, to) } Now imagine you have an endless scrolling list, the user scrolls down, and from and to increase with every scroll event slightly Lenz Weber-Tronic : You'd have all the arrays 1-10, 2-11, 3-12, 4-13, ..., 10001-10010 in memory long after the user scrolled by, as long as state doesn't change. Lenz Weber-Tronic : Even worse, for each of those arrays, two new maps would be created

markerikson commented 9 months ago

Lenz tackled similar issues on the Apollo side here:

markerikson commented 9 months ago

Lenz's latest example:

import { weakMapMemoize, createSelectorCreator } from "reselect";

const memoize = createSelectorCreator(weakMapMemoize);

const selector = memoize(
  (array: any[], from: number, to: number) => array,
  (array: any[], from: number, to: number) => from,
  (array: any[], from: number, to: number) => to,
  (array: any[], from: number, to: number) => array.slice(from, to),
);

let state = Array(10000)
  .fill(null)
  .map((_, i) => i);

const initialResult = new WeakRef(selector(state, 2000, 2500));

for (let i = 0; i < 10000; i++) {
  selector(state, i, i + 100);
}

const laterResult = selector(state, 2000, 2500);

console.log(
  "results are still memoized?",
  initialResult.deref() === laterResult,
);

In one sense that works as intended. But I think the point is there ought to be some limits.

julienw commented 9 months ago

I really like that you provide this new function weakmapMemoize, it can solve a bunch of issues and simplify a lot of code.

However I am worried about the fact that it is now the default memoize function (BTW it's not clear in the doc but clear in the release announcement). As soon as we're using a primitive value in a selector, and the result of this selector is used in other selectors, we may have a chain of leaks (this depends on the other arguments too of course). (I'm more concerned about the memoize function than the argsMemoize function, because in our app we mostly never use several arguments to selectors)

The result of a selector may also have a significant size and we may have relied on the fact that reselect keeps just one copy, and suddenly this changes. (Though I understand that's what a major update is for).

Of course we can always go back to the lruMemoize function, but I wish weakmapMemoize would not be the default, so that we could use it if we want, maybe try it and adopt it first, so that you could gather feedback during some time. It would be good to have some explicit code to create a createSelector with the old behavior. If I understand properly this would be:

import { createSelectorCreator, lruMemoize } from 'reselect';
const createSelectorLru = createSelectorCreator({ memoize: lruMemoize, argsMemoize: lruMemoize });

Maybe such an export could be exported from reselect for an easier migration?

Last bit of feedback: the memoize function has a special status in that it can be specified as an argument to createSelectorCreator directly, but now I think memoize and argsMemoize are sibling (in a sort), and therefore maybe we shouldn't be able to pass simply the memoize function to createSelectorCreator, and instead force the user to pass both functions. My concern is that as part of the update users will simply forget to specify the argsMemoize function if they used this shape earlier, I'd assume they'll want to specify both if they specify one.

I recognize this comment isn't super constructive, feel free to delete it if you feel it's useless for you.

psychedelicious commented 8 months ago

(This is adapted from this discord thread: https://discord.com/channels/102860784329052160/103538784460615680/1192629604284907651)

Problem

I've recently updated a project to RTK2. Everything seemed fine, but we started getting user reports of stutters and performance issues. After investigating deeper, it appears that weakMapMemoize is the main driver of the problem.

Specifically, we are getting very frequent major GC events when before we had virtually none.

RTK2 without weakMapMemoize

It's possible to opt-out of weakMapMemoize in most of RTK, except RTKQ uses it internally for its selectors.

To test the app without RTKQ using weakMapMemoize, I patched reselect, exporting lruMemoize as weakMapMemoize: https://github.com/psychedelicious/reselect/commit/2250a2ab05462bd1c62ad5815b8e51c3b87a5bb9

I also add a console.log so we can be confident that we are using the patched reselect when running the app. Then built RTK using this patched reselect, and used it in our app.

Test Case

I've run the chrome profiler with 3 configurations, while I do the same action in the app, using the same persisted redux state.

This action updates a single small object in redux very rapidly. The problem shows up in other places, but due to the frequency of this particular action, it's very apparent here.

Test Results

Here are the configurations and screenshots of memory allocations from the profiler.

Stock RTK2 - weakMapMemoize everywhere (for both memoize and argsMemoize):

image

Configured RTK2 - lruMemoize where it's configurable (in our createSelectorCreators and createEntityAdapter's getSelectors()), and weakMapMemoize for RTKQ only:

image

Patched reselect with weakMapMemoize = lruMemoize:

image

Observations

Notes

Minimal Repro / Profiles

I will work on a minimal repro project that uses the same redux patterns as our full app, but it may be a couple days before I can put this together. (the app is OSS if it's helpful to review a real-world example).

I'm also happy to share profiles/memory allocation timelines/etc.

Proposed Solution

I think it's totally fine for weakMapMemoize to be the new default. But I also think we need to be able to control it in RTKQ. To that effect, my particular issue would be mitigated by what I think would be a simple change:

aryaemami59 commented 8 months ago

@psychedelicious I'll be very interested to see the repro, but based on your explanation I have a suspicion that either the combination of useMemo and selectors created with weakMapMemoize or resultEqualityCheck being isEqual might be playing a role in contributing to the issue.

psychedelicious commented 8 months ago

@aryaemami59 There's actually a separate issue I've not yet reported. resultEqualityCheck doesn't work at all with weakMapMemoize. edit: logged in #679, sorry I neglected to raise this when I first encountered the problem.

Repro (increment the counter): https://codesandbox.io/p/sandbox/dazzling-kapitsa-gtqrl5?file=%2Fsrc%2Fapp%2Fstore.js%3A12%2C5

When I upgraded to RTK2, I had to choose between using lruMemoize and isEqual, or weakMapMemoize with no resultEqualityCheck - so, in a way, it's definitely not a problem with weakMapMemoize + isEqual.

Def could be related to useMemo and weakMapMemoize. I can't really imagine why that would be problematic, though... Wouldn't this just result in many size=1 weakmap caches that get GC'd the same as if they were all LRU caches?

Will work towards a repro to clarify.

aryaemami59 commented 8 months ago

@psychedelicious On first glance, one thing that can cause a lot of issues is having state => state in your selectors. https://reselect.js.org/usage/common-mistakes

psychedelicious commented 8 months ago

@aryaemami59 I must have missed that memo (heh). Thanks for pointing that out - I've used state => state as a convenient input selector for just about all of my selectors.

I just went through the whole app and converted all input selectors to be like state => state.slice. There are no identity selectors now.

Unfortunately, this has had no discernable impact on the memory issue.

Also, using resultEqualityCheck with weakMapMemoize still causes a fatal runtime error.

psychedelicious commented 8 months ago

Updates for my issue

  1. @aryaemami59 very kindly gave me some pointers on our usage of reselect which I've implemented:
    • Do not use state => state as an input selector
    • Reduce resultEqualityCheck usage except when necessary (I used this for bulk primitive property access as a convenience)

These improvements save about 4ms per animation frame on my computer, which is relevant for us. Thanks Arya!

Unfortunately, the memory usage with weakMapMemoize was not improved.

  1. We've got an RC out with the reselect patch weakMapMemoize = lruMemoize and the reselect usage fixes. Users report this resolves the performance regression (and then some). This is further confirmation that weakMapMemoize can be problematic for some use patterns.
EskiMojo14 commented 8 months ago

@psychedelicious I've opened a PR https://github.com/reduxjs/redux-toolkit/pull/4048 to allow customising the createSelector instance used by RTKQ. Do you reckon you could give the build a try?

paulgreg commented 3 months ago

Since updating from reselect 4.1.7 to 5.1.0, we have noticed a memory leak issue on our node servers.

image

On that zabbix graph, the memory is released when node (v20.12.2) restart.

After investigating the Heap snapshot, I discovered multiple references to previous selector values, which led me to suspect that the change in reselect version is the cause of this issue.

As a temporary solution, I reverted back to using lruMemoize as suggested in https://github.com/reduxjs/reselect/issues/635#issuecomment-1850350607. However, I found it peculiar that weakmaps were never released.

markerikson commented 3 months ago

@paulgreg just to check, when you say "weakmaps were never released", do you mean "why didn't the weakmaps let go of the references they were holding on to"?

paulgreg commented 3 months ago

@markerikson yes, exactly.

hpierre74 commented 2 months ago

Hello there, I confirm that the weakMapMemoize resulted in a memory leak on my side as well. The different traces I observed are similar to the different comments here, memory grows and despite major GCs this continues until reaching 3, 4 GB when it becomes basically unusable.

My use case is a realtime application with very frequent updates on 2k elements (flights). 3 slices using entity adapters and its selectors to access those flights and their information.

An example slice, if you find something weird

import {
  createEntityAdapter,
  createSelectorCreator,
  lruMemoize,
  createSlice,
  EntityState,
} from '@reduxjs/toolkit';

import { tracksActions } from '@hmi/core-features-tracks';

import type { TrackLabel } from './track-label.type';

// See issue https://github.com/reduxjs/reselect/issues/635#issuecomment-1850350607
const createSelector = createSelectorCreator({
  memoize: lruMemoize,
});

export const TRACK_LABELS_FEATURE_KEY = 'trackLabels';

export interface TrackLabelsState extends EntityState<TrackLabel, string> {
  loadingStatus: 'not loaded' | 'loading' | 'loaded' | 'error';
  error?: string | null;
}

export const trackLabelsAdapter = createEntityAdapter<
  TrackLabel,
  TrackLabel['trackId']
>({
  selectId: (model) => model.trackId,
});

export const initialTrackLabelsState: TrackLabelsState =
  trackLabelsAdapter.getInitialState({
    loadingStatus: 'not loaded',
    error: null,
  });

export const trackLabelsSlice = createSlice({
  name: TRACK_LABELS_FEATURE_KEY,
  initialState: initialTrackLabelsState,
  reducers: {
    add: trackLabelsAdapter.addOne,
    remove: trackLabelsAdapter.removeOne,
    removeMany: trackLabelsAdapter.removeMany,
    upsertMany: trackLabelsAdapter.upsertMany,
    upsertOne: trackLabelsAdapter.upsertOne,
  },
  extraReducers: (builder) => {
    builder.addCase(tracksActions.removeMany, (state, action) => {
      trackLabelsAdapter.removeMany(state, action.payload);
    });
  },
});

export const trackLabelsReducer = trackLabelsSlice.reducer;

export const trackLabelsActions = trackLabelsSlice.actions;

type LocalRootState = {
  [TRACK_LABELS_FEATURE_KEY]: TrackLabelsState;
};

export const selectTrackLabelsState = (
  rootState: LocalRootState,
): TrackLabelsState => rootState[TRACK_LABELS_FEATURE_KEY];

const { selectAll, selectEntities, selectIds, selectById } =
  trackLabelsAdapter.getSelectors(undefined, { createSelector });

export const selectAllTrackLabels = selectAll;
export const selectTrackLabelsEntities = selectEntities;
export const selectTrackLabelsIds = selectIds;
export const selectTrackLabelById = createSelector(
  [selectTrackLabelsState, (_: LocalRootState, id: string) => id],
  (state, id) => selectById(state, id),
);

export const selectTrackLabelFieldByNameAndUid = createSelector(
  [
    selectTrackLabelsState,
    (_, args: { trackId: string; fieldName: string; uid: string }) => args,
  ],
  (trackLabelsState, { trackId, fieldName, uid }) =>
    selectById(trackLabelsState, trackId)?.fields[fieldName]?.configValues?.[
      uid
    ] ?? '',
);

I am still unhappy about the memory consumption when the trafic increases, even if it goes down more often (like observed in the comments). It looks like reselect is still preventing memory release. What is concerning is that the reselect docs is stating that in case of huge list, the weakMapMemoize solution is way better. Here it is the exact opposite

markerikson commented 2 months ago

@hpierre74 any chance you could turn that into a small repro? Usage patterns are going to affect this and it would help to see how you're dispatching and reading the values.

If this is causing issues for you, you may want to switch back to lruMemoize for your selectors.

hpierre74 commented 2 months ago

Hey @markerikson, thanks for your answer. I sent you a PM on twitter with more info :)