RTK-Query: Code Splitting plus rehydration?

mq2thez commented 2 years ago

Hey folks!

Our app is heavily code-split, and leans on pre-hydrated server data to get the first render as fast as possible. A lot of the architecture is set up so that we can add lots of individual "subapps" (30+ at last count) without significantly impacting the size of the main bundle that handles routing and the "shell" for menus and notifications. You can see some discussion of one version of this architecture in Etsy's blog post on Island Architecture.

One thing that helps this architecture scale is that each "subapp" injects its reducers into the main Redux store when it loads. This matches pretty closely with the patterns shown in the injectEndpoints documentation, which is great -- I can see pretty clearly exactly how things will map.

However, one sticking point is that we currently rely heavily on server-rendered data being present for first render, so as to avoid being blocked on additional data fetches when the client boots up. The existing extractRehydrationInfo docs seemed aimed squarely at solving this kind of hydration, but it's only available on the createApi method.

I'm wondering what kind of options there are for us to set rehydration info as part of injectEndpoints. I was looking at onCacheEntryAdded or onQueryStarted as places to add some kind of custom "detect when it's the first load and inject server data" pattern, but that felt like a kludge and I wanted to come get a sanity check before I went too far.

It also seems like maybe the "Customizing createApi" might have a path towards this, but in the same way -- this started feeling like I was drawing outside the lines and might be picking up on the wrong solutions.

phryneas commented 2 years ago

That should just work with extractRehydrationInfo and dispatching an action with the additional data every time additional endpoints are injected. Have you verified that it actually doesn't work?

mq2thez commented 2 years ago

I think part of what I'm struggling to figure out is:

- where to dispatch the action. For example, inject calls generally seem to occur outside of a React tree and inline in a file, rather than in a React context.
- how to correctly format the data that's hydrating things in the store -- the docs for extractRehydrationInfo say "that return value will be used to rehydrate fulfilled & errored queries", but it's not clear how this relates to setting initial state for entrypoints. It seems like I could use it to set initial state once (such as for that hydrate action in the example), but I don't see any mechanism for handling when there's already active state and I'm only adding hydration for just that one endpoint's worth of state.

phryneas commented 2 years ago

You can always `import store from "somewhere"` - then you can dispatch on that.

The value returned by extractRehydrationInfo has to be an api slice of the store state, containing the information you want to have rehydrated. Off the top of my head, that won't remove any data already in your slice. How exactly you shape that action is up to you.

markerikson commented 2 years ago

Quoting an exchange from Twitter at https://twitter.com/sangster/status/1566737486633750528 :

Essentially, we have server-hydrated data we need to load into RTKQ caches in a code-split scenario, and it's not clear how to accomplish that. Without that, we can't really start adopting RTKQ. upsert does seem like one option, but it seems like the implementation is pretty manual -- I can't do someApi.upsert(data), but instead have to construct the full cache key. That's fine for Joe Expert, but will require custom tooling to be usable by any random engineer. I haven't been able to spend much time on this, but I can't really understand how extractRehydrationInfo (the suggested solution) works to resolve the issue. The docs say it rehydrates by rehydrating, which... doesn't give a whole lot to go on. With upsert the workflow seems somewhat more clear - call a helper function with initial data (if present) to inject API endpoints and then upsert for the relevant endpoints. Even that will be tricky (for ex, utilizing react-router / query params) but hopefully solvable. In theory, one answer for this would also be if we could provide data in the React hook to be used if a query is uninitialized. The complexity there is avoiding an unnecessary request back to the server in those cases -- we know we have the data, we don't want another req.

phryneas commented 2 years ago

About the "rehydrating by rehydrating" thing... well, yeah, exactly that - let me try to explain!

next-redux-wrapper by default dispatches a "rehydration action" containing the full state of the server. Reducers can then choose to react to that action and restore data from it. And that's where extractRehydrationInfo comes in. Every action is passed into extractRehydrationInfo - and if it is an action containing "rehydration data", the function can return the part of that data that would make up the api slice.

Usually that looks like

  extractRehydrationInfo(action, { reducerPath }) {
    if (action.type === HYDRATE) {
      return action.payload[reducerPath]
    }
  },

but nothing should stop you from having "partial rehydration actions" that are dispatched with parts of a server-side store, or only some endpoint cache entries filled. Something like

  extractRehydrationInfo(action, { reducerPath }) {
    if (partialRehydrateApi.match(action)) {
      return action.payload
    }
  },

would totally work. The "rehydration api state" in action.payload would be merged into the existing store whenever such an action would be dispatched.
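To make the "partial rehydration action" idea concrete, here is a minimal, library-free sketch. Everything in it is a hypothetical stand-in rather than RTK Query's own code: the `QueryEntry` and `ApiSliceSnapshot` types, the `api/partialRehydrate` action type, and the toy `queriesReducer` are assumptions made for illustration. It only shows the flow: an action carries a piece of api state, the extract function picks it out, and the reducer merges it into what is already there.

```typescript
// Minimal stand-ins for the pieces discussed above (all hypothetical).
type QueryEntry = { status: "fulfilled" | "rejected" | "pending"; data?: unknown };
type ApiSliceSnapshot = { queries: Record<string, QueryEntry | undefined> };
type Action = { type: string; payload?: unknown };

const PARTIAL_REHYDRATE = "api/partialRehydrate";

// Action creator with a `match` type guard, shaped like what createAction returns.
const partialRehydrateApi = Object.assign(
  (payload: ApiSliceSnapshot) => ({ type: PARTIAL_REHYDRATE, payload }),
  {
    match: (action: Action): action is { type: string; payload: ApiSliceSnapshot } =>
      action.type === PARTIAL_REHYDRATE,
  }
);

// Same contract as the option: return an api slice snapshot, or undefined
// when the action is unrelated.
function extractRehydrationInfo(action: Action): ApiSliceSnapshot | undefined {
  return partialRehydrateApi.match(action) ? action.payload : undefined;
}

// Toy reducer: merge settled entries into the existing cache state and leave
// everything else untouched (assumed behavior, modeled for illustration).
function queriesReducer(
  state: Record<string, QueryEntry | undefined>,
  action: Action
): Record<string, QueryEntry | undefined> {
  const info = extractRehydrationInfo(action);
  if (!info) return state;
  const next = { ...state };
  for (const [key, entry] of Object.entries(info.queries)) {
    if (entry?.status === "fulfilled" || entry?.status === "rejected") {
      next[key] = entry;
    }
  }
  return next;
}
```

Each code-split bundle could dispatch its own `partialRehydrateApi(...)` action after it loads; earlier entries stay in place because the reducer only adds or overwrites the keys present in the payload.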

markerikson commented 2 years ago

@phryneas I think a large part of Ben's question is, "what does that 'rehydrated state' even contain?"

I'm actually unsure myself. Looking through our code, I see logic like this:

        .addMatcher(hasRehydrationInfo, (draft, action) => {
          const { queries } = extractRehydrationInfo(action)!
          for (const [key, entry] of Object.entries(queries)) {
            if (
              // do not rehydrate entries that were currently in flight.
              entry?.status === QueryStatus.fulfilled ||
              entry?.status === QueryStatus.rejected
            ) {
              draft[key] = entry
            }
          }
        })

So I would assume that it's literally a persisted/exported chunk of state.api, like:

{
  queries: {
    "getPokemon('pikachu')": {data, isLoading, etc}
  }
}

That means that either A) it's persisted from a previous RTKQ iteration (ie, the data already got run through RTKQ and had all the cache keys calculated), or B) you'd have to somehow manually calculate the right cache keys and fake the data structure.
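For option B, a rough sketch of what "manually calculate the right cache keys and fake the data structure" could look like. It assumes the default cache key format is `endpointName(<JSON of the args>)` with plain-object keys sorted; an endpoint using a custom `serializeQueryArgs` would produce different keys. The helper names `cacheKeyFor` and `fulfilledEntry` are hypothetical, and real cache entries carry more bookkeeping fields (such as request ids and timestamps) than shown here.

```typescript
// True for plain `{}`-style objects only (not arrays, dates, class instances).
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return (
    typeof value === "object" &&
    value !== null &&
    Object.getPrototypeOf(value) === Object.prototype
  );
}

// Assumed default key format: `endpointName(serializedArgs)`, where object
// keys are sorted so that {a, b} and {b, a} map to the same cache key.
function cacheKeyFor(endpointName: string, args: unknown): string {
  const serialized = JSON.stringify(args, (_key, value) =>
    isPlainObject(value)
      ? Object.keys(value)
          .sort()
          .reduce<Record<string, unknown>>((acc, k) => {
            acc[k] = value[k];
            return acc;
          }, {})
      : value
  );
  return `${endpointName}(${serialized})`;
}

// Hypothetical helper: wrap raw server data in a minimal "fulfilled" entry,
// keyed the way the hook for the same endpoint and args would look it up.
function fulfilledEntry(endpointName: string, args: unknown, data: unknown) {
  return {
    [cacheKeyFor(endpointName, args)]: {
      status: "fulfilled" as const,
      endpointName,
      originalArgs: args,
      data,
    },
  };
}
```

The point of the sketch is that the raw data alone is not enough: whoever builds the entry also has to know the endpoint name and the exact args the component will later pass to the hook.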

I am actually curious how @mq2thez envisions loading this data to begin with. It sounds like you've got some data you want to load into the store, but you might not know what the appropriate cache keys are up front? How do you picture getting the data into RTKQ?

I get the sense that it's more a case of "we have raw data that we want to insert", rather than "we have persisted existing cache entries to restore".

phryneas commented 2 years ago

It's the api slice from the server. Usually during SSR you fire those exact requests on the server, with a full RTKQ store behind. Then you'd take that "server store state", move it over to the client and it gets dispatched as payload of a REHYDRATE action. So extractRehydrationInfo expects a full RTK Query api slice. Of course that could contain only one or two cache entries and that'd be perfectly fine - those would just be merged in.

mq2thez commented 2 years ago

I think I'm having a hard time understanding where extractRehydrationInfo fits into the Redux "lifecycle". Is it a reducer that controls some of the value of the API slice? Is it somehow changing the value of some part of state? Is it modifying an action? It's not being passed any values which map to what I "expect" to see in Redux stuff (like state), but it is being passed a reducer path, which I have trouble following (since it is used on the action's payload).

@markerikson -- right now, each code-split bundle contains a component tree, reducers, selectors, etc for its functionality. Our top level component on-mount says "okay, if the server-data-object has fresh data and it's for our slice, dispatch an action to load that into state". I'm essentially looking for the mechanism in RTKQ that would let us load that data into state for the API query relevant to that chunk of app -- without putting that code in the main bundle, since we have 40+ codesplit things.

phryneas commented 2 years ago

> I get the sense that it's more a case of "we have raw data that we want to insert", rather than "we have persisted existing cache entries to restore".

That doesn't sound like SSR rehydration to me though?

phryneas commented 2 years ago

> I think I'm having a hard time understanding where extractRehydrationInfo fits into the Redux "lifecycle". Is it a reducer that controls some of the value of the API slice? Is it somehow changing the value of some part of state? Is it modifying an action? It's not being passed any values which map to what I "expect" to see in Redux stuff (like state), but it is being passed a reducer path, which I have trouble following (since it is used on the action's payload).

It's a function. That function is called for every dispatched action in middleware as well as from a reducer - and it extracts the api slice from an action containing a full state snapshot (or just returns undefined if that action is unrelated). Well, the "full state snapshot" is kinda irrelevant. The relevant part is that it sees some action and returns the state of an api slice. Usually that would be part of the action.

Also see https://github.com/kirill-konshin/next-redux-wrapper#state-reconciliation-during-hydration

markerikson commented 2 years ago

> I think I'm having a hard time understanding where extractRehydrationInfo fits into the Redux "lifecycle"

Per Lenz's comment and the snippet I pasted above: it's primarily called in a builder.addMatcher() call in the internal slice reducers, so they can extract these existing cache entries from any dispatched action and insert them directly into the cache reducer state. There's also one call in the cacheCollection pseudo-middleware so it can set up unsubscribe handling for those.

> "okay, if the server-data-object has fresh data and it's for our slice, dispatch an action to load that into state". I'm essentially looking for the mechanism in RTKQ that would let us load that data into state for the API query relevant to that chunk of app

@mq2thez : yeah, that's what I'm asking. What format is this data in, to begin with? Is it currently just "raw" data, like an array of plain objects? Has it been preformatted in some way? If it's "raw" data, how do you envision RTKQ knowing what the right cache keys are for those items? Like, say it's [{id: 1, name: "pikachu", type: "electric"}] or something

phryneas commented 2 years ago

If you don't want to rehydrate a "server side RTKQ store" into a "client side RTKQ store", extractRehydrationInfo is probably the wrong tool. Then you would need the new upsertQueryData tool, but as @markerikson said: you need to somehow map that to the right query args.

mq2thez commented 2 years ago

The data is an object full of data (yeah, raw). It's been formatted by the API endpoint for that code bundle, and the bundle knows what the shape/types are. In our current world, the bundle's code knows how to extract the data correctly into our existing state tree (well, fire an action that has data loaded from that object).

mq2thez commented 2 years ago

Yeah, in this case, I think (in theory) each bundle can be responsible for ensuring it's formatting things right. We just need that mechanism (upsert, I guess) for doing so when each bundle loads.

markerikson commented 2 years ago

Based on this discussion, I'm sort of wondering if the alpha upsertQueryData API we've added is maybe insufficient.

We implemented it as a modification to the existing request/response logic, as that let us reuse all the existing mechanisms for creating cache entries and updating status.

However, that also means that it's async, and also you can only add one cache entry at a time.

You could always loop over data to do the insertion, of course, but I'm wondering if it maybe makes sense to do it as a "bulk"-type method to begin with, and maybe make it sync somehow.

mq2thez commented 2 years ago

I'm not sure -- one of the other things that feels like a struggle is that I know when I inject an endpoint whether I have the initial data for it. So the most-natural-feeling place to do this is as part of the injectEndpoints call, to the point where I already see myself trying to create a helper/wrapper function that combines calling injectEndpoints and trying to dispatch an action (such as upsertQueryData), though that doesn't fix the problems with stuff like react-router data.

Taking a step back in terms of generalizing the "need" here -- I know what the first response for an injected API endpoint "should" be, on the initial page load. I just need a way to load that into the store/state/etc in such a way that RTKQ hooks can use that data instead of causing RTKQ to trigger a new request.
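One possible shape for that helper, as a hedged sketch. It assumes an api object exposing `util.upsertQueryData(endpointName, args, value)` (the alpha API mentioned above) and a store `dispatch` that accepts whatever that call returns. The `UpsertCapableApi`, `Dispatch`, and `Seed` types and the `seedInitialData` name are hypothetical; they are declared structurally so the sketch runs without the library, and the types may need adjusting or a cast against a real store and api.

```typescript
// Structural type covering only the one call this sketch needs.
interface UpsertCapableApi {
  util: {
    upsertQueryData(endpointName: string, args: unknown, value: unknown): unknown;
  };
}

type Dispatch = (action: unknown) => unknown;

// One piece of server-provided data, plus the endpoint and args it belongs to.
interface Seed {
  endpointName: string;
  args: unknown;
  data: unknown;
}

// Call right after the bundle's injectEndpoints call: upserts one cache entry
// per seed that actually has data, and skips seeds the server did not provide.
function seedInitialData(dispatch: Dispatch, api: UpsertCapableApi, seeds: Seed[]): void {
  for (const { endpointName, args, data } of seeds) {
    if (data === undefined) continue;
    dispatch(api.util.upsertQueryData(endpointName, args, data));
  }
}
```

The args in each seed have to match what the component later passes to the hook, which is exactly the react-router / query-param difficulty raised above; the helper does not solve that part.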

markerikson commented 2 years ago

Yeah, the only way to have the hooks use that data atm is to have a properly formatted cache entry matching the query cache key you pass into the hook.

@phryneas , remind me why you didn't like the idea of useGetPokemonQuery('pikachu', {initialData}) ?

phryneas commented 2 years ago

@markerikson because it's not really clear to me what that should do.

Should it just return initialData as if the data were in the cache? Then it would not be there elsewhere in the app where it would be expected as well.

Should it upsert the data? What if two components would render at the same time, with different initialData? Would one randomly "win" over the other for upserting?

mq2thez commented 2 years ago

For us, the "upsert" behavior would be what we need, and if the data is different, it seems fine (again, to us) to have it pick whichever comes first. But I can see how that could be tricky.

phryneas commented 2 years ago

I mean, there is also another option: Not populating the store, but resolving requests from that array in a baseQuery wrapper:

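import { fetchBaseQuery } from '@reduxjs/toolkit/query'

// Raw data delivered with the initial page load
const dataFromServer = [ /* ...stuff... */ ];

const baseQuery = fetchBaseQuery({ /* normal config */ })

// hasDataFromServer, getDataFromServer and deleteDataFromServer are helpers
// you would write yourself on top of dataFromServer
const wrappedBaseQuery: typeof baseQuery = (arg, api, extraOptions) => {
  if (hasDataFromServer(api.endpoint, arg)) {
    try {
      return { data: getDataFromServer(api.endpoint, arg) }
    } finally {
      // the next request should go to the server, not to the array - so delete it from the array
      deleteDataFromServer(api.endpoint, arg)
    }
  }
  // no server data for this request: fall through to the real network call
  return baseQuery(arg, api, extraOptions)
}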
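The three helpers used in the wrapper above are left undefined in the snippet. A minimal sketch of how they might be backed, assuming a module-level `Map` keyed by endpoint name plus JSON-serialized args; the `serverData` map, `keyOf`, and `registerDataFromServer` are hypothetical names added for illustration.

```typescript
// Hypothetical one-shot store for data that arrived with the initial page load.
const serverData = new Map<string, unknown>();

// Assumes args are JSON-serializable and serialized consistently by callers.
const keyOf = (endpoint: string, arg: unknown): string =>
  `${endpoint}(${JSON.stringify(arg)})`;

// Called by each bundle when it loads, before its first query runs.
function registerDataFromServer(endpoint: string, arg: unknown, data: unknown): void {
  serverData.set(keyOf(endpoint, arg), data);
}

function hasDataFromServer(endpoint: string, arg: unknown): boolean {
  return serverData.has(keyOf(endpoint, arg));
}

function getDataFromServer(endpoint: string, arg: unknown): unknown {
  return serverData.get(keyOf(endpoint, arg));
}

// Removing the entry after first use means later requests (refetches,
// invalidations) go to the network instead of replaying stale server data.
function deleteDataFromServer(endpoint: string, arg: unknown): void {
  serverData.delete(keyOf(endpoint, arg));
}
```

With this in place, a bundle would call `registerDataFromServer("getPokemon", "pikachu", data)` at load time, and the first matching request would resolve from the map without touching the network.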

mq2thez commented 2 years ago

I replied on Twitter, but to close the loop here: I think that actually has a lot of potential. It looks like we can customize on a per-endpoint basis with queryFn (which we will already be using), so we could have a queryFn and queryFnWithInitialValue (or something similar) helper. I'm not sure if it works quite right for our SSR case, since I imagine that this wouldn't be part of an initial render (but instead part of an effect?). That said, I can see how it would fit into our existing patterns to at least not require additional changes.

I'll be trying to find more time to progress on this in the next week or two. Thank you so much for making time to chat about it!

markerikson commented 2 years ago

Yeah, the queries are always run in effects, so it would be post-first-render.

reduxjs / redux-toolkit

RTK-Query: Code Splitting plus rehydration? #2583