keystonejs / keystone

The most powerful headless CMS for Node.js — built with GraphQL and React
https://keystonejs.com
MIT License
8.99k stars 1.13k forks source link

Dedupe ids when resolving batches of related items #9047

Closed molomby closed 4 months ago

molomby commented 4 months ago

This reduces the amount of data sent to the DB in SQL queries and reduces the number of parameters needed.

I've seen Keystone generate queries over over 300 KB in size, where 10,000 parameters are passed with identical values, all to return a single record with a dozen columns. After this change the same query would be well under 1 KB. There should also be some marginal reduction in DB CPU load – assuming that dealing with 1 parameter is easier that 1000's, even if done so efficiently.

The DB effectively dedupes ids on it's side already (ie. select * from "Things" where id in (1,1,1,1,1,1) will only return a single record) so this code should be functionally identical. Being a data loader batch function, this fetchRelatedItems() function returns an element for each id provided, regardless of duplicates in the input.

codesandbox-ci[bot] commented 4 months ago

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit 29b8b9cd9f008ca075ca85a3c495a81db552c2e3:

Sandbox Source
@keystone-6/sandbox Configuration
dcousens commented 4 months ago

We were de-duping the return result by using new Map a few lines down, so I expect no change in output behaviour here

molomby commented 4 months ago

Your half right! This change won't alter the output of the function but the map your talking about is actually used to create duplicates in the return array when needed (as well as ensure the correct order). This is a data loader batch function so it's contract insists that:

• The Array of values must be the same length as the Array of keys. • Each index in the Array of values must correspond to the same index in the Array of keys.

Ie. if you ask for keys [1, 1, 1] you'll get back an array with three identical elements. The results returned from the DB, that are being added to the map, won't contain any duplicates regardless of this code change.

The issue here is deduping the keys. Previously this was being done but the DB engine, this PR just moves that to node.