graphql / dataloader

DataLoader is a generic utility to be used as part of your application's data fetching layer to provide a consistent API over various backends and reduce requests to those backends via batching and caching.
MIT License
12.83k stars 510 forks source link

[QUESTION] Why aren't keys being de-duplicated even when the cache is disabled? #311

Closed sidoshi closed 1 year ago

sidoshi commented 2 years ago

Is there a reason why DataLoader doesn't deduplicate ids provided to the batch function when the cache is set to false?

const dataloader1 = new DataLoader(async function loadMany(ids) {
    // [[1],[2],[3],[4],[5],[1],[2],[3],[4],[5]]
    console.log(ids)
    return ids
}, {
    cacheKeyFn: (id) => id[0],
    cache: false
})

const ids = [[1],[2],[3],[4],[5],[1],[2],[3],[4],[5]]
ids.forEach(id => dataloader1.load(id))

//--

const dataloader2 = new DataLoader(async function loadMany(ids) {
    // [[1],[2],[3],[4],[5]]
    console.log(ids)
    return ids
}, {
    cacheKeyFn: (id) => id[0],
})

const ids = [[1],[2],[3],[4],[5],[1],[2],[3],[4],[5]]
ids.forEach(id => dataloader2.load(id))

I'm not sure if this would be a feature request or if it would be considered a bug? Or am I missing something that prevents us from de-duplicating the keys provided to the batch function? :)

thekevinbrown commented 1 year ago

It actually is deduplicating the IDs, but it's doing it with an equality check. If you open your JS console and run some tests:

1 === 1
=> true
[1] === [1]
=> false

This is because each array is a different object, so they are not reference equal. If you pass in the same array reference:

const a = [1];
=> undefined
const b = a;
=> undefined
b === a;
=> true

then they'll be the same.

In your example case, just flatten the arrays so you're passing in pure numbers and they'll be de-duped for you by dataloader.

yuliswe commented 1 year ago

HI, is there a way I can set a custom batchKeyFn similar to cacheKeyFn? I want to use an object as the key and I need to customize the dedup logic.

zq0904 commented 1 year ago
  const dataLoader2 = new DataLoader(async (ids) => {
    dataLoader2.clearAll();
    // [
    //   { id: 1, cid: 2 },
    //   { id: 2, cid: 3 },
    //   { id: 3, cid: 4 },
    //   { id: 4, cid: 5 },
    //   { id: 5, cid: 6 }
    // ]
    console.log(ids)
    return ids
  }, {
    cacheKeyFn: (obj: any) => {
      // return JSON.stringify(obj) // Even extreme
      return obj.id
    }
  }
  )

  // Let's say the first 5 requests come from one user and the next 5 requests come from another user because the cache uses a Map it's really hard to reuse the same reference (from different contexts of requests)
  ;[
    { id: 1, cid: 2 },
    { id: 2, cid: 3 },
    { id: 3, cid: 4 },
    { id: 4, cid: 5 },
    { id: 5, cid: 6 },

    { id: 1, cid: 2 },
    { id: 2, cid: 3 },
    { id: 3, cid: 4 },
    { id: 4, cid: 5 },
    { id: 5, cid: 6 },
  ].forEach((id) => dataLoader2.load(id))
zq0904 commented 1 year ago

https://github.com/graphql/dataloader#disabling-cache

denis-sokolov commented 1 year ago

Duplicate of #280.

sidoshi commented 1 year ago

@denis-sokolov I checked for duplicates but I missed that issue. Closing this as the other issue does a better job at explaining the problem. Thanks