graphql / dataloader

DataLoader is a generic utility to be used as part of your application's data fetching layer to provide a consistent API over various backends and reduce requests to those backends via batching and caching.

How do I pass custom options into the batch function DataLoader uses to resolve resources? #147

Closed AJMiller closed 4 years ago

AJMiller commented 6 years ago

I have a case where we want to return a modified result based on a GraphQL variable. It appears that the loaders only take an id argument, leaving no way to pass other options through for the server to consume. Ideally, I'd like to do something like this:

const postLoader = new DataLoader((keys, options) => {
  ...return promise with posts
});

...
const ids = [1,2,3];
postLoader.loadMany(ids, { withCommentType: 'pending' });

The way I have solved it now is to create a separate loader for each option, but this solution seems far less scalable:

const postLoader = new DataLoader((keys) => {
  ...return promise with posts
});
const postWithPendingCommentsLoader = new DataLoader((keys) => {
  ...return promise with posts including comments that are pending review
});

...
const ids = [1,2,3];
postWithPendingCommentsLoader.loadMany(ids);

Are there any plans to allow an options passthrough like this in the future? Or am I missing an alternate solution?

ericf89 commented 5 years ago

Running into something similar. I need my graphql context to resolve my loader func. Since I'm scoping my loaders to each request, this context would be the same for the lifetime of the loader, so maybe it's something that could be passed in at construction?
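Since the batch function is just a closure, a context that is fixed for the lifetime of the loader can simply be captured at construction time. A minimal sketch of that idea (the `createBatchFn` name and the `tenantId` field are made up for illustration):

```javascript
// Hypothetical per-request factory: the request context is closed over by
// the batch function, so DataLoader's (keys) => Promise<values> contract
// is unchanged while every batched fetch can still see the context.
function createBatchFn(context) {
  return async (ids) =>
    // Stand-in for the real backend fetch, which would use `context`.
    ids.map((id) => ({ id, tenant: context.tenantId }));
}

// Per-request wiring (with the real library):
//   const postLoader = new DataLoader(createBatchFn(requestContext));
```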

RAMPKORV commented 5 years ago

I have a similar issue. I want to tell the batch function which MySQL connection or transaction instance to use, and whether to lock the DB rows.

Dastari commented 5 years ago

Was anyone able to find a good solution to passing in arguments into the DataLoader?

Dastari commented 5 years ago

Okay, I was able to solve this in a roundabout kind of way (I'm using TypeORM and type-graphql).

I wanted pagination support (so skip and limit) plus an additional where query. I also wanted it to work with nested resolvers. So my test case was something like this:

query Dashboard($loginDate: String, $jobStatus:String, $year:Int) {
  Users(limit: 10, skip: 5, where: "LastLoginDate > :loginDate", vars: {loginDate: $loginDate}) {
    Name
    Country
    ActiveJobs(limit: 10, skip:0, where:"JobStatus = :jobStatus AND YEAR(DateIn)=:year", vars:{jobStatus:$jobStatus, year:$year}) {
      JobNumber
      Customer {
        Name
        PhoneNumber
      }
    }
  }
}

Firstly, I wasn't able to get pagination working for any resolvers that use the DataLoader, unless I can find a way to use TypeORM's query builder to GROUP BY and LIMIT results per group. But I still had support for filtering using a where clause.

So I ended up getting a query like this working:

query Dashboard($loginDate: String, $jobStatus:String, $year:Int) {
  Users(limit: 10, skip: 5, where: "LastLoginDate > :loginDate", vars: {loginDate: $loginDate}) {
    Name
    Country
    ActiveJobs(where:"JobStatus = :jobStatus AND YEAR(DateIn)=:year", vars:{jobStatus:$jobStatus, year:$year}) {
      JobNumber
      Customer {
        Name
        PhoneNumber
      }
    }
  }
}

I also got this working with a single generic DataLoader, without having to write batch-load functions for every separate resolver. It's not very optimized, but it might help someone:

index.ts

import "reflect-metadata";
import { ApolloServer } from "apollo-server-express";
import * as Express from "express";
import { createConnection } from "typeorm";
import { createSchema } from "./utils/createSchema";

import { loader } from "./loaders/loader";

const main = async () => {
  const schema = await createSchema();
  const connection = await createConnection();

  const apolloServer = new ApolloServer({
    context: ({ req, res }: any) => ({
      req,
      res,
      loader: loader()
    }),
    schema
  });

  const app = Express();

  apolloServer.applyMiddleware({ app });

  app.listen(4000, () => {
    console.log("Server Started on http://localhost:4000/graphql");
  });
};

main();

loader.ts

import * as DataLoader from "dataloader";
import { In, getRepository } from "typeorm";

import * as GraphQLJSON from "graphql-type-json";

interface ArgList {
  entity?: any;
  key?: string;
  where?: string;
  vars?: GraphQLJSON;
  id: number;
}

const batchLoad = async (args: ArgList[]) => {
  const ids = args.map(arg => arg.id);
  // NOTE: this assumes every key batched together shares the same
  // entity, key, and where clause, so we only look at the first one.
  const { where, key, entity, vars } = args[0];

  const results = await getRepository(entity)
    .createQueryBuilder()
    .where({ [key]: In(ids) })
    .andWhere(where ? where : "1=1", vars ? vars : {})
    .getMany();

  const resultsMap: { [key: number]: any[] } = {};

  results.forEach(row => {
    if ((row[key] as number) in resultsMap) {
      resultsMap[row[key] as number].push(row);
    } else {
      resultsMap[row[key] as number] = [row];
    }
  });

  // DataLoader requires one result per key, in key order.
  return ids.map(id => resultsMap[id] || []);
};

export const loader = () => {
  return new DataLoader<ArgList, any>(batchLoad);
};

This is a snippet from my User entity, which resolves ActiveJobs:

  @Field(() => [Job], {
    nullable: true
  })
  async ActiveJobs(
    @Ctx() { loader }: BaseContext,
    @Arg("where", { nullable: true }) where: string,
    @Arg("vars", () => GraphQLJSON, { nullable: true }) vars: GraphQLJSON
  ) {
    return loader.load({
      where,
      vars,
      id: this.id,     // Primary Key
      key: "userId",   // Foreign Key
      entity: Job      // Return Entity
    });
  }

This is a snippet from my Job entity, which resolves a single Customer (which is also a User entity):

  @Field(() => User)
  async Customer(
    @Ctx() { loader }: BaseContext,
    @Arg("where", { nullable: true }) where: string,
    @Arg("vars", () => GraphQLJSON, { nullable: true }) vars: GraphQLJSON
  ) {
    return (await loader.load({
      where,
      vars,
      id: this.id,         // Primary Key
      key: "customerId",   // Foreign Key
      entity: User     // Return Entity
    }))[0]; // We still get back an array, but we only want the first (and only) element.
  }

Lastly, if anyone has a better way of doing this, I'd love to hear it.

RAMPKORV commented 5 years ago

I went for a solution like this:

In my case I have batch functions that are curried with the database connection to use, like so:

const batchFunctions = {
  myCuriedBatchFn: (dbConnection) => ids => { /* db fetching logic with dbConnection */ }
}

And on each request, I map the batchFunctions object into an object of data loaders with this function:

const createLoadersFromBatchFunctions = (batchFunctions, options) =>
  Object.entries(batchFunctions).reduce((ack, [ fnName, fnBody ]) => {
    let fnRef; // This function ref will be updated on each call
    let dataloader = new DataLoader(ids => fnRef(ids), options);
    return {
      ...ack,
      [fnName]: (...params) => {
        fnRef = fnBody(...params);
        return dataloader;
      }
    };
  }, {});

tuananh commented 5 years ago

@RAMPKORV can you explain a bit how it works?

I wasn't able to get it to work.

RAMPKORV commented 5 years ago

We could have a set of batch functions like this.

const batchFunctions = {
  getBooks: (con) => async (ids) => {
    let [ rows ] = await con.query('SELECT * FROM books WHERE id IN (?) ORDER BY FIELD(id, ?)', [ids, ids]);
    return rows;
  }
}

To turn it into an object of data loaders we do

let loaders = createLoadersFromBatchFunctions(batchFunctions, {})

And then we can load books in this manner:

let bookFromOldDb = await loaders.getBooks(legacyDbConnection).load(6);
let bookFromNewDb = await loaders.getBooks(newDbConnection).load(4);

How does it work? When the DataLoader is created, we give it the batch function ids => fnRef(ids), but fnRef itself is only assigned at runtime, when we call something like loaders.getBooks(newDbConnection).
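One caveat with this pattern: the shared fnRef is mutated on every call, so two different connections used in the same tick could batch against the wrong one. A variant of the same idea keeps one loader per connection in a Map instead. This is only a sketch; the `makeLoader` parameter stands in for `(batchFn) => new DataLoader(batchFn)` so the snippet stays self-contained:

```javascript
// One loader per (function, connection) pair, so concurrent requests
// against different connections can't stomp on each other's batch function.
const createLoadersFromBatchFunctions = (batchFunctions, makeLoader) =>
  Object.fromEntries(
    Object.entries(batchFunctions).map(([name, fnBody]) => {
      const perConnection = new Map();
      return [
        name,
        (connection) => {
          if (!perConnection.has(connection)) {
            perConnection.set(connection, makeLoader(fnBody(connection)));
          }
          return perConnection.get(connection);
        },
      ];
    })
  );
```

Calling `loaders.getBooks(dbA)` twice returns the same loader (so batching and caching still work per connection), while `loaders.getBooks(dbB)` gets its own.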

tuananh commented 5 years ago

@RAMPKORV thank you. lemme try it again.

nrivard commented 5 years ago

Why isn't there an option to pass in a contextual object? java-dataloader has both a generalized version that applies to all loads on a data loader as well as a per-object context that can be passed into a load call.

leebyron commented 4 years ago

Why isn't there an option to pass in a contextual object?

Typically DataLoader instances are created as part of a request context, in which case other elements of that context can be referenced directly without the need to pass them in. Passing them in per call would require fairly complex logic to partition batches based on equivalent contexts. Instead, it's preferred to just create a new instance per context.

To the original question: DataLoader expects a strict key -> value relationship and does not support additional arguments. If additional arguments are needed, they can be made part of the key (for example, { id: 1, withCommentType: 'pending' }), so that keys with different arguments are treated, and thus cached, differently from keys that might represent a similar object.

Alternatively (as discussed above), if there is a small number of potential values for an argument, multiple DataLoaders can be created, one per potential value. However, this may depend on your application domain.
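A concrete sketch of the composite-key approach: `cacheKeyFn` is a real DataLoader option, and the sorted-key serialization below is just one way to make object keys cache stably regardless of property order:

```javascript
// Serialize a composite key into a stable string so that
// { id: 1, withCommentType: 'pending' } and the same object with its
// properties in a different order hit the same cache entry, while a
// plain { id: 1 } stays distinct.
// (An array replacer makes JSON.stringify emit properties in that order.)
const cacheKeyFn = (key) => JSON.stringify(key, Object.keys(key).sort());

// Wiring it up (hypothetical batch function):
//   const postLoader = new DataLoader(batchPosts, { cacheKeyFn });
//   postLoader.load({ id: 1, withCommentType: 'pending' });
```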

joepuzzo commented 4 years ago

So there is actually a sneaky solution to this :)

Assume in this example that we are using Sequelize and GraphQL.

// The ids here are objects not ids :) 
const batchGetStatusById = async ids => {
  // The goal was to pass down sql
  return ids.map(({ id, sqlArgs }) => models.Status.findByPk(id, sqlArgs));
};

/**
 * The input key is a complex object so we can pass down SQL parameters :)
 * however... we need to teach the DataLoader to cache on the SQL options && the id
 */
const cacheKeyFn = key => {
  return JSON.stringify(key);
};

const options = {
  cacheKeyFn,
};

const dataloaders = () => {
  return {
    status: new DataLoader(batchGetStatusById, options),
  };
};

Now you have your SQL arguments on a per-request basis, and it will only pull from the cache if the SQL query and the id are both the same.

byteab commented 4 years ago

I just copy-pasted this from Stack Overflow:

// This function creates unique cache keys for different selected
// fields
function cacheKeyFn({ id, fields }) {
  const sortedFields = [...(new Set(fields))].sort().join(';');
  return `${id}[${sortedFields}]`;
}

function createLoaders(db) {
  const userLoader = new DataLoader(async keys => {
    // Create a set with all requested fields
    const fields = keys.reduce((acc, key) => {
      key.fields.forEach(field => acc.add(field));
      return acc;
    }, new Set());
    // Get all our ids for the DB query
    const ids = keys.map(key => key.id);
    // Please be aware of possible SQL injection, don't copy + paste
    const result = await db.query(`
      SELECT
        ${[...fields].join()}
      FROM
        user
      WHERE
        id IN (${ids.join()})
    `);
    // Map rows back to keys: DataLoader needs one result per key, in key order
    return keys.map(key => result.find(row => row.id === key.id));
  }, { cacheKeyFn });

  return { userLoader };
}

// now in a resolver
resolve(parent, args, ctx, info) {
  // https://www.npmjs.com/package/graphql-fields
  return ctx.userLoader.load({ id: args.id, fields: Object.keys(graphqlFields(info)) });
}

mahmoud63 commented 1 year ago

I just asked ChatGPT for a solution and it gave me this code. Here is what I asked:

write a graphql server schema and resolvers for

-   company that have name, owner
-   owner that have name, job title, company

write a resolver to retrieve companies and another one to retrieve owners. u must use dataLoader in dynamic way and database must be sql with sequlize lib

can u use single dataLoader for both company and owner with deferent batch function

Here is the important part:

const entityLoader = new DataLoader(async (keys) => {
  const entities = await Promise.all([
    Company.findAll({
      where: { id: keys.filter((key) => key.type === 'Company') },
    }),
    Owner.findAll({
      where: { id: keys.filter((key) => key.type === 'Owner') },
    }),
  ]);

return entityLoader.load({ type: 'Owner', id: company.ownerId });

And this was the full answer:

const { DataLoader } = require('dataloader');
const { sequelize } = require('sequelize');

const Company = sequelize.define('Company', {
  name: {
    type: sequelize.STRING,
    allowNull: false,
  },
});

const Owner = sequelize.define('Owner', {
  name: {
    type: sequelize.STRING,
    allowNull: false,
  },
  jobTitle: {
    type: sequelize.STRING,
    allowNull: false,
  },
});

Company.hasOne(Owner);
Owner.belongsTo(Company);

const entityLoader = new DataLoader(async (keys) => {
  const entities = await Promise.all([
    Company.findAll({
      where: { id: keys.filter((key) => key.type === 'Company') },
    }),
    Owner.findAll({
      where: { id: keys.filter((key) => key.type === 'Owner') },
    }),
  ]);

  const companies = entities[0];
  const owners = entities[1];

  return keys.map((key) => {
    if (key.type === 'Company') {
      return companies.find((company) => company.id === key.id);
    } else if (key.type === 'Owner') {
      return owners.find((owner) => owner.id === key.id);
    }
  });
});

const resolvers = {
  Query: {
    companies: async () => {
      const companies = await Company.findAll();
      return companies;
    },
    owners: async () => {
      const owners = await Owner.findAll();
      return owners;
    },
  },
  Company: {
    owner: async (company, args, context) => {
      return entityLoader.load({ type: 'Owner', id: company.ownerId });
    },
  },
  Owner: {
    company: async (owner, args, context) => {
      return entityLoader.load({ type: 'Company', id: owner.companyId });
    },
  },
};

module.exports = {
  typeDefs,
  resolvers,
};

In this example, the single DataLoader instance entityLoader takes an array of keys, where each key is an object with two properties: type and id. The type property is used to distinguish between Company and Owner entities, and the id property is used to fetch the entities from the database. In the batch function, keys are filtered based on their type, and Company and Owner entities are fetched separately. Finally, the entities are returned in the same order as the keys, so that the correct entity is matched with the correct key.
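One thing worth checking in the generated batch function: `keys.filter(...)` returns the key objects themselves, not their ids, so they should be mapped to `key.id` before being handed to `findAll`. Here is a self-contained sketch of the partition-and-reorder step described above, with plain arrays (made-up fixture data) standing in for the Sequelize calls:

```javascript
// Fixture data standing in for the Company and Owner tables.
const companiesDb = [{ id: 1, name: 'Acme' }];
const ownersDb = [{ id: 7, name: 'Ada' }];

// Partition composite { type, id } keys, fetch each type once, then
// re-assemble results in the same order the keys came in.
async function batchEntities(keys) {
  // Map to .id — filtering alone would pass whole key objects to the DB.
  const companyIds = keys.filter((k) => k.type === 'Company').map((k) => k.id);
  const ownerIds = keys.filter((k) => k.type === 'Owner').map((k) => k.id);

  const companies = companiesDb.filter((c) => companyIds.includes(c.id));
  const owners = ownersDb.filter((o) => ownerIds.includes(o.id));

  // DataLoader contract: one result per key, in key order.
  return keys.map((k) =>
    k.type === 'Company'
      ? companies.find((c) => c.id === k.id)
      : owners.find((o) => o.id === k.id)
  );
}
```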


I don't know if that's a good way or not, so I'm waiting for your feedback :)