nuwave / lighthouse

A framework for serving GraphQL from Laravel
https://lighthouse-php.com
MIT License
3.35k stars 438 forks source link

Select only necessary columns for optimized queries #1356

Open kplgr opened 4 years ago

kplgr commented 4 years ago

What problem does this feature proposal attempt to solve? I am trying out GraphQL, testing various features in order to include it in some of the web apps I am developing with laravel. Being curious, I created a Schema and fired up all sorts of queries, followed by poking around in the SQL server to see what was going on.

What I discovered was this:

Suppose I have a Type defined which corresponds to a Model, and the database table has an ID column and 26 more columns (A, B C, ..., Z).

No matter what the Type definition is, which in my case only includes some of the columns ( say, ID and A, B, C), I can see that the SQL server is asked to fetch all 27 columns from the DB, using a SELECT * FROM [Table] statement, even though given the definition of the type at most 4 of them would be returned via the resolver.

This is an unnecessary data transfer between the DB and PHP, in my opinion.

Which possible solutions should be considered?

I propose that we construct a more refined SQL query, selecting only the columns that are requested in the GraphQL query.

I am not sure I am competent enough to try and implement this on my own, but I will give it a shot. In any case, I thought I'd share my thoughts.

spawnia commented 4 years ago

Technically, this can be done by constraining the SELECT to the requested fields. The fourth resolver argument ResolveInfo can be used to know which fields a client selected.

There are some pitfalls to this which require discussion:

mazedlx commented 4 years ago

Technically, this can be done by constraining the SELECT to the requested fields. The fourth resolver argument ResolveInfo can be used to know which fields a client selected.

There are some pitfalls to this which require discussion:

* Fields may not match database columns 1-1

* Laravel assumes `SELECT *` by default for things like relations, virtual properties, etc.

Would you be so awesome as to provide an example? I've searched through the docs but I can't seem to find what I am looking for :-(

Let's say I have a model Nerd with the fields glasses and beard,

type Nerd {
    glasses: Int!
    beard: String!
}

and I want to query them with

type Query {
   nerds: [Nerd] @all
}

and I only want the SQL statement to be SELECT glasses, beard FROM nerds. Thanks in advance!

spawnia commented 4 years ago
public function __invoke($root, array $args, GraphQLContext $context, ResolveInfo $resolveInfo)
{
    $resolveInfo->getFieldSelection();
    $resolveInfo->lookAhead();
    // TODO use the information from those methods to SELECT only the given fields
}
mazedlx commented 4 years ago
public function __invoke($root, array $args, GraphQLContext $context, ResolveInfo $resolveInfo)
{
    $resolveInfo->getFieldSelection();
    $resolveInfo->lookAhead();
    // TODO use the information from those methods to SELECT only the given fields
}

Thanks! Just one more stupid question: where do I put that method? On the Eloquent model?

spawnia commented 4 years ago

https://lighthouse-php.com/master/the-basics/fields.html

mazedlx commented 4 years ago

Awesome, thanks!

canatufkansu commented 4 years ago

This kind of feature would be a huge performance improvement for the package. I'm implementing an endpoint for multi million record table with at least 50+ columns. I'm using query builder to make search in a given bounding box (https://wiki.openstreetmap.org/wiki/Bounding_Box) in PostgreSQL also implemented complex where conditions to filter the results. Only issue is Laravel generates SELECT * queries which affects the performance. I could use custom resolvers but when I do that I have to deal with pagination.

Following is my current Query schema:

listings(bbox: BoundingBoxInput! @builder(method: "App\\GraphQL\\Builders\\BoundingBoxSearch@bbSearch"), where: _ @whereConditions(columnsEnum: "ListingColumn")) : [Listing!]! @middleware(checks: ["auth:api"]) @paginate(maxCount: 50 defaultCount: 10)

Is there any way I can modify select columns without writing custom resolver ? Because by this way with very little code I can do so much.

spawnia commented 4 years ago

I am not planning to implement such functionality in Lighthouse myself, but am open to accepting PRs.

canatufkansu commented 4 years ago

I am not planning to implement such functionality in Lighthouse myself, but am open to accepting PRs.

I understand the complexity and side affects like you mentioned above about relations and virtual properties. It would be nice to have some way to control select columns like in query builder @builder directive. Thanks, even like this Lighthouse is very useful.

spawnia commented 4 years ago

Just noticed this is a duplicate of https://github.com/nuwave/lighthouse/issues/1214. An implementation attempt was also started https://github.com/nuwave/lighthouse/pull/1253

andershagbard commented 4 years ago

Can't quite see how this is duplicate.

Could this be solved by implementing a select directive?

class User extends Model
{
  public function getNameAttribute()
  {
    return $this->first_name . ' ' . $this->last_name;
  }
}
type User {
  id: ID!
  first_name: String
  last_name: String
  name: String @select(fields: ["first_name", "last_name"])
}
spawnia commented 4 years ago

@andershagbard right, there is some overlap between the issue, but the intent is different.

Could this be solved by implementing a select directive?

That would be one part of the puzzle, good thinking.

We might need a mechanism to make the optimized selects opt-in per field / type. Otherwise, it could be quite hard to migrate existing applications towards it, since now Lighthouse always selects all the fields.

andershagbard commented 4 years ago

We might need a mechanism to make the optimized selects opt-in per field / type. Otherwise, it could be quite hard to migrate existing applications towards it, since now Lighthouse always selects all the fields.

Trying to think of a good solution for this.

Could this be an opt-in in the config file? If enabled it would use the field(s) in the @select, if @select is absent, use the field name.

spawnia commented 4 years ago

We might do that as an MVP. I am a bit worried about migrating big applications towards optimized selects, but maybe it is not a big issue in practice.

I do not plan to implement such a feature myself, but am happy to aid in reviewing and refining a PR.

DDynamic commented 3 years ago

@spawnia I am working on an implementation for this, and I am having trouble accessing child directives via $resolveInfo. Accessing the directives property on any selection in a selectionSet returns an empty NodeList.

spawnia commented 3 years ago

@spawnia I am working on an implementation for this, and I am having trouble accessing child directives via $resolveInfo. Accessing the directives property on any selection in a selectionSet returns an empty NodeList.

That would give you access to client directives.

This feature requires looking at schema directives. Look at other Lighthouse directives to see how that works.

DDynamic commented 3 years ago

Could you give me an example of a parent directive searching through child directives? I can't think of any off the top of my head.

spawnia commented 3 years ago

I do not know what you mean by parent or child directives. Consider the FieldValue class and its uses, maybe you will find what you need there.

Try posting a draft PR and we can discuss implementation details.

Paula2001 commented 3 years ago

I got to the point that I have created at trait that get the fields that should get selected but I am facing a serious issue with relations I can't apply the select method to belongsTo and hasOne and HasMany directives for some reason is there's any easier way to make the select rule more generic and we can apply it to all directives cause I don't see a way in here

dmouse commented 2 years ago

Hi all! other thing that I notice is about @count and @paginate:

if in the @count directive we add the columns to include in the count() method, you can choose some "indexed fields", this improve the performance

directive @count(
  """
  .....
  """

  """
  Columns
  """
  columns: [String!]
) on FIELD_DEFINITION
GRAPHQL;

something similar for @paginate might improve the performance, but in this case it's more complicated:

https://github.com/nuwave/lighthouse/blob/master/src/Pagination/PaginationArgs.php#L107

if we modify the below line to include the columns for the count process, the column parameter only affects the return columns and doesn't affect the count that execute the paginator.

return $builder->{$methodName}($this->first, ['row_id'], 'page', $this->page);

image

This is because the Laravel paginate method don't use the columns in the count execution. This is more a question for the Laravel issue queue, but maybe add another parameter in the paginate method to modify the columns in the count process

https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Eloquent/Builder.php#L808 https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Query/Builder.php#L2445

dmouse commented 2 years ago

Update: I opened a discussion about the paginate method https://github.com/laravel/framework/discussions/40114

chadidi commented 2 years ago

@dmouse that's just a misconception count(*) is not the same as SELECT *, check this link for more details: https://www.sqlservercentral.com/articles/advice-on-using-count

Stevemoretz commented 1 year ago
public function __invoke($root, array $args, GraphQLContext $context, ResolveInfo $resolveInfo)
{
    $resolveInfo->getFieldSelection();
    $resolveInfo->lookAhead();
    // TODO use the information from those methods to SELECT only the given fields
}

Trying to do something similar and I found that brilliant piece you written here!! Though what is the lookahead for? Any issues with using the following?

$resolveInfo->getFieldSelection(100);

It gives the result in the most beautiful output I could have imagined to get, not sure about performance issues...

spawnia commented 1 year ago
$resolveInfo->getFieldSelection(100);

That is absolutely fine, if it has the information you need. lookAhead() gives you more detailed information about the selection, such as the types of the sub-fields and their arguments.

Stevemoretz commented 1 year ago
$resolveInfo->getFieldSelection(100);

That is absolutely fine, if it has the information you need. lookAhead() gives you more detailed information about the selection, such as the types of the sub-fields and their arguments.

Thank you so much for explaining that, so now I know how to access arguments of sub-fields thanks to that method! Just two questions to make my knowledge a little more about this,

  1. can we grab info from a parent-field in the tree from a sub-node?
$_, array $args, GraphQLContext $context, ResolveInfo $resolveInfo

$_ here gives the parent result, but can't access for instance parent args or args from parent of the parent...

  1. can we pass in data from upper-fields nodes to lower ones?

Let's say that I have an array which I got based on the info I was provided in an upper-field now in a sub node (in the same branch) I want to access those data, how could I pass data from upper node to lower nodes in the branch? I thought context was there for this but I was wrong!

spawnia commented 1 year ago

I think those questions are off-topic, you may ask in StackOverflow or Slack.

spawnia commented 1 year ago

Right now, two pull requests are trying to solve this issue:

Going forward, I think the minimal viable solution for this issue is to implement the manual approach first. A marker directive needs to be added to fields to inform Lighthouse which columns need to be selected for it, e.g.:

"""
Defines which columns must be selected from the database to resolve the field.

If specified on at least one field of a type, Lighthouse will `SELECT` only the explicitly
mentioned columns matching the queried fields. For fields without this directive,
no columns will be selected, so make sure to specify this for all fields that need it.
"""
directive @select(columns: [String!]!) on FIELD_DEFINITION

type User {
  id: ID! @select(columns: ["id"])
  first_name: String! @select(columns: ["first_name"])
  last_name: String! @select(columns: ["last_name"])
  full_name: String! @select(columns: ["first_name", "last_name"])
  rights: [Right!]! @hasMany
}

Only if the fields of a type are explicitly marked with @select will Lighthouse replace the default SELECT * with a constrained SELECT, based upon the fields selected in the query.

{ users { id last_name full_name } }
# SELECT id, last_name, first_name
petecoop commented 10 months ago

@spawnia I agree on your approach on it being an entirely opt-in feature through the @select.

I've been following this for a while as I have one table that has a couple of large blob columns (not the best design but it's a legacy project), without constraining the SELECT the performance is terrible. I've had to fix this through a builder that modifies the query, but some way of doing it as you've shown would be great.

hassan-index commented 1 month ago

@spawnia I'm currently using Lighthouse GraphQL(6.42.1) in my project, and I'm facing an issue similar to this. I'm looking for a proper solution because in my case, I have tables with around 100 columns. I'm utilizing the @paginate directive for optimization, but in some cases, I don't need the @paginate directive. Is there any solution that can handle all relationship queries? I've checked this pull request, but it hasn't been merged. https://github.com/nuwave/lighthouse/pull/1626

if we can pass the selected column to the eloquent then it will decrease the API response time

query MyQuery {
  contacts {
      first_name
      last_name
  }
}

select first_name,last_name from contacts where contacts.deleted_at