Closed somada141 closed 6 years ago
I think right now, you have to inspect the query AST(info.query
) to see what fields are queried. Then you can use the load_only(*columns)
option in the sqlalchemy query.
fields = get_query_fields(info.query, info.fragments)
query = TypeAuthor.get_query(info=info).options(load_only(*fields))
or something similar. See https://github.com/graphql-python/graphene/issues/348 for ideas about how to implement get_query_fields
and alternatives.
Of course this is a simplification, since this only works for the first level of the query. You'd actually have to also look at relationships, recursively, and apply the load_only
to those too. It might look like this:
fields = get_query_fields(info.query, info.fragments)
query = TypeAuthor.get_query(info=info)\
.options(*(
joinedload(relation).load_only(*relation_fields)
if relation is not None else load_only(*fields) # for top level entity
for relation, relation_fields in fields
))
but this still isn't right, it only takes care of the first level of relationships.
The joinedload
option specifies an eager loading method, which we also want to avoid the n+1 problem(see also https://yacine.org/2017/02/27/graphqlgraphene-sqlalchemy-and-the-n1-problem/).
http://docs.sqlalchemy.org/en/latest/orm/loading_relationships.html http://docs.sqlalchemy.org/en/latest/orm/loading_columns.html
Thanks a lot @DrPyser that was a very helpful response! I will give it a go now and update the issue.
Hey @DrPyser. I tried the implementation of get_query_fields
under https://github.com/graphql-python/graphene/issues/348#issuecomment-267717809 but the field-names are 'mapped' to GraphQLised version of themselves.
For example, the return of the aforementioned implementation against this query:
query GetAuthor{
author(authorId: 1) {
nameFirst
}
}
will return ['author', 'author.nameFirst']
. While its fairly easy to remove the author
prefix (as the table fields won't resolve with that in there), it gets a little trickier reverting camel-case field-names to their original underscored versions, e.g., nameFirst
to name_first
.
Is there a more concrete way of performing this mapping? Graphene/GrapheneSQLAlchemy does it somehow when mapping ORM fields to GraphQL fields so I was hoping to leverage that somehow.
Yup, look in graphene.utils.str_converters
. There's a function to_snake_case
that converts camel case to snake case(and to_camel_case
to do the opposite).
That's beautiful, thanks a lot @DrPyser.
Should anyone land here wondering about implementation I got around to implementing my own version of the get_query_fields
function as such:
def extract_requested_fields(
info: graphql.execution.base.ResolveInfo,
fields: List[Union[Field, FragmentSpread]],
do_convert_to_snake_case: bool = True,
) -> Dict:
"""Extracts the fields requested in a GraphQL query by processing the AST
and returns a nested dictionary representing the requested fields.
Note:
This function should support arbitrarily nested field structures
including fragments.
Example:
Consider the following query passed to a resolver and running this
function with the `ResolveInfo` object passed to the resolver.
>>> query = "query getAuthor{author(authorId: 1){nameFirst, nameLast}}"
>>> extract_requested_fields(info, info.field_asts, True)
{'author': {'name_first': None, 'name_last': None}}
Args:
info (graphql.execution.base.ResolveInfo): The GraphQL query info passed
to the resolver function.
fields (List[Union[Field, FragmentSpread]]): The list of `Field` or
`FragmentSpread` objects parsed out of the GraphQL query and stored
in the AST.
do_convert_to_snake_case (bool): Whether to convert the fields as they
appear in the GraphQL query (typically in camel-case) back to
snake-case (which is how they typically appear in ORM classes).
Returns:
Dict: The nested dictionary containing all the requested fields.
"""
result = {}
for field in fields:
# Set the `key` as the field name.
key = field.name.value
# Convert the key from camel-case to snake-case (if required).
if do_convert_to_snake_case:
key = to_snake_case(name=key)
# Initialize `val` to `None`. Fields without nested-fields under them
# will have a dictionary value of `None`.
val = None
# If the field is of type `Field` then extract the nested fields under
# the `selection_set` (if defined). These nested fields will be
# extracted recursively and placed in a dictionary under the field
# name in the `result` dictionary.
if isinstance(field, Field):
if (
hasattr(field, "selection_set") and
field.selection_set is not None
):
# Extract field names out of the field selections.
val = extract_requested_fields(
info=info,
fields=field.selection_set.selections,
)
result[key] = val
# If the field is of type `FragmentSpread` then retrieve the fragment
# from `info.fragments` and recursively extract the nested fields but
# as we don't want the name of the fragment appearing in the result
# dictionary (since it does not match anything in the ORM classes) the
# result will simply be result of the extraction.
elif isinstance(field, FragmentSpread):
# Retrieve referened fragment.
fragment = info.fragments[field.name.value]
# Extract field names out of the fragment selections.
val = extract_requested_fields(
info=info,
fields=fragment.selection_set.selections,
)
result = val
return result
which parses the AST into a dict
preserving the structure of the query and (hopefully) matching the structure of the ORM.
Running the info
object of a query like:
query getAuthor{
author(authorId: 1) {
nameFirst,
nameLast
}
}
produces
{'author': {'name_first': None, 'name_last': None}}
while a more complex query like this:
query getAuthor{
author(nameFirst: "Brandon") {
...authorFields
books {
...bookFields
}
}
}
fragment authorFields on TypeAuthor {
nameFirst,
nameLast
}
fragment bookFields on TypeBook {
title,
year
}
produces:
{'author': {'books': {'title': None, 'year': None},
'name_first': None,
'name_last': None}}
Now these dictionaries can be used to define what is a field on the primary-table (Author
in this case) as they'll have a value of None
such as name_first
or a field on a relationship of that primary-table such as field title
on the books
relationship.
A simplistic approach to auto-applying those fields can take the form of the following function:
def apply_requested_fields(
info: graphql.execution.base.ResolveInfo,
query: sqlalchemy.orm.Query,
orm_class: Type[OrmBaseMixin]
) -> sqlalchemy.orm.Query:
"""Updates the SQLAlchemy Query object by limiting the loaded fields of the
table and its relationship to the ones explicitly requested in the GraphQL
query.
Note:
This function is fairly simplistic in that it assumes that (1) the
SQLAlchemy query only selects a single ORM class/table and that (2)
relationship fields are only one level deep, i.e., that requestd fields
are either table fields or fields of the table relationship, e.g., it
does not support fields of relationship relationships.
Args:
info (graphql.execution.base.ResolveInfo): The GraphQL query info passed
to the resolver function.
query (sqlalchemy.orm.Query): The SQLAlchemy Query object to be updated.
orm_class (Type[OrmBaseMixin]): The ORM class of the selected table.
Returns:
sqlalchemy.orm.Query: The updated SQLAlchemy Query object.
"""
# Extract the fields requested in the GraphQL query.
fields = extract_requested_fields(
info=info,
fields=info.field_asts,
do_convert_to_snake_case=True,
)
# We assume that the top level of the `fields` dictionary only contains a
# single key referring to the GraphQL resource being resolved.
tl_key = list(fields.keys())[0]
# We assume that any keys that have a value of `None` (as opposed to
# dictionaries) are fields of the primary table.
table_fields = [
key for key, val in fields[tl_key].items()
if val is None
]
# We assume that any keys that have a value being a dictionary are
# relationship attributes on the primary table with the keys in the
# dictionary being fields on that relationship. Thus we create a list of
# `[relatioship_name, relationship_fields]` lists to be used in the
# `joinedload` definitions.
relationship_fieldsets = [
[key, val.keys()]
for key, val in fields[tl_key].items()
if isinstance(val, dict)
]
# Assemble a list of `joinedload` definitions on the defined relationship
# attribute name and the requested fields on that relationship.
options_joinedloads = []
for relationship_fieldset in relationship_fieldsets:
relationship = relationship_fieldset[0]
rel_fields = relationship_fieldset[1]
options_joinedloads.append(
sqlalchemy.orm.joinedload(
getattr(orm_class, relationship)
).load_only(*rel_fields)
)
# Update the SQLAlchemy query by limiting the loaded fields on the primary
# table as well as by including the `joinedload` definitions.
query = query.options(
sqlalchemy.orm.load_only(*table_fields),
*options_joinedloads
)
return query
Is this still the only way to do it? even with the new version out?
@aonamrata I haven't seen any new approach being available. Which new version are you referring to?
never mind, it does not have it. I saw couple of PRs to add it and was assuming it should be in the next version .. after 2.0.
I know this is closed, but should the code snippet @somada141 provided be merged into this library? Assuming it gets thoroughly tested, I can't think of a reason why we wouldn't want to fix the overfetching of data from the DB and fix the N+1 problem.
I believe this is the missing piece that would make using SQLAlchemy + graphene + graphene-sqlalchemy a truly beautiful thing.
Note this would probably solve #100 and #109, which mention overfetching and N+1, respectively.
Yeah, this is a fairly complex problem and solution, you don't want everyone to implement their own version. This should also be a basic concern of a library like this one. It might help if the graphene or graphql-core libraries provided more utilities to manipulate the AST.
Yeah I agree that something like that should be merged in to this library at some point. Ideally we should write some sort of design document before we delve into the actual implementation because there are many considerations and edge cases to take into account. For example, I think the N+1 issue is somewhat orthogonal to this problem and could be solved with different approaches.
I'm currently working on improving the out-of-the-box performance of graphene-sqlalchemy
. I'll probably get to this at some point.
Hi, just a heads up that I sent a PR to address the N+1 problem. I ended up following a different approach than @somada141 because one of my goals was that this optimization should work well with schemas that include non-graphene-sqlalchemy
types. I would love to hear your thoughts. Cheers.
Hi, I didn't find full answer, so I combined answers and made solution for "n+1", but "cartesian product" appears.
extract_requested_fields
function from @somada141, that converts requested fields to dictmakeLoadOnlyOptions
function, that converts dict to query optionsLoadOnlyConnectionField
- if you don't want custom class, you can use this example to resolve query manuallydef extract_requested_fields(
info: graphql.execution.base.ResolveInfo,
fields: List[Field],
do_convert_to_snake_case: bool = True,
) -> Dict:
result = {}
for field in fields:
key = field.name.value
if do_convert_to_snake_case:
key = to_snake_case(name=key)
if key == "id":
continue
val = None
if isinstance(field, Field):
if (
hasattr(field, "selection_set") and
field.selection_set is not None
):
val = extract_requested_fields(
info=info,
fields=field.selection_set.selections,
do_convert_to_snake_case=do_convert_to_snake_case
)
if val and 'edges' in val:
val = val['edges']
if val and 'node' in val:
val = val['node']
result[key] = val
elif isinstance(field, FragmentSpread):
fragment = info.fragments[field.name.value]
val = extract_requested_fields(
info=info,
fields=fragment.selection_set.selections,
do_convert_to_snake_case=do_convert_to_snake_case
)
result = val
return result
def makeLoadOnlyOptions(fieldsDict, model):
if not fieldsDict:
return []
options = []
modelFields = []
relationshipFields = []
for field in list(fieldsDict):
if fieldsDict[field] is None:
modelFields.append(field)
else:
relationshipFields.append(field)
if modelFields:
options.append(load_only(*modelFields))
for relationshipField in relationshipFields:
relationshipOptions = makeLoadOnlyOptions(fieldsDict[relationshipField],
getattr(model, relationshipField).property.mapper.class_)
options.append(joinedload(getattr(model, relationshipField)).options(*relationshipOptions))
return options
class LoadOnlyConnectionField(graphene_sqlalchemy.SQLAlchemyConnectionField):
def __init__(self, connection, *args, **kwargs):
super().__init__(connection, *args, **kwargs)
@classmethod
def get_query(cls, model, info: 'ResolveInfo', sort=None, **args):
query = super().get_query(model, info, sort, **args)
queryName = info.field_name
fieldsDict = extract_requested_fields(info, info.field_asts, False)
fieldsDict = fieldsDict[queryName]
options = makeLoadOnlyOptions(fieldsDict, model)
query = query.options(*options)
# print(query)
return query
class Query(graphene.ObjectType):
node = relay.Node.Field()
searchTitlesRelay = LoadOnlyConnectionField(connection=Title, sort=Title.sort_argument())
{
searchTitlesRelay{
edges {
cursor
node {
uuid
code
titleAuthors {
edges {
node {
uuid
user {
uuid
name
}
}
}
}
chapters {
edges {
node {
uuid
}
}
}
}
}
}
}
SELECT anon_1.title_id AS anon_1_title_id, anon_1.title_code AS anon_1_title_code,
title_author_1.id AS title_author_1_id,
users_1.id AS users_1_id, users_1.name AS users_1_name,
chapter_1.id AS chapter_1_id
FROM (
SELECT title.id AS title_id, title.code AS title_code
FROM title ORDER BY title.id ASC
LIMIT 2
) AS anon_1
LEFT OUTER JOIN title_author AS title_author_1 ON anon_1.title_id = title_author_1.title_id
LEFT OUTER JOIN users AS users_1 ON users_1.id = title_author_1.user_id
LEFT OUTER JOIN chapter AS chapter_1 ON anon_1.title_id = chapter_1.title_id
ORDER BY anon_1.title_id ASC
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related topics referencing this issue.
Consider the following SQLAlchemy ORM class:
Simply wrapped in an
SQLAlchemyObjectType
as such:and exposed through:
A GraphQL query such as:
will cause the following raw SQL to be emitted (taken from the echo logs of the SQLA engine):
As one can see we may only want the
nameFirst
field, i.e., thename_first
column but the entire row is fetched. Of course the GraphQL response only contains the requested fields, i.e.,but we have still fetched the entire row, which becomes a major issue when dealing with wide tables.
Is there a way to automagically communicate which columns are needed to SQLAlchemy so as preclude this form of over-fetching?