Feature request: Allow user-specified queries

dkbarn commented 1 year ago

The fact that this library requires pre-defining the queries that will be exposed through the Client class seems incredibly limiting. The power of GraphQL is its flexibility -- by default, the entirety of a data model is retrievable through the edges on the root Query type. But this power and flexibility is lost when using this library, because the Client class only allows fetching the data and the fields that have been baked into a finite list of canned queries. In a sense, the end result is much more like interacting with an old-school REST API than a flexible GraphQL API.

Certain design decisions made in this library are very useful -- the use of pydantic, the auto-generation of Python dataclasses and enums from GraphQL types and enums, the support for full mypy typechecking. But what would make this library far more useful is if it exposed the full GraphQL schema, by allowing the user to fully specify the return fields of a query, rather than restricting them to a selection of pre-defined queries.

For example, given this schema:

type Query {
    users_by_id(ids: [Int!]!): [User!]!
    users_by_name(names: [String!]!): [User!]!
}

type User {
    id: Int!
    name: String
    email: String
    location: Location
    status: Status!
}

type Location {
    id: Int!
    name: String!
}

enum Status {
    ACTIVE
    INACTIVE
}

The following classes would be auto-generated:

class User(BaseModel):
    id: Optional[int]
    name: Optional[str]
    email: Optional[str]
    location: Optional[Location]
    status: Optional[Status]

class Location(BaseModel):
    id: Optional[int]
    name: Optional[str]

class Status(enum.Enum):
    ACTIVE = enum.auto()
    INACTIVE = enum.auto()

Queries could then be executed via the Client class like this, with support for fully specifying the return fields:

client = Client()
client.users_by_id(
    ids=[1234, 5678],
    return_fields=[
        User.id,
        User.name,
        User.location.name,
    ]
)

client.users_by_name(
    names=["foo", "bar", "baz"],
    return_fields=[
        User.id,
        User.status,
        User.location.id,
        User.location.name,
    ]
)

(Note: the exact syntax used to specify the return fields here would not actually work, but the point is there is some way of specifying the return fields)

The key differences are:

There is no requirement to define a queries.graphql
All edges on the root Query type are exposed by default as methods on the Client class.
When calling these query methods on the Client class, the return fields must be specified.
The GraphQL types are 1-to-1 mapped directly to pydantic classes. You do not end up with different classes for the same underlying type, for example you don't have UsersByIDsUser for the return type of one query and UsersByNameUser for the return type of another query. You simply have User as the return type of both, and the returned User objects are sparsely-populated based on the return fields requested. All fields on the generated pydantic classes are defined as Optional to allow for these sparsely-populated objects.

rafalp commented 1 year ago

The fact that this library requires pre-defining the queries that will be exposed through the Client class seems incredibly limiting.

Is this a practical observation from a project you've build with the Codegen?

It's true that Ariadne Codegen limits your client only to queries you have pre-defined for it, but its a trade off we've did based on prior experience in consuming GraphQL APIs from Python services, where most often than not following were true:

Our clients performed limited number of GraphQL operations.
Regenerating clients as new queries are included was friction-less.
Having single strict type is preferable when writing business logic vs having flexible types in UI logic that GraphQL was originally created for.

You can call this approach reductio at RESTum but observation that gains from GraphQL's flexibility are limited when using it in service 2 service communication are nothing new. I've did first multi-service project that used GraphQL before Ariadne even existed (in first half of 2018) and we've quickly ended with set of functions that did few pre-defined queries because that was actually better developer experience than specifying the fields every time I've wanted to pull the data from other API.

This is sort of thing you see with ORM's. You can pull a subset of columns from a table, but most of time everything is better. We could probably support generic queries like this (an escape hatch like how it is in ORMs) but I am worried this is a lot of extra work that developers would eventually walk away from as they discover that their business logic is easier to write with a finite and limited number of models.

dkbarn commented 1 year ago

Is this a practical observation from a project you've build with the Codegen?

We haven't actually built anything with Codegen yet, we are evaluating it as a possible solution, alongside alternatives like https://github.com/graphql-python/gql and https://github.com/profusion/sgqlc

However, our particular scenario is that we have a backend server utilizing GraphQL, and we have many different users accessing this API, for a variety of different purposes (populating UI's, generating reports, fetching data to sync into other systems). We don't have control over the specific details of the queries they could be executing.

Our clients performed limited number of GraphQL operations. Regenerating clients as new queries are included was friction-less.

The only use case in which I could envision a friction-less workflow for responding to changing query needs is the scenario where the backend developers and frontend developers are the same people -- that is, the developers using the Client library and the developers building the GraphQL schema are either one team or collaborate extremely closely.

So maybe this works in service-to-service communication, like you describe. But if you are simply wanting to provide a Pythonic client library that wraps the HTTP mechanics of communicating with a GraphQL server in a generic and extensible way, ariadne-codegen seems inadequate. The design changes I was suggesting were aimed at making this library cater to a wider range of scenarios.

In the end, we may end up having to go with a library like https://github.com/graphql-python/gql since this does allow dynamic construction of queries, but we will be sacrificing the ability to have full mypy typechecking, which I would consider to be a great loss.

heyaphra commented 1 year ago

@dkbarn In our experience using ariadne-codegen so far, we've incorporated parts the out-of-the-box AsyncClient into our own GraphqlClient code. Dynamic query composition as featured in python-graphql is also an attractive feature to us, so we're trying to strike a balance between the convenience of code generation and client extensibility by specifying base_client_name and base_client_path in our config TOML for ariadne-codegen to use our client instead of the default AsyncClient, which gives us the ability to incorporate features from python-graphql down the road. What we do really like about the default AsyncClient is that it is compatible with any requests-style API.

rafalp commented 1 year ago

API proposal:

result = await client.query(
  Query.user(id="123").fields(
    User.id,
    User.email,
    User.group.fields(UserGroup.id),
    User.reviews(category="123"),
  ),
  Query.user(id="fsadsa").alias("wohoo").fields(
    User.id,
    User.email,
    User.group.fields(UserGroup.id),
    User.reviews(category="123"),
  ),
  Query.search(query="test").on(
    "User",
    User.id,
    User.email,
  ).on(
    "Thread",
    Thread.id,
    Thread.title,
  ),
  operation_name="HelloWorld"
)

Query, User, UserGroup or Thread types would be generated by the codegen. We could have this feature behind feature flag, so if you want to generate those types and custom query support, you would have to opt in to it first.

Just a caveat tho: those generated types would be used only as safety net for query building. I don't think we would be able to generate types for client.query return values without having Mypy plugin, which is different discussion IMHO.

dkbarn commented 1 year ago

Yes this looks like what we're after! Regarding your caveat about return types, would the type annotation have to be "Any" then?

Also, I'm not sure I completely understand this bit:

Query.search(query="test").on(
    "User",
    User.id,
    User.email,
  ).on(
    "Thread",
    Thread.id,
    Thread.title,
  )

What is "search" and "on" in this context? What does that translate to in GraphQL?

rafalp commented 1 year ago

Regarding your caveat about return types, would the type annotation have to be "Any" then?

I think so, but I am not familiar with what Mypy plugins would make possible.

Above would produce the following GraphQL query:


  query HelloWorld {
    user(id: "123") {
      id
      email
      group {
        id
      }
      reviews(category: "123")
    }
    wohoo: user(id: "123") {
      id
      email
      group {
        id
      }
      reviews(category: "123")
    }
    search(query: "test") {
      .. on User {
        id
        email
      }
      .. on Thread {
        id
        title
      }
    }
  }

dkbarn commented 1 year ago

In your example code above, would the User class used in the query builder be the same User class returned in the response, or would you need to define 2 different User classes to represent these concepts?

Assuming that the User class used in the query builder is represented as a Pydantic class to maintain consistency with the rest of ariadne-codegen, I'm unclear on how you would achieve the ability to define a fields method on any given attribute, or even to reference these class-level attributes?

class User(pydantic.BaseModel):
    id: int
    email: str

Query.user(id="123").fields(
    User.id,
    User.email,
)
# this would throw
# AttributeError:  type object 'User' has no attribute 'id'

rafalp commented 1 year ago

I doubt those ORM classes would be pydantic models, or that they would return anything else than lists of dicts.

Just a caveat tho: those generated types would be used only as safety net for query building. I don't think we would be able to generate types for client.query return values without having Mypy plugin, which is different discussion IMHO.

Even if we went with classes, those would be separate TypeData models where every field is nullable, eg:

class UserData(ariadne_codegen.orm):
  id: str | None = None
  username: str | None = None
  group: UserGroupData | None = None

rafalp commented 1 year ago

...but seeing how we already require Pydantic, I don't think there would be an issue with having those data types as it's models instances as you've originally proposed.

My only worry here is that the code will have to be littered with if result.some_type.some_field checks to keep the Mypy happy. 🤔

nevoodoo commented 6 months ago

Bumping this as we use Strawberry for covering our graphql needs but we are trying to move away from OpenAPI Generator and want to use something that can generate async graphql clients instead. We've hit a roadblock here as we have a lot of queries that we couldn't possibly manually define into a queries.graphql file and so any way to extract this from the schema definition sounds like a life saver :D

DamianCzajkowski commented 6 months ago

@nevoodoo I'm working on a solution and will have it ready soon.

dkbarn commented 5 months ago

@DamianCzajkowski we're eager to see what sort of solution you've come up with! Roughly when do you think there might be a pull request to look at?

DamianCzajkowski commented 5 months ago

@dkbarn For now it looks like this:

    async with SaleorClient(saleor_url=saleor_url, api_key=None) as saleor_client:
        query_str = [
            Query.products(first=1, channel="default-channel").fields(
                ProductCountableConnectionFields.edges().fields(
                    ProductCountableEdgeFields.node().fields(
                        ProductFields.id,
                        ProductFields.slug,
                    ),
                    ProductCountableEdgeFields.cursor,
                ),
            ),
            Query.products(first=10, channel="default-channel")
            .alias("my_products")
            .fields(
                ProductCountableConnectionFields.edges().fields(
                    ProductCountableEdgeFields.node().fields(
                        ProductFields.id,
                    )
                ),
            ),
        ]
        result = await saleor_client.query(
            *query_str,
            operation_name="getProducts",
        )

Right now I'm struggling with response types and typing on them. I'm testing my solution on really complicated schema from Saleor and I see performance issues with Pydantic models, import takes about 1.8 seconds for 81 models in one file (they are all dependent of themselves). I'm thinking of using dataclasses (0.057s import) but it doesn't have fields validation. I was thinking to use manually generated field validation (as simple as possible). What do you think guys? what will be the best solution for you right now, to have typed responses? or should I abandon this idea in first version of query builder?

DamianCzajkowski commented 5 months ago

Hi, I've published experimental version of custom query builder in ariadne-codegen. It is a pre-release from dev branch. https://github.com/mirumee/ariadne-codegen/releases/tag/0.14.0.dev1 This code needs some refactor and code optimization. I will be cleaning this code in the future but I need to take a break from it for a while. It will be perfect if you guys test it and provide me some feedback about this feature so I can improve it. Thanks!

jukiewiczm commented 5 months ago

@DamianCzajkowski

I'd like to do some testing of this. I installed the dev version but the client I generated looks the same as the one generated with stable version of ariadne-codegen. I initially thought all these Query, ProductCountableConnectionFields, etc. classes will be generated from the schema. Is that not the case?

It also seems all of the stuff is in the tests package. Could you provide some initial info on how to test it with my own schema? Do I need to run the codegen differently? Or is it just a functionality for now that is not (or perhaps will never be?) auto generated?

DamianCzajkowski commented 5 months ago

@jukiewiczm Hi, I should, leave instructions in the README.md file. I'll do it later but if you look into the tests: https://github.com/mirumee/ariadne-codegen/blob/dev/tests/main/clients/custom_query_builder/pyproject.toml in pyproject.toml there is a flag that you need to initialize: enable_custom_operations = true This will activate the custom operations generation in your project.

jukiewiczm commented 5 months ago

@DamianCzajkowski

I've done some testing and here's my input. First of all, great piece of work! I actually believe this feature will actually become the most exciting one in the project.

Doesn't work with "include_all_inputs": False/"include_all_enums": False - to be adjusted or made mutually exclusive with "enable_custom_operations": True
Doesn't work with "ariadne_codegen.contrib.no_reimports.NoReimportsPlugin" - as above
custom_queries.py seems to be missing enums import, I've temporarily fixed it with from .enums import *
custom_queries.py missing custom scalars import, in my case that required a from uuid import UUID import
For me, the Client takes forever to import even on the current stable branch with "include_all_inputs": True/"include_all_enums": True (dozens of models), so I guess the problem is more general rather than related to this particular functionality (though I'm not sure). I guess some work with NoReimportsPlugin/ClientForwardRefsPlugin could help with that.
I had to make the following adjustments to GraphQLArgument._convert_value so it works with my schema. It's mostly about nested pydantic models (model_dump won't work in such scenario, you need a level-by-level decomposition while model_dumps converts everything to dictionary all at once) and custom scalars. The query looks more or less like this:
```
client.query(Query.items(where=ItemsExp(id=ComparisonExp(eq=150))).fields(ItemsFields.id, ItemsFields.name), operation_name="op")
```
That's the schema I'm given, so I refrained from assessing whether this is a bad schema example, I guess stuff like this will appear and should work though.

Here are the adjustments

def _convert_value(
    self, value: Any
) -> Union[StringValueNode, IntValueNode, FloatValueNode, BooleanValueNode, ObjectValueNode]:
    if isinstance(value, str):
        return StringValueNode(value=value)
    if isinstance(value, int):
        return IntValueNode(value=str(value))
    if isinstance(value, float):
        return FloatValueNode(value=str(value))
    if isinstance(value, bool):
        return BooleanValueNode(value=value)
    if isinstance(value, BaseModel):
        fields = [
            ObjectFieldNode(name=NameNode(value=field_info.alias or field_name), value=self._convert_value(field_value))
            for field_name, field_info in value.model_fields.items()
            if (field_value := getattr(value, field_name)) is not None
        ]
        return ObjectValueNode(fields=fields)
    if isinstance(value, UUID): # Guess we'd need something more sophisticated for all custom scalar types
        return StringValueNode(value=str(value))
    raise TypeError(f"Unsupported argument type: {type(value)}")

Regarding response types you mentioned in the other comment, do you mean the type for client.query? I was wondering how did you approach that? Selected fields will be dynamic so obviously not all of them will always be available. Pydantic models with all the fields being optional would be a way to go. Kind of a weird one, as you wouldn't be able to tell whether the field is null or was just not selected, but I guess it is an option.

For the time being, I guess something like that could work (simplified example):

# in the client class file
from typing import TypeVar
from pydantic import BaseModel

# in the client class
async def structured_query(
    self, *fields: "GraphQLField", operation_name: str, output_type: type[OUTPUT_TYPE]
) -> OUTPUT_TYPE:
    return output_type.parse_obj(await self.query(*fields, operation_name=operation_name))

and used like this

from pydantic import BaseModel

# These you define on your own, I can't really imagine a way around it besides the wierd solution I mentioned above
class Item(BaseModel):
    id: int
    name: str

class Items(BaseModel):
    items: list[Item]

client.query(Query.items(where=ItemsExp(id=ComparisonExp(eq=150))).fields(ItemsFields.id, ItemsFields.name), operation_name="op", output_type=Items)

DamianCzajkowski commented 5 months ago

Hi @jukiewiczm, Thanks for Your feedback! Can you provide me somehow schema file that you are working on? It will be helpful on fixing those issues. Currently I was testing this all on Saleor schema.

jukiewiczm commented 5 months ago

Unfortunately I can't do that (due to obvious reasons - work stuff). Over the weekend I'll try to cut out and anonymize part of it and share it with you. That should be enough to cover all these use cases besides the slow loading, which you're experiencing yourself so I guess it should be enough :)

Optionally, I'll try to generate stuff that doesn't work using the Saleor schema, looking through it, it might also be possible.

jukiewiczm commented 5 months ago

@DamianCzajkowski I forked the repo and used the Saleor schema to reproduce stuff I mentioned. Please take a look at stuff in manual-operations-examples branch.

Here's the folder that contains it all.

You can review the commits I added to see the progress of fixing stuff for query in this file that initially did not work.

It doesn't reproduce:

Slow loading, as this is a general problem rather than this new feature's problem. As mentioned, even without this feature, my schema loads terribly slowly (the Saleor one without the output classes loads pretty fast actually). So I'd say it's kind of unrelated.
Not working with certain flags, as it's kind of obvious + you can test that yourself by changing the flags I mentioned above.

Hope this helps!

DamianCzajkowski commented 4 months ago

Hi guys, sorry there was no activity in this issue. But now I committed some code changes that should fix lack of imports in files. I also changed the way we build the query so it's integrated with main ariadne-codegen code in Client.execute(). I want to take care on slow loading the package on import (this is the pydantic issue) and with really big schema it will take some time, but I believe that if i take care of init.py file and reduce the amount of imports there the package itself will be quicker.

Note: If you want to use custom scalars with ariadne-codegen there is a feature that handles this but it requires manual work example

Future improvements:

I need to take this code and probably rewrite some part cause of mess that I created to fit the requiremets.

mirumee / ariadne-codegen

Feature request: Allow user-specified queries #132