Generate Elm modules from a schema

jamesmacaulay commented 7 years ago

Develop concurrently with #18.

The package should be able to power a command-line tool to generate Elm modules from a schema.json file. Each module should provide helper functions for a single type in the schema.

Here's a sketch of the kind of thing you should be able to do with them (generated modules are Schema.User and Schema.Project below):

import GraphQL.Request.Builder exposing (..)
import Schema.User as User
import Schema.Project as Project

type alias UserSummary =
    { id : String
    , email : String
    , projects : List Project
    }

userSummary =
    object UserSummary
        |> with User.id
        |> with User.email
        |> with
            (User.projects
                (extract Project.name)
            )

patrickdet commented 7 years ago

I'd love to get started on this. Currently still researching how other projects handle this. The Elm community has a package to parse Elm files to an AST or construct an AST by hand (http://package.elm-lang.org/packages/Bogdanp/elm-ast/latest) but there is not AST -> String function yet. Once we have the AST -> String we could construct some modules from the graphql schema. I'd rather not do this through some wild String concatenation which some projects also resort to.

jamesmacaulay commented 7 years ago

@patrickdet I was looking at that package too, it looks like the right thing to use. Two nice things about this:

All the constructors of the AST are publicly exposed from the package, so we can write the AST -> String function without having to fork the package.
The AST -> String function does not need to be written before the AST-generating code; they are independent and can be developed in parallel.

If you want to get started on this, please do! And please do share your work-in-progress along the way. There is already code in elm-graphql to decode a schema introspection response (which, by the way, we could modify the representation of if needed because it isn't exposed publicly). It should be pretty straightforward to get started on some function that takes that schema representation and returns a list of modules as a List (List Ast.Statement) or something.

Happy to chat on elm-slack either synchronously or asynchronously if we aren't online at the same time!

jamesmacaulay commented 7 years ago

Thinking about it a bit more, Dict String (List Ast.Statement) would be better for the return value, with each module's AST indexed by filename.

patrickdet commented 7 years ago

yeah i will get on that then. probably monday

Vynlar commented 7 years ago

I've had a chat with @jamesmacaulay regarding this issue. We've broken it down into a set of things that need to get done and some high-level ideas. Steps:

Design an easy to use DSL for writing queries (in elm) for a given schema. See top of this issue for an example.
Parse the schema (fetched from a graphql server's introspection endpoint?) and transform it into an in memory AST
Transform this AST into runnable elm code
- We fear that creating a proper AST -> code converter that could accept arbitrary AST structures would be effort wasted and therefore, elect to use a more tailored string interpolation approach, given that the code we are generating is going to be quite uniform.
- The elm that is generated from this transformation process could then be used to write custom queries/mutations in elm directly (again, see example at the top of this issue)
Next, parse arbitrary *.graphql files and transform them into elm code that uses the DSL generated in step 3

Benefits:

Users will have the freedom of generating the schema DSL modules and writing their queries in elm to maintain the elm purity, or alternately, writing the queries in graphql directly and still maintaining the elm guarantees.
Our elm code's types are guaranteed to match that of the server's schema
It significantly cuts down on boilierplate code needed to write queries
The creation of an AST allows for runtime query manipulation that you cannot do with pure string interpolation

@jamesmacaulay has done some experimentation into this and put it into a branch: https://github.com/jamesmacaulay/elm-graphql/compare/code-generation-experimental

sporto commented 6 years ago

Hi, I don't quite understand how this will work.

You talk about generating types from a schema. For example my schema may say that Project has name and status. This will generate something like

type alias Project = 
    { name: String
    , status: String
    }

Correct?

But what if I'm not interested in the status? Say I write a query like:

project {
    name
}

Then this type alias in invalid as it cannot be fulfilled. Seems to me that any code generation needs to:

Take a graphql query and validated it with the schema
Then generate Elm code from the query mixed with schema. So if I have different queries that use project, I would end up with multiple type aliases with only the fields that each use.

Care to explain more? Thanks

Vynlar commented 6 years ago

@sporto I haven't figured out the types in detail yet, but essentially graphql's type system is weaker than Elm's is, so we have to map between the two somewhere. In the existing library, we do it when defining the extract method. This is what defines our encoders/decoders and therefore, how we go from Elm records to graphql queries and vice versa. Each of one of these queries is given a name as it stands, so we will just continue that approach.

As far as generating the types from our graphql queries, I think a reasonable way of doing it would be to use the file name as the name for the query in Elm. I'll give an example:

userPermissions.graphql

query {
  user {
    isAdmin,
    canPost
  }
}

userInfo.graphql

query {
  user {
    id, name, email
  }
}

These two queries cannot map to the same Elm record (unless we jammed a million Maybes in there), therefore, we need a unique record for each query:

type alias UserPermissions =
  { isAdmin: Bool
  , canPost: Bool
  }

type alias UserInfo =
  { id: String
  , name: String
  , email: String
  }

Of course, when you get the response, you can map these structures into your model however you'd like; For example into a single User type or whatever.

EDIT: The idea is to have multiple usable layers:

You can use the library as it is now, where you write your queries and their extract definitions yourself. The disadvantage of this version is you have no guarantees as to whether your Elm schema matches the server's schema at all.
You can generate an Elm module that handles part of the hassle for you, but you still define your queries in elm. In this version, you don't see the details of the query encoding/decoding, but you still have to do some grunt work in Elm.
You can do step 2 and also generate the code that uses the previously generated module to make the queries. In this version you never have to see anything except your app, and graphql. The largest advantage of this option is that you can use Graphiql to write queries, then "compile" them into elm directly. No manual translation needed.

sporto commented 6 years ago

@Vynlar that makes sense, thanks for the explanation

Alternatively these could generate different Elm modules e.g. UserPermissionsGraph.elm and UserInfoGraph.elm. These would contain the generated queries too.

frankdugan3 commented 6 years ago

@Vynlar Another way to get names for the types is to actually use named queries. Other libraries, such as Apollo, require this.

query UserPermissions {
  user {
    isAdmin
    canPost
  }
}

On the topic of validating the query against the schema, in my setup I simply use a few tools combined as an NPM script to download the remote schema, convert it to a JSON AST, and use a few editor plugins for auto-complete and real-time validation with ESLint. Maybe not ideal, but it guarantees safety for now.

jamesmacaulay commented 6 years ago

As @Vynlar clarified (thanks!) my intention is to have these three different ways of using the library: (1) no code-gen, (2) code-gen of helper modules from a GraphQL schema, and (3) code-gen of modules from GraphQL queries.

This issue and the little sketch of code in its description are really about option 2. When generating helper modules from a schema, we don’t know anything about the queries that are going to be used, so we can’t generate any record type aliases that could be used for decoding objects in query responses.

For option 3, generating modules from GraphQL queries, my intention is to have each query result in a single independent Elm module. Keeping everything in separate modules makes naming things a whole lot easier, since each query has its own namespace.

I think what I need to do next for this issue is come up with a small example schema and query and fully write out the Elm modules that would result from generating code via option 2 and 3 respectively.

Vynlar commented 6 years ago

@jamesmacaulay I've taken a first stab at creating some examples of what I hope we can eventually generate. I'm happy for people to modify it as there are still some potential concerns. Let me know what you all think.

https://github.com/Vynlar/elm-graphql-code-gen

jamesmacaulay commented 6 years ago

@Vynlar thanks for this :) This is mostly what I had in mind, and provides a great starting point for more detailed discussion. Here are some notes:

The query and args functions in the Schema.User aren't really about the User type, they are about the root query type. There should be a separate module for that type (Query or QueryRoot or whatever it's called in the schema) just like any other type. Also, generating convenience functions like this might be nice as an optional added layer of code generation (probably one that's specific to Relay or some other set of conventions) but I don't think this kind of thing should be in the core library, since it kind of relies on the schema having a particular structure.
The biggest challenges I see are in the field arguments. I'd like the generated API to provide a nicer, more type-safe and constrained interface for arguments than the free-form interface currently provided by GraphQL.Request.Builder.Arg. However, a bunch of things make this difficult:

Each field might have an entirely unique set of arguments.
It is common for most of a field's arguments to be optional/nullable.
GraphQL makes a semantic distinction between leaving an argument out of an argument list and passing null as its value: http://facebook.github.io/graphql/October2016/#sec-Null-Value.
We still need to allow a variable to be nested anywhere in a field's arguments.

The "simplest thing" would be to represent each field's arguments as its own record type alias. However, this would mean that:

the user of the API needs to pass Nothing explicitly for every optional argument, which there might be many of.
the type of each optional/nullable field would either have to be a double-wrapped Maybe (Maybe a) in order to represent the distinction between missing and explicitly-null values, or we make a new union type to represent the distinction more clearly, or we ignore the distinction and then the generated code is limited in functionality to never be able to pass null values explicitly.
the fact that variables can be used for every value means that every field type would need to be wrapped in some union type that allows either a fixed or variable value:

type ArgValue a vars
    = Fixed a
    | Variable (GraphQL.Request.Builder.Variable.Variable vars)

Also note that the GraphQL.Request.Builder.Variable.Variable type is not parameterized in the right way – this approach would allow any variable to be used for any argument value, when the whole point of this code generation is to use the type system to prevent invalid things like that in the queries that are built with the generated API. So either the existing variable system needs to be reworked to provide the right type parameters in the right places, or a new variable system needs to be brought in just to work with generated modules (ugh).

Some Builder types should get an extra parameter. This will allow the generated API to guarantee that you can only ever provide the correct kind of object spec to a given object-typed field (e.g. the role field of the User type in your example).

jamesmacaulay commented 6 years ago

Also about field arguments: I'm currently playing around with an alternative to records that would allow you to only specify the arguments you actually want to pass in. The tradeoff is that it makes the type signatures of the API more complicated and more difficult to document clearly. It could look something like this:

queryRoot =
    QueryRoot.extract <|
        QueryRoot.allUsers
            [ .first => 10
            , .after |> fromVariable cursorVar
            ]
            (User.extract User.name)

...where the (=>) operator could alternatively be replaced with a named function to use with the pipe-forward operator if we want to stay away from custom operators (e.g. [ .first |> fixed 10, .after |> var cursorVar ].

Here's a basic proof of concept for implementing this kind of DSL: https://ellie-app.com/3N8yyQsKkW6a1/5

jamesmacaulay commented 6 years ago

For comparison, here’s what the same query might look like with an interface that uses record types for arguments:

queryRoot =
    QueryRoot.extract <|
        QueryRoot.allUsers
            { first = Optional 10
            , after = OptionalVar cursorVar
            , last = Absent
            , before = Absent
            }
            (User.extract User.name)

All of these args are optional, which means all of them need to be specified, and the type for the args record is then something like this (assuming variables remain non-type-safe, which wouldn’t want but is easier to show for now):

type OptionalArgValue a vars
    = Optional a
    | OptionalVar (GraphQL.Request.Builder.Variable.Variable vars)
    | Null
    | Absent

type RequiredArgValue a vars
    = Required a
    | RequiredVar (GraphQL.Request.Builder.Variable.Variable vars)

type alias AllUsersArgs vars =
    { first : OptionalArgValue Int vars
    , after : OptionalArgValue String vars
    , last : OptionalArgValue Int vars
    , before : OptionalArgValue String vars
    }

This could be modeled in other ways of course, such as using Maybes, but this way things stay flat and also somewhat self-describing.

Vynlar commented 6 years ago

@jamesmacaulay I don't have experience using graphql in production -- what is a reasonable upper bound on the amount of arguments that a given field has? Because if it's common to have more than say... 5(?) nullable arguments, I would say your first solution would be the best choice; however, if it's just a couple arguments, then having to include them every time it not a huge deal.

I value the developers experience quite highly and having to have a bunch of foo = Absent's everywhere sounds pretty bad.

sporto commented 6 years ago

We use GraphQL here, in our case we rarely have optional arguments, maybe one or two. The most we have here is relay connections arguments (first, after, etc)

Vynlar commented 6 years ago

@sporto That's good to hear. I've been working on trying to come up with examples for both of these options that @jamesmacaulay has presented here. I realize the second one is in fact much simpler, so if it's practical, that seems like the right choice.

@jamesmacaulay One issue I'm having is in trying to convert one of these argument type aliases into the older arg format. Here's an example:

Convert this:

type alias AllUsersArgs vars =
    { count : RequiredArgValue Int vars
    , ...
    }

To this:

[ ( "count", Arg.variable (Var.required "count" .count Var.int) )
, ...
]

Is it not possible to do this conversion? Should I approach this in a different way?

jamesmacaulay commented 6 years ago

I think the most annoying situations are going to be mutations where you’re passing in the attributes of some big possibly-nested entity through arguments. In those cases it really all depends on the application to say how many fields are involved and how many of them are optional.

Even so, I’m still finding myself leaning heavily toward some kind of record-based option. One other thing to consider is what the application code would look like to take your app’s internal representation of some complex entity and convert it to this weird data structure that’s specific to this library.

@Vynlar there’s no way to do the conversion you’re talking about generically, i.e. without knowing about the structure of the record. Luckily the generated code can have that knowledge, and there are lots of ways it might be implemented. One way might involve making a list of tuples of field name strings and associated functions that extract a Maybe (Arg.Value vars) from the supplied arg-record:

allUsersArgExtractors : List (String, AllUsersArgsRecord vars -> Maybe (Arg.Value vars))
allUsersArgExtractors =
    [ ( “count”, .count >> requiredToMaybeArgValue Arg.int )
    , ( “order”, .order >> optionalToMaybeArgValue Arg.string )
    ]

requiredToMaybeArgValue : (a -> Arg.Value vars) -> RequiredArgValue a vars -> Maybe (Arg.Value vars)
requiredToMaybeArgValue convert required =
    case required of
        Required x ->
            Just (convert x)
        RequiredVar v ->
            Just v

optionalToMaybeArgValue : (a -> Arg.Value vars) -> OptionalArgValue a vars -> Maybe (Arg.Value vars)
optionalToMaybeArgValue convert optional =
    case optional of
        Optional x ->
            Just (convert x)
        OptionalVar v ->
            Just v
        Null ->
            Just Arg.null
        Absent ->
            Nothing

Then in the generated implementation for the field helper, it would List.filterMap over allUsersArgExtractors with a suitable function to build up the final List (String, Arg.Value vars).

(And again, all of the variable stuff should really be a bit different because we need to make variables type-safe.)

Vynlar commented 6 years ago

@jamesmacaulay Thanks for the clarification and examples! Additionally, for my (and possibly other's) benefit, would you explain what makes the current variables not type-safe? It appears that they each define a type in the form of a Var.int or Var.id, but I imagine there's something more subtle I'm missing.

jamesmacaulay commented 6 years ago

@Vynlar the type of Var.int is VariableSpec NonNull Int, which is type-safe – we know that it can only get its value from an Elm Int. However, you must eventually convert yourVariableSpecs into Variables with Var.required or Var.optional. These functions return Variable vars values, whose type parameter (vars) is the type that all variables used in the query must get their values from.

Let’s say you end up using two variables in your query, one defined with Var.required “foo” .foo Var.int and the other with Var.optional “bar” .bar Var.string. If you just let Elm’s type inference do the work, then thevars type parameter will be unified into { a | foo : Int, bar : Maybe String}. This unification happens because at some point the library collects all those variables into a List (Variable vars), which means that the vars must all agree with one another.

This is all great and provided the mechanism for a decent DSL in 1.x, but it involves discarding the type information from the original VariableSpecs that the Variables are based on. This is because putting all the Variables into a single List requires that all their corresponding type parameters must be unifyable, and you can’t unify Int with String (or whatever).

In the case of generating these modules, I’m not even sure if the original type parameter of the appropriate VariableSpec is always the type parameter that we want to hold on to – I need to think about that some more and stare at some code for longer :)

Either way, I think it should be possible somehow to give Variable an extra type parameter and then discard it “later on” than we currently do...or something. I’m starting to not be very articulate, but hopefully all this at least makes things a little more clear.

jamesmacaulay commented 6 years ago

Back on the topic of argument records vs argument lists: this comment over in #24 reminded me that graph.cool, which has grown into a very popular GraphQL-backend-as-a-service, generates absolutely monstrous numbers of optional argument fields for the purpose of filtering result sets in just about any way you can imagine:

https://www.graph.cool/docs/reference/simple-api/filtering-by-field-xookaexai0/#explore-available-filter-criteria

I feel like this is such a stunning counterexample to our previous discussion that it is a dealbreaker for the straight records-based approach – there’s just no way I would want to force anyone to write out that many Absent fields!

The only way I can think of that a records-based approach could work in the face of that would be if we leveraged a defaults-record as well, and either provided it for the user to use when they see fit, or ask for a function instead that transforms the defaults:

QueryRoot.allUsers
    (\args -> { args | first = Optional 10, after = OptionalVar cursorVar })
    (extract User.id)

This is also pretty flawed though, because while the defaults for all optional args would be Absent, there’s no way to come up with default values for required args. This means they would have to be supplied separately in their own record:

QueryRoot.findUserById
    { id = Required “user123” }
    (\args -> { args | foo = Optional True })
    (extract User.name)

I think this confusing and would get much worse with nested objects.

And unfortunately the list-of-args approach ([ .first => 10, .after |> fromVar cursorVar ]) actually has basically the same problem: required args need to somehow be separated out and supplied independently of the list if we want type safety, because there is no guarantee that required args would actually appear in the supplied list.

I feel like every option is bad in its own terrible way 😕

Vynlar commented 6 years ago

@jamesmacaulay

Yes this makes more sense. It is type-safe up until you turn the VariableSpec into a Variable. I can figure out the details of how it all works if needed.
Both options clearly have their downsides. I've been trying to think of ways to get around this. Here's an option: We could abandon the "middle" API where you can write queries in elm but still generate code from the schema. That way, we could do whatever we need to do get the types to work well using generated schema api and generated queries. This would eliminate the need to have a clean API for writing queries if we deem it too difficult. It sucks -- sure, but it may be the more practical solution unless you come up with a way that is clean enough.

jamesmacaulay commented 6 years ago

@Vynlar I think you're right about ditching this. Most people just want to write the queries in GraphQL anyway. The schema-generated middle-ground API is a "nice idea" but it appears too problematic to make worth it.

The only thing is: even with query code gen, if someone hands us a GraphQL query that declares a variable that is a complex input object like the graph.cool filters I linked to, we are back in the same position of wanting a nice type-safe way of asking the user for it in the generated query code. At least there it's a smaller problem because the complexity is only there when the user chooses to expose whole complex objects as variables for their queries.

Vynlar commented 6 years ago

@jamesmacaulay To get around that issue, someone could write a query like this:

// Schema
type User {
  id: String,
  password: String
  name: String,
  email: String,
  age: Int,
}

// Client side query
query getUser ($userName: String!, $password: String!) {
  user(user: {name: $userName, password: $password}) { // Note the mapping between primitive variables and the object argument
    name
  }
}

This way of writing a query would avoid even exposing the optional arguments to Elm at all, right?

Of course you could shoot yourself in the foot and define the query like this:

query getUser ($user: User!) {
  ...
}

We could issue warnings if there are lots of optional variables, and also clearly document the better approach in the docs.

jamesmacaulay commented 6 years ago

Yes, having options like that (at least most of the time) definitely makes the situation better.

jamesmacaulay commented 6 years ago

Maybe this idea will be revived at a later time, but for now I will close this issue. Meanwhile I've created a new issue for discussion of query code generation.

dillonkearns commented 6 years ago

Hello @jamesmacaulay, I started working on something like this a couple of weeks ago. I just stumbled on this issue and it looks like we actually reached a lot of the same conclusions. The main place we seem to diverge is that I think it would be really awesome to have a tool that does codegen like this! I've got an MVP here that handles the vast majority of cases. I would love feedback if anyone here has any impressions. Here's an ellie example using some graphqelm generated code: https://rebrand.ly/graphqelm.

I built this package because I wanted to have something that

1) Gives you type-safe GraphQL queries (if it compiles, it's valid according to the schema), 2) Creates decoders for you in a seamless and failsafe way, and 3) Eliminates GraphQL features in favor of Elm language constructs where possible for a simpler UX (for example, GraphQL variables & fragments should just be Elm functions, constants, lets).

I wanted to get the same feeling of if it compiles it works that I get from Elm, style-elements, remote-data, etc. I think our design goals are a little different, so I think they can coexist pretty well. I hope you feel the same way.

Cheers,

Dillon

jamesmacaulay commented 6 years ago

@dillonkearns cool!

jamesmacaulay / elm-graphql

Generate Elm modules from a schema #17