gqlshield - Githubissues

winterale commented 5 years ago

Do you have a brief write-up or documentation for your gqlshield component? I'd like to understand what your design goals were and what philosophy you took in your implementation. I saw your brief description at https://github.com/graph-gophers/graphql-go/issues/324#issuecomment-498401432 but was hoping you could provide a bit more context. I'm looking at adding it as a middleware to my gql server.

romshark commented 5 years ago

The api/gqlshield package can actually be used as a stand-alone library: https://play.golang.org/p/yf0Zs8AJk_2

I haven't had the time to write a complete documentation for it yet but its interface is pretty straightforward:

// GraphQLShield represents a GraphQL shield instance
type GraphQLShield interface {
    // WhitelistQueries adds the given queries to the whitelist
    // returning an error if one of the queries doesn't meet the requirements.
    WhitelistQueries(newEntry ...Entry) ([]Query, error)

    // RemoveQuery removes a query from the whitelist and returns true
    // if any query was removed as well as the actual removed query.
    RemoveQuery(query Query) error

    // Check returns an error if the given query isn't allowed for the given
    // client role to be executed or if the provided arguments are unacceptable.
    //
    // WARNING: query will be mutated during normalization! Manually copy the
    // query byte-slice if you don't want your inputs to be mutated.
    Check(
        clientRole int,
        query []byte,
        arguments map[string]*string,
    ) ([]byte, error)

    // ListQueries returns all whitelisted queries.
    ListQueries() (map[string]Query, error)
}

The interface implementation is thread-safe, you can use it from multiple different goroutines concurrently.

The persistency manager interface defines how the shield is persisting/loading its whitelist. You can implement it however you want and pass it to the config. Alternatively just use the default NewPepersistencyManagerFileJSON implementation.

It's pretty fast (just under a microsecond for average sized queries as far as I can remember) because it performs a "visual" analysis over the query instead of properly parsing it. It does so in two steps:

query preparation (replace all spaces, tabs and line breaks with a single space in a single scan)
radix-tree lookup to determine whether or not the query is whitelisted.

Because it's based on just a text-scan it's very sensitive to query formatting meaning that this:

query(k: "v") {}

...and this...

query (
  k: "v"
) {}

...are considered different. It doesn't tolerate missing spaces yet so you'll have to define it as:

query ( k: "v" ) {}

I'm not sure whether I'll continue working on it because I got tired of GraphQL in general and started working on an alternative for Go and JavaScript this week which will have smart whitelisting built-in so I'll rather invest time there.

winterale commented 5 years ago

Thanks for the reply. I'll keep watch on your gapi to see how it turns out.

romshark commented 5 years ago

Another problem of this particular approach to whitelisting is the lack of flexibility. A whitelist should protect the API from overly complex queries, but what if the whitelist looks like this:

query($uid: ID!) {
  user(id: $uid) {
    id
    firstName
    lastName
    birthDate
  }
}

Yet we shoot this query at it:

query($uid: ID!) {
  user(id: $uid) {
    birthDate
  }
}

This query will be rejected even though it's actually less complex than what the whitelist allows.

winterale commented 5 years ago

Why not just take a complexity threshold approach like gqlgen does? One of the main issues with whitelists in general, whether it is security or performance, is that you have to provide a static list and any changes in code require changes in the list. I like that the approach gqlgen takes allows for a custom complexity to be set at the field level so that the complexity can be calculated in a more dynamic manner per query. That way the example you gave above would still work. You mentioned you don't care for this cost analysis approach and im curious as to your reasons. There are always trade offs, but it seems to me the flexibility of the approach outweighs any of the whitelist approaches.

romshark commented 5 years ago

@winterale query cost analysis is really cool but it's somewhat more complex. You can, of course, just assign a static cost to each and every field and then fine-tune it afterward but it's not as easy as simply having a list of queries you know work fine. It's complexity vs convenience in case of cost analysis vs query whitelisting.

romshark / dgraph_graphql_go

gqlshield #34