add prefix and namespacing when adding remote schema

tirumaraiselvan commented 3 years ago

When adding a data source (v1.4+), the user should be able to give a prefix which is added to every type and root fields. This will allow adding sources which have same table names (i.e. have conflicting names), etc. The user should also be able to give a namespace so that the graph of the data source appears under that namespace field in root types.

The prefix/namespace ability should be available for remote schema (graphql data source) as well.

Type Prefixing

The types that need prefixing are:

OBJECT
INTERFACE
UNION
ENUM
INPUT_OBJECT

Important points to note:

Scalar types should be skipped from prefixing as same named scalars are "mergeable". UPDATE: See https://github.com/hasura/graphql-engine/issues/5863#issuecomment-700742259
Enum values should be skipped from prefixing.

Root fields i.e. fields under root types: query_root, mutation_root and subscription_root (or as defined by SCHEMA type) should also get the prefix. But non-root fields should not get any prefix.

Field Namespacing

When adding a source a namespace field can be given. If a namespace is given, then the fields of the source are added under that namespace respectively i.e. an object field with name <namespace> is added inside query_root, mutation_root and subscription_root and all the fields of the data source are added inside this object field.

For e.g., if the namespace is userservice:

query {
  userservice {
    getUser
}

mutation {
  userservice {
    deleteUser
}

Roadmap

Set of milestones for delivery.

POC

Current work in progress demonstrates the following:

Optional prefix can be passed in on API calls to add a remote schema
All types from the remote schema are prefixed except for __* and query_root, mutation_root, subscription_root.

The current branch (jberryman/5863-prep-refactoring-2) (as of 23/3/2021) is out of date with main and is being updated.

MVP

Additional features and changes:

Default, per-source, and per-type naming configuration supported
Tests for new functionality

A new configuration format as described by Vamshi to be consumed by APIs and stored in metadata:

customization:
# if root_fields_namespace is absent, the fields
# are merged into the query root directly
root_fields_namespace: "something"
type_names:
prefix: some_prefix
suffix: some_suffix
# mapping takes precedence over prefix and suffix
mapping:
  old_name: new_name
field_names:
old_type_name:
  prefix: some_prefix
  suffix: some_suffix
  # mapping takes precedence over prefix and suffix
  mapping:
    old_name: new_name

V1

tirumaraiselvan commented 3 years ago

@0x777 suggested that we prefix custom scalars also for remote schemas but for auto-generated custom scalar types we can ignore the prefix. Note that default scalars (String, Int, etc) should not to be prefixed.

abooij commented 3 years ago

Regarding the schema customization roadmap, here are some non-exhaustive notes from the discussion today between @0x777, @tirumaraiselvan, @codingkarthik, and myself.

We are thinking of introducing a configuration language that allows much more general renaming of GraphQL schema fields. In particular, such a language should subsume all current name configuration flags: prefixes, custom root field names, custom column names, any potential camel-casing flags, any potential identifiers defined for a table, etc.

The existing configuration flags can then be re-interpreted as automatically generating code in that configuration language. For instance, a prefix may be specified as something like:

fieldName = "my_prefix" + tableName

As a more complicated example, we can allow a user to set a root field name as the camel-cased variant of the table name, prefixed by the database name, as something like:

fieldName = databaseName + "_" + camelCase(tableName)

We can then either choose to expose this configuration language directly, or simply use it as an intermediate representation, so that the responsibility of "field naming" is shifted out of the Haskell backend, which currently has to consider all these various configuration options.

In fact, going forward, we should continue to consider if new naming configurations we introduce can be interpreted in such a hypothetical language.

In parallel to implementing "simple" naming flags such as the one in this issue, we can start thinking about how we can implement a more generic configuration language for the field names. One possibility is to depend on the dhall language, which we have already explored for the purpose of remote schema permissions. At this stage we do not think there should be a single configuration language for both names and permissions, as this might be confusing from a user point of view: so the "field naming" configuration should be completely separate from any permissions configuration.

jberryman commented 3 years ago

It would help me if we define the problem we're trying to solve here clearly.

As I understand it this is about dealing with name clashes that arise in the resulting schema from combining heterogeneous data, e.g. if we have a table named Authors in both database A and B we'd generate two different graphql types with the same name.

So a simple configuration that generates types with a particular string prefix is a "poor man's namespacing" that can help avoid this clash (or create a new clash, depending on what prefix the user chooses).

Root fields should also get the prefix

Anytime we're generating a new top-level field (or non-top level, e.g. for remote joins) the user is going to be defining the field name, right? I'm not sure why this is here.

The user should also be able to give a namespace so that the graph of the data source appears under that namespace field in root types.

This seems like a separate feature, and not one that's required to solve any name-collision problems. Maybe it can be broken out and discussed implemented separately?

It also seems very ad hoc... as a user I would wonder how this is different from a remote join?

If we actually want to eventually define a DSL for schema-reshaping and -stitching as Auke suggests then I think we should punt on this (unless I'm not inferring the problem that's being solved here).

tirumaraiselvan commented 3 years ago

Anytime we're generating a new top-level field (or non-top level, e.g. for remote joins) the user is going to be defining the field name, right? I'm not sure why this is here.

By default, the names are also auto-generated for tables right? Hence, even if types are different for two top-level fields which would auto-generate the same field name, they can't be added together. I am guessing even the source can't be added in such a scenario (although we can modify the names as per our wish once the source is added).

This seems like a separate feature, and not one that's required to solve any name-collision problems. Maybe it can be broken out and discussed implemented separately?

Sure we can discuss this separately. It was just a convenience feature when adding multiple sources, namespacing them would make the schema more tractable. Although, thinking a bit more if something is namespaced and prefixed, then erstwhile root fields (aka top-level fields) should not get the prefix which kinda makes sense because those fields are no longer root fields (thought to mention this because it is little subtle). We can cover this in a different issue if the implementation seems completely non-overlapping.

jberryman commented 3 years ago

Ah, I mis-remembered how add_remote_schema worked. I was thinking it added a single new top-level field with the specified name, rather than unpacking all the fields.

I had a thought that we should use add_remote_relationship to handle the use-case of creating a namespace (like a "join" at the top-level query root), but I guess that won't cut it, since we also want remote fields _not_to be added to the query root.

jberryman commented 3 years ago

I might design this so that we store a name-mapping function (rather than just the re-written name itself) so that we can support, say, switching between prefixing or camel-case, etc. in a forwards compatible way (see https://hasurahq.slack.com/archives/CKFUG6RCH/p1606341714282700 )

jberryman commented 3 years ago

Should the remote relationships API refer to the rewritten names or the names as they actually exist in the remote schema?

jberryman commented 3 years ago

@tirumaraiselvan

Enum values should be skipped from prefixing.

Is it worth checking that the graphql spec doesn't disallow enums to share the same value? It would be surprising if it did, maybe. But on the other hand most programming languages work that way

jberryman commented 3 years ago

The prefixing feature can almost be implemented as just some processing on the introspection result, but unfortunately:

Types appear in both fragments and variable definitions, so we have to parse and rewrite types in the remote query
The __typename meta-field means we must parse and rewrite types in the remote response

For introspection queries specifically we'll need to rewrite the results for __schema {..}, and rewrite both result and the name input argument to __type(){..}

Per the above we'll skip the scalars that are defined in the spec.

jberryman commented 3 years ago

but for auto-generated custom scalar types we can ignore the prefix.

@tirumaraiselvan I'm confused about what this refers to

tirumaraiselvan commented 3 years ago

Should the remote relationships API refer to the rewritten names or the names as they actually exist in the remote schema?

My mental model is that once a remote schema is added with a prefix, then essentially it is a different remote schema. Every interface around this remote schema should be one with the prefix. So I would say for remote relationships also, it uses the rewritten names. Let me know if this is unsound or problematic.

Is it worth checking that the graphql spec doesn't disallow enums to share the same value? It would be surprising if it did, maybe. But on the other hand most programming languages work that way

I can't find anything explicit in the docs which disallows enum values to be shared across enum types. I checked apollo framework which allows such a definition as well. Does that mean we should prefix enum values for correct semantics? Not sure, but my initial thought was that what we are really doing is avoiding type collisions and hence enum value prefixing didn't seem necessary.

but for auto-generated custom scalar types we can ignore the prefix.

Basically hasura generates custom scalars like uuid, smallint, bigint from a database source. If postgres1 and postgres2 have prefixes, then we don't need to add a prefix to these types as we know they are semantically the same. This is perhaps not relevant for remote schema prefixing but it would come into effect when prefixing database sources.

jberryman commented 3 years ago

Types with the same names and definitions get merged (I think I verified this? TODO); does the user want us to continue merging identical type definitions, or prefix all types (effectively giving up on any potential merging).

The former seems better, although this should probably be configurable I guess. That might require some significant reworking...

jberryman commented 3 years ago

Maybe for this first version we can offer an API for type re-mapping that supports:

an arbitrary map of oldTypeName -> newTypeName
an optional prefix (these will be applied to any types not matching the above)

That would require very minimal changes to what I have already and supports the use-case where there's just one or two problematic types, as well as merging of identical types

tirumaraiselvan commented 3 years ago

@jberryman Does this mean we are converging with the RFC proposed here: https://github.com/hasura/graphql-engine/pull/4821/files

jberryman commented 3 years ago

Another random thought: I'm not sure what currently happens when the remote query root is named something other than Query, but I wonder if we should consider that in the design of the namespacing feature.

One possibility: if the remote query root name matches our own query root name, we union the top-level fields, otherwise we make the remote a subfield. This is also subject to field name remapping (above), so the user could specify mapping Query to RemoteQuery and we'd add RemoteQuery as a new subfield of our Query root field

jberryman commented 3 years ago

@tirumaraiselvan @codingkarthik I've pushed the work so far to jberryman/5863-prep-refactoring-2

The top commit is really only half-done. Let me know if you want to pair sometime to go over it

jberryman commented 3 years ago

Tricky bits, to test:

queries where renamed types appear in fragments
...and in variable definitions
queries containing __typename (NOTE: to do this correctly we probably need to analyze the query and then only tweak the remote response where appropriate; we can't just look for "__typename": "old_name" in the response, since __typename may be aliased. annoying!)
both result and the name input argument to __type(){..}
__schema introspection needs to be tested, but should already work

samuelcastro commented 3 years ago

@tirumaraiselvan Do you have an idea when this will be implemented?

robcurtis commented 3 years ago

@tirumaraiselvan @jberryman Any idea on when this will be available? This is a blocker for me. I'm trying to add a remote schema (GraphCMS) to Hasura. But there is a collision on the user table between GraphCMS and my DB as the table exists in both places so it fails outright.

I don't see any workarounds, other than adding a proxy between GraphCMS and Hasura to modify the GraphCMS remote schema table name which seems a bit excessive (& also I don't know how to do this).

If this feature is coming soon, I will just wait.

But if not, I guess I'll go the proxy route & would appreciate any repos / documentation on how to easily do this. Any other recommendations? Could I just split my Apollo client in the interim to query both?

samuelcastro commented 3 years ago

I'm facing the exactly issue @robcurtis (GraphCMS user table conflicting with a different remote schema).

Please @tirumaraiselvan, we'd love a feedback here.

dmoverton commented 3 years ago

PR for remote schema customization (https://github.com/hasura/graphql-engine-mono/pull/974) has been merged. There is a new issue for type name customization for database sources: https://github.com/hasura/graphql-engine/issues/6974.

samuelcastro commented 3 years ago

@dmoverton @tirumaraiselvan will this be merged into Hasura Cloud?

pescoboza commented 3 years ago

Is there anything about this on the docs? I also need to namespace my remote schemas. :P

tirumaraiselvan commented 3 years ago

This issue has been limited to remote schema customization. For database source customization, let's use this: https://github.com/hasura/graphql-engine/issues/6974

dicolasi commented 2 years ago

if this has been merged, how can I actually use it? I am trying to add a remote schema and I got this:

Adding remote schema failed
Found conflicting definitions for "PropertyType". The definition at "query_root.searchProperties.featured.property.property_type" differs from the the definition at "query_root.transactions.result.items.propertyType".

morandalex commented 2 years ago

It seems that this thread is solved with the PR. But Idid not understand if can I add a prefix on Hasura Cloud!!

jamesknelson commented 2 years ago

I've been trying to figure this out too. It looks like it was added in this PR:

https://github.com/hasura/graphql-engine/commit/4a69fdeb01cc14e5bd70e1ec8d2f9c0b5cdaf9e6

And then reverted in this PR:

https://github.com/hasura/graphql-engine/commit/96104ec1a806bce49904d75d99b987db17b4f670

So I'm guess this could be re-opened?

jamesknelson commented 2 years ago

Upon further investigation, it seems the PR is back as of 2.1.0-beta.1.

To make use of it via the UI, you'll currently need to:

First rename any conflicting root fields on your hasura-managed tables
Add the remote schema
Edit the remote, adding a prefix
Then rename your conflicting root fields back to the original.

Alternatively, you can add it via configuration in remote_schemas.yaml: E.g.

- name: WordPress
  definition:
    url: http://localhost/wp/index.php?graphql
    timeout_seconds: 60
    customization:
      root_fields_namespace: wp
  comment: ""

clausMeko commented 2 years ago

I was running 2.2.0 where it did not work I can confirm it is active now on 2.4.0. As I was struggling with types here the mapping setting of remote_schemas.yaml:

- name: render_service
  definition:
    url_from_env: HASURA_GRAPHQL_RENDER_SERVICE_ENDPOINT
    timeout_seconds: 3
    customization:
      type_names:
        prefix: render
        mapping: {}
      root_fields_namespace: render
    headers:
    - name: authorization
      value_from_env: RENDER_SERVICE_BASIC_AUTH
  comment: ""

samuelcastro commented 2 years ago

@tirumaraiselvan Could you add more info here? Was this deployed? What version and is it available on hasura cloud?

Thanks!

tirumaraiselvan commented 2 years ago

@samuelcastro The console work while adding the remote schema is still pending. You can use the metadata API as described here for now: https://hasura.io/docs/latest/graphql/core/api-reference/metadata-api/remote-schemas/#metadata-add-remote-schema

You can use the console to add prefixes, suffixes, namespaces to an already existing remote schema (in Modify tab) like this:

hasura / graphql-engine