Automatic conversion from Int to ID is problematic

Cito commented 6 years ago

Graphene-SQLAlchemy automatically converts columns of type SmallInteger or Integer to ID! fields if they are primary keys, but does not convert such columns to ID fields if they are foreign keys.

Take for example this schema:

class Department(Base):
    __tablename__ = 'department'
    id = Column(Integer, primary_key=True)
    name = Column(String)

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    department_id = Column(Integer, ForeignKey('department.id'))
    department = relationship(Department)

class DepartmentType(SQLAlchemyObjectType):
    class Meta:
        model = Department

class UserType(SQLAlchemyObjectType):
    class Meta:
        model = User

class Query(ObjectType):

    departments = List(DepartmentType)
    users = List(UserType)

    def resolve_departments(self, info):
        return DepartmentType.get_query(info)

    def resolve_users(self, info):
        return UserType.get_query(info)

You can run the following query:

query {
  users {
    id
    name
    departmentId
    department {
      id
    }
  }
}

As a result, you get something like:

{
  "data": {
    "users": [
      {
        "id": "1",
        "firstName": "Fred",
        "departmentId": 1,
        "department": {
          "id": "1"
        }
      },
      {
        "id": "2",
        "firstName": "Barnie",
        "departmentId": 2,
        "department": {
          "id": "2"
        }
      }
    ]
  }
}

As you see, department.id is a string (because IDs are returned as strings), while departmentId is a number. This turned out to be a huge problem and source of error in practice. Working with this inconsistent, fault-prone interface has bitten me many times. When storing ids in objects on the frontend, or using ids as filters, I never know whether I should use numbers or strings. Currently I have conversions from number to string and vice versa everywhere in my frontend code, and if I don't do it correctly, things stop working in hard to debug ways because you often don't recognize such type mismatches. On the server side, do I take ids used as filter parameters as IDs or Ints? If I do the former, I must then convert them to integer when using them as filter arguments for SQLAlchemy. So, really, this is no fun to work with and doesn't work in practice, because you always have this mental burden of thinking about whether your ids should be represented as strings or numbers and whether you need to convert them when passing them around.

I suggest the conversions should be consistent. Either convert all keys, including foreign keys, to IDs, or do not make a special case conversion for primary keys. Actually I'd prefer the latter, since then I never need to think about the type and since storing numbers on the frontend uses less memory.

Now of course I know that there is the relay specification which assumes there is an id field with a type of ID. So when using the relay interface, things are different. In this case, I suggest converting to IDs everywhere (including foreign keys) - but here we need conversion of the values to global ids anyway, they are not just the row ids converted to strings.

Cito commented 6 years ago

Note that PostgraphQL (Graphile) doesn't convert numeric primary keys to IDs either. Relay ids are provided as nodeId. Alternatively, PostgraphQL can be configured to provide relay ids as id and row ids as rowID.

XiaoMouR commented 6 years ago

@Cito I agree with you.Also I think it's necessary to keep the consistence between model and the data responded to client.

alexisrolland commented 6 years ago

Bump, this would be usefull!

ddrouin commented 6 years ago

Agree - please leave primary key datatypes alone. ID! makes sense for the relay interface but if not using that then unconditionally coercing primary key attributes to string is not beneficial.

Trying out just always returning an Int instead of an ID in the function below for my use case and so far it seems fine.

@convert_sqlalchemy_type.register(types.SmallInteger)
@convert_sqlalchemy_type.register(types.Integer)
def convert_column_to_int_or_id(type, column, registry=None):
    if column.primary_key:
        return ID(description=get_column_doc(column),
                  required=not (is_column_nullable(column)))
    else:
        return Int(description=get_column_doc(column),
                   required=not (is_column_nullable(column)))

paunovic commented 6 years ago

@ddrouin how did you manage to revert this behavior? When I put this override in my code, it doesn't work - IDs are still returned as a base64 encoded string:

@convert_sqlalchemy_type.register(Integer)
@convert_sqlalchemy_type.register(BigInteger)
def convert_column_to_int_or_id(type, column, registry=None):
    return BigInt(description=get_column_doc(column), 
                  required=not (is_column_nullable(column)))

alexxxnf commented 5 years ago

Any progress on this one?

I use relay and I wonder if there is any workaround to have foreign keys converted automatically.

jnak commented 5 years ago

Hi everyone,

I'm interesting in pushing this over the finish line. We all agree here that the conversion should be consistent. The only thing left to decide is wether we want to convert all keys to IDs or completely opt out of this behavior.

I'm personally in favor of converting all keys to IDs because it is more correct from a type perspective. The 'ID' type means that clients are not supposed to operate on in any way. For example, the generated flow or typescript types should rightfully prevent users to concatenate two IDs, even though they are stored as strings in the DB.

Additionally, I don't think that we should not use IDs because they potentially save some memory on the frontend. This is a micro-optimization that I doubt will have any meaningful impact unless you fetch many tens of thousands of GraphQL objects. If you do happen to be in situation like this, you will probably run into bigger performance bottlenecks than converting strings to ids.

Regarding the nodeId of graphile, I'm open to the idea but we should definitely make that configurable since that it does not follow the Relay spec.

@Nabellaleen Any thoughts on that? @Cito Do you guys feel strongly the other way?

Cheers, J

lopes-gustavo commented 4 years ago

Any updates on this issue?

taffit commented 4 years ago

I'm sharing the opinion of @Cito and @ddrouin: The columns from the database should be left untouched. I named my primary keys id in the database and was completely irritated, that all of a sudden they are some encoded strings. Also, when I get back an entity with id='aas23af32asd', and I want to look that one up in the database, how can I get the real, original id of the entry?
I find the approach cited by @Cito much more appealing: If you want/need to introduce another property/field of type ID!, do it with your own name, without overwriting existing columns.

Just my two cents. As a beginner with graphene, I was lost initially.

chrisberks commented 4 years ago

My understanding of this issue is that it's not what type foreign keys should be represented as but whether they should be added to the schema at all—in my opinion, they should not be a part of the schema.

A node is an entity with static properties. A User has a name and a department but it does not have a departmentId. By adding a departmentId field to a User then using it to construct queries on the frontend, we are not Thinking in Graphs.

The foreign key field is just how we enforce the connection between a User and a Department in a relational database. In Graph Theory, both the user and department are Vertices (Nodes), the relation between them is an Edge. We implement this edge using a foreign key but it shouldn't be part of a Node.

There's a great article that talks about this in more detail: Schema Design != …

If you need to access the id of a User's Department you do this:

query {
  users {
    department {
      id
    }
  }
}

You can see an example of this in GitHub's GraphQL API. On the backend, it's a MySQL database where a repository surely has a foreign key user_id but it is not exposed in the GraphQL schema. To get the User.id of a Repository owner, you do:

query {
  viewer {
    login
    id
    repositories(first: 10) {
      edges {
        node {
          id
          # There's no `user_id` field here.
          owner {
            id # You access the `user.id` of the `owner` here.
          }
        }
      }
    }
  }
}

By converting foreign keys to ID or even adding them to the schema at all, are we encouraging sub-optimal GraphQL schema design?

JBrVJxsc commented 3 years ago

I used this way to make sure the Integer primary key keeps as Integer:

@convert_sqlalchemy_type.register(SmallInteger)
@convert_sqlalchemy_type.register(Integer)
def convert_column_to_int_or_id(type, column, registry=None):
    return Int

conao3 commented 1 year ago

FYI: With the recent code base change, I fix @JBrVJxsc's code a bit.

from graphene_sqlalchemy.utils import column_type_eq

@convert_sqlalchemy_type.register(column_type_eq(SmallInteger))
@convert_sqlalchemy_type.register(column_type_eq(Integer))
def convert_column_to_int_or_id(type, column, registry=None):
    return Int

erikwrede commented 1 year ago

https://github.com/graphql-python/graphene-sqlalchemy/pull/379 extends the column to ID column behavior from Integer to all types that are either a primary or a foreign key. Additionally, a global variable is introduced to fully disable the automatic ID conversion. This does not affect Relay-Based Models, you will need to edit the default global ID type there (possible in graphene 3.2+)

anh193 commented 8 months ago

FYI: With the recent code base change, I fix @JBrVJxsc's code a bit.
from graphene_sqlalchemy.utils import column_type_eq

@convert_sqlalchemy_type.register(column_type_eq(SmallInteger))
@convert_sqlalchemy_type.register(column_type_eq(Integer))
def convert_column_to_int_or_id(type, column, registry=None):
    return Int
Where do you put this function? I put it where the Schema is declared but it doesn't get called..

graphql-python / graphene-sqlalchemy

Automatic conversion from Int to ID is problematic #102