vapor / fluent-kit

Swift ORM (queries, models, and relations) for NoSQL and SQL databases
MIT License
217 stars 116 forks source link

Aliases in select queries can cause fatal errors #582

Open rausnitz opened 1 year ago

rausnitz commented 1 year ago

Describe the bug

Using aliases in select queries will cause a fatal error if the following two things happen:

Postgres has a character limit of 63 for column names.

FluentKit always uses aliases for column names in select queries: https://github.com/vapor/fluent-kit/blob/ccea9820fe31076f994f7c1c1d584009cad6bdb2/Sources/FluentSQL/SQLQueryConverter.swift#L56

It combines the Postgres schema name, table name, and column name to form the alias: https://github.com/vapor/fluent-kit/blob/ccea9820fe31076f994f7c1c1d584009cad6bdb2/Sources/FluentSQL/SQLQueryConverter.swift#L208

Postgres will return data to the client without erroring, but it will truncate the alias name, so FluentKit will treat the data like it's missing.

If the column is an @OptionalField, the value will be nil even if there was data in the database.

If the column is a @Field, the value will be nil, and referencing the property in Swift will cause a fatal error.

To Reproduce

I created this Model:

class DemoUser: Model {
    @ID(custom: "id")
    var id: Int?

    @Field(key: "sixty_character_column_1234567890_1234567890_1234567890_1234")
    var sixtyCharacterColumn: String

    static let schema: String = "users"

    required init() {}

    init(sixtyCharacterColumn: String) {
        self.sixtyCharacterColumn = sixtyCharacterColumn
    }
}

(It's an extreme example, but we did encounter this fatal error in a real backend app, due to a column with a long name in a table with a long name.)

Then I added a query to the configure(_:) function in a Vapor app:

let _ = try DemoUser.query(on: app.db).first().wait()!.sixtyCharacterColumn

Here's what I see in the logs:

[ INFO ] SELECT "users"."id" AS "users_id", "users"."sixty_character_column_1234567890_1234567890_1234567890_1234" AS "users_sixty_character_column_1234567890_1234567890_1234567890_1234" FROM "users" LIMIT 1 [] [database-id: psql]
FluentKit/Field.swift:23: Fatal error: Cannot access field before it is initialized or fetched: sixty_character_column_1234567890_1234567890_1234567890_1234
2023-08-22 11:56:29.992518-0400 Run[:] FluentKit/Field.swift:23: Fatal error: Cannot access field before it is initialized or fetched: sixty_character_column_1234567890_1234567890_1234567890_1234

The data returned by Postgres truncates the alias to its first 63 characters, users_sixty_character_column_1234567890_1234567890_1234567890_1.

Expected behavior

I think names that are allowed in Postgres should be usable in FluentKit.

Maybe it could automatically skip aliasing when the alias exceeds 63 characters. (Though, this would be problematic because other databases probably don't use this same limit, and even Postgres databases can be configured with different limits.)

Maybe there could be a property like static let shouldAliasColumns: Bool on Model that lets you control whether its columns are aliased.

Environment

This is not environment-dependent.

gwynne commented 1 year ago

Fluent performs column aliasing unconditionally as a defense against name collisions in joins - ironically, I was just telling @0xTim the other day about how to get around exactly this issue; in his case, the problem was the combination of long table names and long column names, and using ModelAlias to shorten the table name was sufficient to avoid the issue. I think you're right that Fluent needs to provide some control over this behavior; at the very least users ought to be able to specify column aliasing as a complement to the ability ModelAlias provides for tables (although admittedly the problem ModelAlias was originally intended to solve has to do with name collisions when joining a table to itself 😅).

I'm not sure of a solution that would solve this "automatically" without requiring support from individual drivers at the very least - this is how we apply SHA-1 hashing for constraint names in MySQL. I hesitate to apply a similar solution elsewhere, however; the hashing just ends up yielding confusing names and requiring even uglier workarounds in the database schema API.

rausnitz commented 1 year ago

It looks like ModelAlias may only work in the context of a join.

final class AliasExample: ModelAlias {
    static let name = "alias_examples"
    let model = DemoUser()
}

The code compiles with that class, but not when I add:

_ = try await AliasExample.query(on: app.db).all()

Build errors:

error: extraneous argument label 'on:' in call
    _ = try await AliasExample.query(on: app.db).all()
                                    ^~~~~

error: instance member 'query' cannot be used on type 'AliasExample'; did you mean to use a value of this type instead?
    _ = try await AliasExample.query(on: app.db).all()
                  ^~~~~~~~~~~~
error: dynamic key path member lookup cannot refer to static method 'query(on:)'
    _ = try await AliasExample.query(on: app.db).all()
rausnitz commented 1 year ago

Another thing to consider is adding helpful details to this message:

fatalError("Cannot access field before it is initialized or fetched: \(self.key)")

One idea would be to compute the alias string length and add something like this to the fatal error message: "Check if your database's name length limit is smaller than n characters, which is the number of characters used by this field's alias."

gwynne commented 1 year ago

@rausnitz Ooof, you're right, ModelAlias can't be used that way yet. I'll add some APIs for it as soon as I get a chance 😰. My apologies; I haven't actually looked at the darn thing since I overhauled it some time ago (and tbh I don't think I've ever actually used it myself; never had a need in a case where I wasn't already working at the SQLKit layer anyway).

In the meantime, what you can do in such a case is drop down to SQLKit - for the equivalent of the simple .all() query, it would be:

// With aliasing:
let demoUsers = try await (app.db as! any SQLDatabase).select()
    .columns(SQLColumn(SQLLiteral.all, table: SQLIdentifier("alias_examples")))
    .from(SQLAlias(SQLQualifiedTable(DemoUser.schema, space: DemoUser.space), as: SQLIdentifier("alias_examples"))
    .all(decoding: DemoUser.self)

// Without aliasing:
let demoUsers = try await (app.db as! any SQLDatabase).select()
    .columns(SQLLiteral.all)
    .from(SQLQualifiedTable(DemoUser.schema, space: DemoUser.space))
    .all(decoding: DemoUser.self)

(Of course, in SQLKit you don't need the aliasing for a query like this, and the .all(decoding:) method will correctly map the plain column names onto Fluent models.)

gwynne commented 1 year ago

Another thing to consider is adding helpful details to this message: ...

The problem is that that message can come up for several other issues that have nothing to do with aliasing. It's extremely unfortunate that Tanner built this to fatalError() for these cases, but we didn't have effectful accessors (e.g. accessors that can throw) in the language at the time and changing it to be throwing now would horribly break every single Fluent user's code 😞.

FWIW, hitting a length limit on non-key identifiers as you have is not a particularly common problem in our experience - it's only come up two or three times ever, and we don't have a good solution. It's one of the many, many, many, many, many, and did I mention MANY!!! 😅 design flaws in Fluent that we plan to resolve with Fluent 5.

rausnitz commented 1 year ago

@rausnitz Ooof, you're right, ModelAlias can't be used that way yet. I'll add some APIs for it as soon as I get a chance 😰.

If you know what you have in mind, I'd be willing to work on the implementation.

gwynne commented 1 year ago

@rausnitz There are two possible approaches I was thinking of, which aren't mutually exclusive, although one is certainly easier than the other:

  1. The easy one - modify QueryBuilder.run() to recognize when self.models.count == 1 && self.models[0] is Model.Type (e.g. there are no joins and the model to return is the same one being queried, overwhelmingly the most common case for Fluent), and in that case, modify its invocation of addFields() to generate model.keys.map { .custom(.sql(SQLIdentifier($0))) } (instead of the usual .extendedPath()), and also avoid invoking the .qualifiedSchema() modifier on output. These changes will trick the SQLQueryConverter and the database output logic into using unqualified column names.

  2. The harder one - Define a type something like QueryablePropertyAlias - probably an evolution of the existing AliasedField helper - which does for items conforming to QueryableProperty what ModelAlias does for Schemas . The main problem with this is that ModelAlias works by exploiting the preexisting static alias property of Schema; there is no similar trick to exploit for individual fields. Instead, it would most probably have to work by providing a whole new set of .field() overloads on QueryBuilder which accept only the new alias type and generate fields of the form .custom(.sql(SQLAlias(SQLColumn(SQLIdentifier(property.path[0]), table: SQLQualifiedTable(SQLIdentifier(Property.Model.schemaOrAlias), space: SQLIdentifier(Property.Model.spaceIfNotAliased))), as: SQLIdentifier(self.aliasedName)))) 😰. This would of course be a rotten hack, misusing some of Fluent's existing pieces and definitely not supporting MongoDB at all, but it's all I can think of that wouldn't break public API per se.

gwynne commented 2 weeks ago

At this point, I have to consider fixing this out of scope for Fluent 4. I'll leave this open as a reminder for Fluent 5 to provide a clean solution via driver support and intelligent aliasing.