Query Interface: Join/Create

groue / GRDB.swift

A toolkit for SQLite databases, with a focus on application development

MIT License

6.69k stars 686 forks source link

Query Interface: Join/Create #73

Closed cfilipov closed 8 years ago

cfilipov commented 8 years ago

Are there any plans for supporting SQL join in the query interface? Along the same lines.. what about creating tables?

Edit: Just noticed the TODO.md mentions joins and a few other nice things. But not table creation.

groue commented 8 years ago

Hello @cfilipov

I don't plan to create tables via Swift, for I don't see much value there, and the amount of work necessary to support SQLite table creation features is better put somewhere else. SQL remains the best tool for the task, IMHO.

On the other side, I'm actually working on joins, but could not find a satisfactory API yet. Unlike table creation, helping generating joins can indeed be a boon, and consuming joined rows needs support as well. A natural extension to joins is the support for flat relations such as "has one" and "belongs to" for records. Well, it will ship when it's ready :-)

Quick preview:

for person in Person.include(birthCountry).fetch(db) {
    person.name
    person.birthCountry.name
}

cfilipov commented 8 years ago

I think there is definitely value in being able to express table creation in Swift for the same reason there is value in having the query builder interface in the first place. If you're using the query interface then you have a bunch of SQLColumn instances already defined. By creating tables with SQL strings you are increasing the chance for a typo in the column name. If you can create tables using SQL then you can define the column name once in a swift variable and re-use that in table creation, queries, row mapping, persistentDictionary etc...

cfilipov commented 8 years ago

I don't quite understand your example. What are the semantics of include()? I'm assuming it's doing a natural join and the birthCountry param is a table name? Why not just wrap the full SQL join syntax into Swift and let the current adapter and variance features take care of things (it's more manual but straight forward)?

Side note: I wonder if a SQL DSL should/could be done as separate effort. The query interface seems fairly integrated into the GRDB types but would it be useful to just simply have a Swift DSL that can generate a SQL string and use all the existing GRDB APIs that accept strings?

groue commented 8 years ago

@cfilipov, I'll refute each one of your arguments and suggestions below. You'll get a better idea of the nature of GRDB, and you may adjust your suggestions accordingly. They'll be welcome as long as they're aligned with the goals of the library.

I think there is definitely value in being able to express table creation in Swift for the same reason there is value in having the query builder interface in the first place. If you're using the query interface then you have a bunch of SQLColumn instances already defined. By creating tables with SQL strings you are increasing the chance for a typo in the column name.

Yes. But this kind of typo is quickly noticed, and fixed. I find this argument pretty weak.

If you can create tables using SQL then you can define the column name once in a swift variable and re-use that in table creation, queries, row mapping, persistentDictionary etc...

You can do that already: just define your constants identifiers somewhere, and use them wherever they suit. No need for library support for that.

More generally, I don't want to force users to take this path. GRDB is a bottom-up library, designed for SQL first. High-level APIs build on top of it. There is surely room for improvement, but not the one you describe.

I don't quite understand your example. What are the semantics of include()? I'm assuming it's doing a natural join and the birthCountry param is a table name?

Sorry for the incomplete example :-)

// A description of the relation from persons to their birth country:
let birthCountry = ForeignRelation(
    to: "countries",
    through: ["birthCountryID": "id"], // Joined columns, from left to right
    variantName: "birthCountry")       // Optional; helps disambiguating
                                       // several relations to the same table
                                       // when appropriate.

// SELECT persons.*, birthCountry.*
// FROM persons
// LEFT JOIN countries birthCountry
//        ON birthCountry.id = persons.birthCountryID
Person.include(birthCountry).fetch(db)

Why not just wrap the full SQL join syntax into Swift and let the current adapter and variance features take care of things

Because wrapping the full SQL join syntax into Swift would require the user to define aliases for conflicting columns and tables. It is a chore, and I'd rather ship an ORM-inspired API for joins. Moreover, constructing the joined request allows GRDB to let row adapters grant access to the joined columns in the fetched rows:

let request = Person.include(birthCountry)
for row in Row.fetch(db, request) {
    row.value(named: "id")                                // person id
    row.variant(named: "birthCountry").value(named: "id") // country id
}

// Hence
for person in request.fetch(db) {
    person.name
    person.birthCountry.name
}

That is the general direction I'm working on these days.

(it's more manual but straight forward)?

Yep, but the straight forward way in GRDB is SQL, and not a poor quality DSL that's nice on the paper but difficult to remember and use as soon as you have a non-trivial use case (take that, table creation DSL).

Side note: I wonder if a SQL DSL should/could be done as separate effort. The query interface seems fairly integrated into the GRDB types

I see your point. The separation of the query interface from the core is a background process in my mind.

The fetching features of the query interface are already hidden behind the FetchRequest protocol. Another protocol (_SQLExpressible) bubbles up, which allows expression such as nameColumn == "Arthur". More auditing may reveal other coupling issues, I admit.

but would it be useful to just simply have a Swift DSL that can generate a SQL string and use all the existing GRDB APIs that accept strings?

Now that is wrong. Simply generating SQL is not enough (see use of row adapters above). The priority of GRDB is shipping a pragmatic API - there are other more conceptual libraries out there.

cfilipov commented 8 years ago

I think there is definitely value in being able to express table creation in Swift for the same reason there is value in having the query builder interface in the first place. If you're using the query interface then you have a bunch of SQLColumn instances already defined. By creating tables with SQL strings you are increasing the chance for a typo in the column name.

Yes. But this kind of typo is quickly noticed, and fixed. I find this argument pretty weak.

It will be noticed at run time instead of failing at compile time. Compile-time checks are a core strength one enjoys when using Swift. Whenever possible I (and many others) prefer catching an error at compile time vs runtime. And in this case it is possible, so I argue it should be done.

I also disagree that this typo is "quickly" noticed. This is only noticed quickly for tables created at app launch, which covers all of your example code, but it doesn't get noticed when a table is created later in the runtime of the app or conditionally.

For example: let's say you have several migrations defined: one of the table creation strings from a past migration gets accidentally edited to mis-spell a column (could have happened by editing the wrong string without noticing, or a merge, or refactor, search/replace, etc...). Since the migration has already happened you won't hit this code until you decide to test your app from a fresh state where it will run though all the migrations. Compare this to immediately noticing because of a build failure.

If you can create tables using SQL then you can define the column name once in a swift variable and re-use that in table creation, queries, row mapping, persistentDictionary etc...

You can do that already: just define your constants identifiers somewhere, and use them wherever they suit. No need for library support for that.

Indeed, that's what I'm doing now:

extension Entry.Table {
    static func create(db: Database) throws {
        try db.execute(
            "CREATE TABLE \(Entry.Table.name) (" +
                "\(Entry.Column.id.name) integer PRIMARY KEY AUTOINCREMENT NOT NULL," +
                "\(Entry.Column.workoutID.name) integer" +
                "\(Entry.Column.exerciseID.name) integer NOT NULL" +
        ");")
    }
}

But this is pretty tedious, and also, not as type safe. I can use any string for a column, including a string that does not represent a column, whereas a type-safe solution would force one to use a SQLColumn and fail at compile time. The string may also include characters that break the statement like commas.

By the way, did you notice that I forgot comma in my example above (near "workoutID")? This is an easy mistake to make, and easy to miss and it won't be noticed until runtime, but if I had a Swift DSL to build this table creation statement then this typo wouldn't be possible.

More generally, I don't want to force users to take this path. GRDB is a bottom-up library, designed for SQL first. High-level APIs build on top of it. There is surely room for improvement, but not the one you describe.

I definitely don't suggest forcing users into this path. I would expect the SQL string interface to still exist in its current form, in the same way that you are free to choose between the query interface and the SQL string interface today for queries. My suggestion is to provide a type-safe alternative to table creation, but the string APIs would still be the default.

I understand your position that there are more important things you would rather focus on, but I strongly disagree that type-safe table creation is not valuable in general (given enough time to tackle it). Type-safe compile time checks are better than stringly-typed API, always.

For my project I have been using SQLite.swift up until now. I really liked how my entire schema, all queries and procedures, and even migrations were done in Swift. When I made breaking schema changes, the project failed to compile, when it compiled it usually meant that things will work.

That said, SQLite.swift has been kind of a pain. This is subjective: but I find its API to be obtuse. I find GRDB's fetch(), fetchOne(), fetchAll() to be much clearer and elegant to pluck() and other choices made in SQLite.swift. Same goes for getting values, very consistent and clear API decisions on your part. Those are just some examples, I have been constantly delighted by GRDB's design.

I also found myself writing certain protocols for SQLite.swift to reduce the amount of boilerplate. I soon discovered this project and realized I had basically implemented the equivalent of Persistable, TableMapping, RowConvertible and DatabaseValueConvertible on top of SQLite.swift!

Migrating my code to GRDB resulted in less boilerplate, less code overall, and better design and organization. I'm much happier with using GRDB so far, but I really miss having a more complete SQL builder interface. I understand one of your philosophies with GRDB was to provide a Swift library for those who prefer to craft SQL and want its power with the convenience of a higher level library. And I feel that you have achieved that. But I don't think that your philosophy is in conflict with also providing the SQL DSL on top of that like you have in other areas already.

groue commented 8 years ago

Indeed, that's what I'm doing now:
extension Entry.Table {
    static func create(db: Database) throws { ... }
}
[...]

I understand your position that there are more important things you would rather focus on, but I strongly disagree that type-safe table creation is not valuable in general (given enough time to tackle it). Type-safe compile time checks are better than stringly-typed API, always.

Let me elaborate on two problems that are latent in the way you look at the topic. The first problem is an application maintenance problem. The second problem lies in the weak definition of "type safety". We'll see that if there is room for improvement in order to lower the dependency on "string programming", I don't think it is the one you champion, and today I really think that GRDB is in better shape when it does not address the problem at all, as it is today, than it would be if it would address it in a bad way.

The maintenance problem

Your Entry.Table.create() method is a practice I avoid. I used to do that, of course, because who doesn't want to embed in a class the definition of the database table it is responsible for? Isn't it the regular exercise of sane design principles such as encapsulation, separation of concerns, single responsability?

The problem is: putting the table creation statements in classes doesn't fit well with database schema changes and migrations. The create method lies: it only creates the table of version 1 of the database schema. As soon as you have to migrate the database, create turns into a fraud.

How do you cope with migrations when you apply this pattern?

You need to add other static methods with funny names such as updateForColumnBlah or migrateToVersion2 (you may have more inspiration).
The encapsulation falls apart because Entry.Table is unable to encapsulate the fact that migrateToVersion2 has to be run before migrateToVersion3, and that OtherType.Table.migrateToVersion3 has to be run before its own migrateToVersion3.

The points above show that the Entry.Table class is not the correct encapsulation unit for the entries database table.

Now the fate of the Entry.Column.id, workoutID, etc. columns is not better. What do you do with obsolete columns and columns that change type? As soon as you use them in migrations code, they can not be removed or changed: the migrations that use them have to stay forever (if not forever, as long as you support previous versions of your application). This means that the compiler won't help that much.

As a conclusion: should GRDB push for Swift columns and table name constants, it would maybe help, but not as well as you try to put it. One of the core design decisions of GRDB is that the library avoids frictions at all costs whenever the application code base evolves, increases its complexity, enters the non-trivial zone. I think that the delight you describe (thanks :-)) when you use GRDB is a direct consequence of that: picking the best GRDB API for your use case and changing your mind later is usually painless. So if I admit that the problem you describe exists, the solution you propose is too rigid. And I don't have a better one. I'd thus rather put some burden on application code, even if an unfortunate consequence is that applications take debatable design decisions. I feel sorry, but that's how experience grows.

You may have guessed that I personnally feel very happy with string programming: once a migration has been written and checked, I never think about it again, and I quite like that.

Type safety

Quite a subject. So hyped, so badly understood. SQLite.swift is the champion of type safety, isn't it? Well, I think that the way SQLite.swift tackles the topic is broken beyond any hope. I have to go know, but I'll come back with a extended rationale :-)

groue commented 8 years ago

BTW - I'm not trying to close the discussion: I just want to say that the subject is more delicate that it looks at first sight. GRDB is flexible, maybe too flexible, but it allows applications to use it in the way they want. I'm myself balanced between the desire to avoid boilerplate and runtime checks, and the ability to go as raw as I need whenever relevant. So far, the ability to go raw prevails, that's true, because I feel like all other opinionated high-level APIs I've seen fall in three categories: toys, locks, and bombs. Toys are limited, locks reveal unsuitable long after you've invested too much time with them to leave, and bombs punish you hard when you misuse them. There are many toys out there, SQLite.swift is both a lock and a bomb, Core Data is a bomb. There's a reason GRDB exists.

cfilipov commented 8 years ago

Regarding your comment on table creation. I agree, that approach would be limiting for the reasons you mentioned. My code snippets purpose was to illustrate the pitfalls of building SQL CREATE statements in Swift vs using a DSL. Admittedly the code was distracting, but since we're on that topic I might as well explain the full story behind that snippet if you don't mind a tangent from the topic.

This is a strategy I employed in my app which currently uses SQLite.swift, but I was able to use the same design with GRDB as well (but haven't pushed the code yet). I'm not suggesting people should do this, by the way. This is a personal project where I have the luxury of experimenting with weird ideas I otherwise couldn't. But I hope you will agree this weird idea seems to work pretty well. At the very least it might be on the right track towards a better idea.

Entry, Table and Column in my previous comment are not actually classes. These are just Swift enums that serve as a namespace. This is a neat trick in Swift, to use enums to group things together because an enum without any cases cannot be instantiated. In fact, you are doing something very similar in your Record playground with the struct Col, which you could change to enum Col to ensure Col can't be instantiated.

So using this trick I can now group my Swift SQL schema into enums for each version of the Schema (each Schema enum corresponds to a single migration and contains everything you need to migrate and query).

enum Schema20160524095754146: Schema { 
    private typealias Prev = Schema20160410215418161
    static var version: Int64 = 20160524095754146

    static func migrate(db: Connection) throws {
        try Prev.Exercise.Table.drop(db)
        try Entry.Table.create(db)
        try Exercise.Table.create(db)
        try Workout.Table.create(db)
    }
}

Each Schema defines a static method used for migrating it from the previous schema. The previous schema, also a namespace/enum is referred by a Prev typealias. Each Schema namespace contains more nested namespaces representing the tables and columns for the model objects (which are never referenced here directly). Here I define some tables for my new schema:

extension Schema20160524095754146 {
    typealias Workout = Prev.Workout // This table has not changed, "import" it into this Schema
    enum Entry {
        enum Table {
            static let name = "entry"
        }
        enum Column {
            static let id = SQLColumn("entry_id")
            static let workoutID = SQLColumn("workout_id")
            static let exerciseID = SQLColumn("exercise_id")
            static let exerciseName = SQLColumn("exercise_name")
        }
    }
    /* ... more table definitions ...*/
}

Somewhere, globally in the app, I define the current schema:

typealias CurrentSchema = Schema20160524095754146

The actual Entry struct looks like this:

struct Entry {
    var id: Int64?
    var workoutID: Int64?
    var exerciseID: Int64?
    var exerciseName: String
}

Then, I associate the model with its schema:

extension Entry {
    typealias Schema = CurrentSchema.Entry
}

Now, here's how I conform to the GRDB protocols:

extension Entry: RowConvertible {
    init(_ row: Row) {
        id = row.value(named: Schema.Column.id.name)
        workoutID = row.value(named: Schema.Column.workoutID.name)
        exerciseID = row.value(named: Schema.Column.exerciseID.name)
        exerciseName = row.value(named: Schema.Column.exerciseName.name)
    }
}

extension Entry: TableMapping {
    static func databaseTableName() -> String {
        return Schema.Table.name
    }
}

And some usage in queries:

dbq.inDatabase { db in
    Workout
        .order(Workout.Schema.Column.startTime.desc)
        .filter(10)
        .fetch(db)
        .forEach { print($0.exerciseName) }
}

Note that if I don't care about mapping rows to a struct, I can just use the CurrentSchema directly.

Now let's say I change the schema. I create a new enum with new table definitions, and typealias the parts of the previous schema that have not changed. All the table creation and migration is handled in the self-contained schema. I then change the global alias CurrentSchema to point to the new schema. Once I do that, If there are breaking changes, like if I removed the exerciseName column, then I will have compile errors which direct me right at the places where my app depend on the incompatible schema.

In summary: I have all my SQL tables and columns namespaced into enums for each migration. And the models just typealias to their corresponding parts of the current schema, which is itself also a typealias to the latest schema. This ensures I have the Swift representation of the schema for each migration, so migrations can be done in pure swift (when using SQLite.swift). It's not perfect, but this approach has been very useful for me, it's worked well and has caught many issues at compile time that otherwise would have only been detected at runtime (if I'm lucky). It's won't catch everything at compile time, but it catches some of the easiest mistakes one can make and works extremely well so far.

groue commented 8 years ago

Wow, @cfilipov, that's quite an involved setup :-)

Listen: I still have to come up with something with joins. Today I'm still wondering how to deal with automatic table aliasing, precisely how to let the user express WHERE and ON conditions on aliased tables without referring to them via strings - that's tricky, and I'm not even sure I'll eventually succeed properly. I hope this process will help me clarifying my ideas on the string vs. compiler constants.

I could not manage to write about type safety as I promised - to be short:

Fighting the weak type system of SQLite is wrong.
Using the Swift compiler to notify of errors is nice on the paper, but painful in everyday life, especially when the compiler error happens inside a nested expression. Such compiler errors are often indecipherable. If you don't know that, that means that you have not tried to push Swift to its limits yet. Those bad compiler errors restrict any library that heavily relies on them to experienced programmers only (and even experienced programmers will often feel puzzled). The runtime errors of GRDB are much more efficient at explaining what's wrong, so that the developper can fix the problem, and move on.

Remember: GRBD is a tool, not a proof-of-concept. This is paramount.

cfilipov commented 8 years ago

I think your first point about the mismatch of trying to tack on a static type system to SQLite's weak type system is quite good and I understand that. As far as the second point, yes the compile errors can get very hard to understand. It seems you prefer clear runtime errors to unclear compile time errors. I disagree with this but respect your decision to drive your project in the direction you feel is best. Given all this, why even both to implement a Swift query building interface in the first place?

Anyway, I'll stop bugging you about this topic, I think we beat it to death already. And as you said, there are higher priorities, like getting ready for Swift 3, you're on that, right? ;).

groue commented 8 years ago

why even both to implement a Swift query building interface in the first place?

Because some people fell uncomfortable with SQL, that's it!

getting ready for Swift 3,

You have seen the Swift3 branch already, didn't you?

cfilipov commented 8 years ago

getting ready for Swift 3,

You have seen the Swift3 branch already, didn't you?

I saw that branch. I was just playfully reminding you that's a more important priority so you should stop paying attention my silly issues like this and get back to finishing that Swift 3 branch. One of the reasons I picked up this library in the first place was because I'm not seeing any indication from SQLite.swift that he's going to migrate any time soon.

Thanks for the great work and taking the time to debate the SQL builder interface, looking forward to the future iterations.

I still hope you change your mind in the future, but for now I think it's time to close this issue.

groue commented 8 years ago

FYI @cfilipov, the Swift3 branch is already in good shape, and follows swift evolution pretty closely.

It should even be delicate migrating your GRDB code from Swift2 to Swift3, since a lot of APIs have changed to better reflect the new API guidelines.

If you have a project that uses GRDB and that you consider migrating to Swift3, I advise you to migrate now, since each incremental update to the Swift language will be much easier after the initial migration. And you'll be ready when Swift3 is officially shipped.

zmeyc commented 8 years ago

@group @cfilipov

The create method lies: it only creates the table of version 1 of the database schema. As soon as you have to migrate the database, create turns into a fraud.

Another option is always creating the latest version of table in create() method. Update logic can look like this: if database is not created yet, .create() all of the tables and set this version as latest, otherwise apply migrations. Is this more error-prone?

cfilipov commented 8 years ago

Sorry to resurrect this thread but I'd like ask: If I were to submit a pull request that introduces a query builder interface for table/view creation and joins would you consider accepting it?

groue commented 8 years ago

@cfilipov Welcome back :-) Let's avoid spending time for nothing: before submitting a PR, don't forget to make a general proposal with a few code snippets that illustrate the usage of the DSL you envision.

Suggested use cases (1 is unavoidable to my eyes, 2 is nice to have and should have room):

-- (1) regular table creation (2) temporary tables
-- Out of scope: tables without rowid, schema names, create if exists,
-- virtual tables, create table as select...
CREATE TABLE foos (
  -- (1) rowID primary key
  id INTEGER PRIMARY KEY,

  -- (1) The four main SQLite affinities, (2) numeric affinity
  text TEXT,
  int INTEGER,
  data BLOB,
  double DOUBLE,
  numeric NUMERIC,

  -- (1) NOT NULL, (1) DEFAULT, (2) COLLATE
  foo TEXT NOT NULL DEFAULT 'foo' COLLATE nocase

  -- (1) references with full cascade/restrict/etc actions
  barID INTEGER NOT NULL REFERENCES bars(id) ON DELETE CASCADE ON UPDATE CASCADE

  -- (2) multi-column references
  x INTEGER,
  y INTEGER,
  FOREIGN KEY(x, y) REFERENCES bars(a, b)

  -- (1) multi-column primary key
  PRIMARY KEY (a, b)
)

-- (1) indexes, eventually unique, (2) eventually covering
CREATE INDEX fooOnBaz foo(baz)

-- (1) ALTER TABLE ... ADD COLUMN
ALTER TABLE foos ADD COLUMN bar INTEGER

About the type of columns: I already see that the numeric affinity has no natural peer in the landscape of Swift types. But I don't know quite well what we lose if we don't provide any way to create a column with numeric affinity. People will want to store "dates", not "strings", and "bools", not "integers", and Swift types are a natural fit. I leave this to you ;-)

groue commented 8 years ago

@cfilipov What I care about here is that falling back to SQL should not be shocking. Let me try to explain:

As soon as you use a high-level DSL, having sometimes to go back to SQL looks odd. This oddity yields subjective feelings like:

this migration is dirty - SQL is such a despicable language.
the DSL is weak - come on, what kind of toy is that?
this migration looks smartish - what kind of dark magic requires raw SQL?

Those feelings can't be 100% avoided. And there are three victims whose subjective quality suffers: applications, developers, and GRDB itself.

So I wish the table creation DSL would be expressive, consistent, and comprehensive enough to leave SQL to the really special migrations.

groue commented 8 years ago

@cfilipov Another point that deserves a little place in our mind: https://github.com/groue/GRDB.swift#advanced-database-schema-changes

Now I guess you know why I want us to discuss before we code :-)

groue commented 8 years ago

@cfilipov You launched me, I can't stop:

low priority, but nice to have, and we must think about crossed-referencing tables
important: you'll have to choose between a procedural vs. descriptive API.

Procedural is: "do this, then do that", and supported by the current DatabaseMigrator. Descriptive is "make this happen", and is not supported today.

Descriptive is harder: if you plan to have a Table type that describes a table, with the ambition that its instances would be available outside of a migration for any purpose you envision, then I don't see how we can avoid generating "diff SQL", the SQL that turns the current schema of the database to the schema required by the table definition. And that's very difficult :-)

groue commented 8 years ago

@cfilipov Oh, I forgot about confict resolution clauses! Let's put them in the "nice to have". GRDB has a weakness here: conflict resolution clauses make it difficult to know whether an INSERT statement has inserted a row or not - and Persistable.didInsertWithRowID(_:forColumn:) does not address this question.

cfilipov commented 8 years ago

About the type of columns: I already see that the numeric affinity has no natural peer in the landscape of Swift types.

I've never used the NUMERIC affinity and not quite sure what the use cases are for such a column. My inclination is to not support this affinity in the DSL initially. As you will see below, it is possible to implement the most commonly used features while still being able to add more later.

Procedural is: "do this, then do that", and supported by the current DatabaseMigrator. Descriptive is "make this happen", and is not supported today.

Descriptive is harder: if you plan to have a Table type that describes a table, with the ambition that its instances would be available outside of a migration for any purpose you envision, then I don't see how we can avoid generating "diff SQL", the SQL that turns the current schema of the database to the schema required by the table definition. And that's very difficult :-)

The difficult thing about making the DSL completely declarative is that you end up too magical, too far removed from the SQL and it starts to look more like an ORM, and I believe neither one of us wants a full-blown ORM. My goal was to have a little "magic" as possible in the DSL.

Prerequisites

Before showing you examples I think it's worth taking the time to briefly share some pseudo code of how the examples can be implemented.

Builders

I envision a builder pattern approach where each builder object is responsible for building a subset of the SQL CREATE grammar. The goal is to make the DSL feel as close to SQL as possible and deviate where it makes sense (I think your query builder does a good job of this, ie. multiple filter() use is considered AND instead of having to use &&, very nice touch).

public protocol BuilderType {
    func build() throws -> String
}

public protocol CreateBuilder : BuilderType { }

Each builder implements the CreateBuilder protocol. They are chained together to form a complete statement. Each builder specifies the next part of the grammar with instance methods that return the next builder. The chaining is also type-safe so you cannot chain builders in a way that donsn't make sense.

This pattern also makes it somewhat easy to expand the DSL grammar over time to more completely support SQLite's features. For example, adding CHECK later would be relatively easy without effecting all the builders by simply defining a new builder and adding it to the object graph.

Some builders can also be re-used, like the foreign-key-clause which exists in both the column constraint and the table constrain grammar.

The following pseudo code roughly outlines what it would look like for a CREATE TABLE. Note that this is only for CREATE TABLE. I consider CREATE VIEW and CREATE VIRTUAL TABLE separate hierarchies of builders even though the syntax is somewhat similar, and it seems from SQLite's documentation that they consider them separate as well.

This code is just to help illustrate the example usage, I don't want to get too far into the details of the builder until we settle on usage (though of course they are related and we may end up needing to discuss them in more detail).

public struct CreateTableBuilder: CreateBuilder {
    var name: String
    var temp: Bool = false
    var ifNotExists: Bool = false

    public func build() throws -> String {
        var parts = ["CREATE"]
        if temp {
            parts.append("TEMP")
        }
        parts.append("TABLE")
        if ifNotExists {
            parts.append("IF NOT EXISTS")
        }
        return parts.joined(separator: " ")
    }

    public func columns(_ columns: [ColumnType]) -> SelectFromBuilder {
        return CreateTableColumnsBuilder(prefix: self, columns: columns)
    }
}

public struct CreateTableColumnsBuilder: CreateBuilder {
    let prefix: CreateTableBuilder
    var columns: [ColumnType]

    public func build() throws -> String {
        precondition(columns.count > 0)
        let columnDefs = columns.map { try $0.build() }.joined(separator: ", ")
        return try [prefix.build(), "(", columnDefs, ")"].joined(separator: " ")
    }

    public func foreignKeys(_ columns: [ColumnType]) -> CreateForeignKeyClauseBuilder {
        ...
    }
}

public struct CreateTableConstraintBuilder: CreateBuilder { 
    ...
}

Column Type

I also changed GRDB's SQLColumn into a generic Column<T> type so we can express the column type more succinctly. Column implements CreateBuilder for representing the column constraints. The example usage below can be done without this, but would then require additional notNull(...) and type(...) builder methods (or params).

Example Usage

Now let's get into the usage. I am going to show how the builder interface for tables could look, and how it is flexible enough to let the developer choose how they organize their code.

Simple Example

The following code demonstrates how you would create a table using this builder interface.

This is almost a 1-1 transcription of the actual SQL, anyone familiar with SQL should find it fairly readable.

Let's see a simple table first:

TableNamed("bar")
    .create(ifNotExists: true)
    .columns(
        Column<Int64>("id")
            .primaryKey(autoIncrement: true),
        Column<String>("text"))

Now a more complete example:

TableNamed("foo")
    .create(ifNotExists: true)
    .columns( // var args
        Column<Int64>("id")
            .primaryKey(autoIncrement: true), // autoIncrement param is optional 
        Column<Int64?>("int"), // nullable
        Column<String>("text")
            .unique(onConflict: .rollback)
            .default(0)
            .collate(.noCase),
        Column<Data>("data"),
        Column<Double>("double"),
        Column<Int64>("bar_id")
            .onNull(.ignore) // only available on Column<T> where T is non-optional
            .references(
                table: Bar, // or Bar.databaseTableName()
                onDelete: .cascade, // [optional]
                onUpdate: .cascade // [optional]
                deferrable: true, // [optional] default: false
                columns: Bar.Column.id)) // var args

Column Variables

The problem with the previous code is that we aren't able to re-use those columns in our queries. We would need to keep copying Column<String>("text") every time we need that column. The next logical progression would be to define variables for the columns so that we can re-use them. Column<T> implements StringLiteralConvertible so we can express our columns this way:

static let id: Column<Int64> = "id"
static let int: Column<Int64?> = "int"
static let text: Column<String> = "text"
static let data: Column<Data> = "data"
static let double: Column<Double> = "double"
static let bar: Column<Int64> = "bar_id"

TableNamed("foo")
    .create(ifNotExists: true)
    .columns(
        id
            .primaryKey(autoIncrement: true), 
        int,
        text
            .unique(onConflict: .rollback)
            .default(0)
            .collate(.noCase),
        data,
        double,
        bar
            .onNull(.ignore)
            .references(
                table: Bar,
                onDelete: .cascade,
                onUpdate: .cascade
                deferrable: true,default: false
                columns: Bar.Column.id))

The same for Bar:

static let id: Column<Int64> = "id"
static let text: Column<String> = "text"

TableNamed("bar")
    .create(ifNotExists: true)
    .columns(
        id.primaryKey(autoIncrement: true),
        text)

Grouping Column Variables

Those free-floating variables may conflict with each other (both tables define id and text). We could re-use them in some cases, or we could take advantage of the fact that Swift case-less enums cannot be instantiated to group similar variables together.

enum Foo {
    static let table: TableNamed = "foo"
    enum Column {
        static let id: Column<Int64> = "id"
        static let int: Column<Int64?> = "int"
        static let text: Column<String> = "text"
        static let data: Column<Data> = "data"
        static let double: Column<Double> = "double"
        static let bar: Column<Int64> = "bar_id"
    }
}

enum Bar {
    static let table: TableNamed = "bar"
    enum Column {
        static let id: Column<Int64> = "id"
        static let text: Column<String> = "text"
    }
}

This isn't part of GRDB or enforced by the builder interface. This is just one way someone might choose to organize their column definitions, and the builder interface works just fine with this approach:

Foo.table
    .create(ifNotExists: true)
    .columns(
        Foo.Column.id
            .primaryKey(autoIncrement: true), 
        Foo.Column.int, 
        Foo.Column.text
            .unique(onConflict: .rollback)
            .default(0)
            .collate(.noCase),
        Foo.Column.data,
        Foo.Column.double,
        Foo.Column.bar
            .onNull(.ignore)
            .references(
                table: Bar, 
                onDelete: .cascade,
                onUpdate: .cascade
                deferrable: true, 
                columns: Bar.Column.id))

Bar.table
    .create(ifNotExists: true)
    .columns(
        Bar.Column.id.primaryKey(autoIncrement: true),
        Bar.Column.text)

Defining Column Constraints Early

Alternatively, we can define column constraints at the time we define the columns, rather than at table creation:

enum Foo {
    static let table: TableNamed = "foo"

    enum Column {
        static let id = Column<Int64>("id")
            .primaryKey(autoIncrement: true)
        static let int = Column<Int64?>("int") 
        static let text = Column<String>("text")
            .unique(onConflict: .rollback)
            .default(0)
            .collate(.noCase)
        static let data = Column<Data>("data")
        static let double = Column<Double>("double")
        static let bar = Column<Int64>("bar_id")
            .onNull(.ignore)
            .references(
                table: Bar,
                onDelete: .cascade,
                onUpdate: .cascade
                deferrable: true,
                columns: Bar.Column.id)
    }
}

struct Bar {
    static let table: TableNamed = "bar"

    enum Column {
        static let id: Column<Int64> = "id"
        static let text: Column<String> = "text"
    }
}

Now the table creation would look like this:

Foo.table
    .create(ifNotExists: true)
    .columns(
        Foo.Column.id, 
        Foo.Column.int, 
        Foo.Column.text, 
        Foo.Column.data,
        Foo.Column.double,
        Foo.Column.bar)

Bar.table
    .create(ifNotExists: true)
    .columns(
        Bar.Column.id,
        Bar.Column.text)

This approach makes the column definitions slightly less readable, but provides more context where is more likely to be seen. When you are building a query you might jump to the definition of Foo and be reminded of some of these constraints which would otherwise be hidden in the migration code where the create actually takes place.

Table Constraints

An example using table constraints:

Baz.table
    .create(ifNotExists: true)
    .columns(Baz.Column.someID, Baz.Column.otherID, Baz.Column.foo)
    .primaryKey(Baz.Column.someID, Baz.Column.otherID)
    .foreignKeys(Baz.Column.foo)
    .references(table: Foo, columns: Baz.Column.foo)

Making Columns Available on Your Model

If you want to have all your column definitions available as statics on your actual model object TableMapping can be extended to provide a builder interface by adding create() -> CreateBuilder as an extension. Now you can define

struct Bar: TableMapping {
    var id: Int64
    var text: String

    static func databaseTableName() -> String {
        return "bar"
    }

    enum Column {
        static let id: Column<Int64> = "id"
        static let text: Column<String> = "text"
    }
}

Bar
    .create(ifNotExists: true)
    .columns(
        Bar.Column.id,
        Bar.Column.text)

let row = Bar.filter(Bar.Column.id == 42).fetchOne(db)

Migrating would involve refactoring the code to rename Bar to Bar_v1, then doing the necessary work in the migration to add a column or create a new table and drop the old one (if it's a change like renaming or dropping a column). I've been doing this myself and it works very nicely, but it's important to note that this is not something that's imposed by the builder interface, it's something that it enables, and it's up to the developer how they want to organize the code (and you have now seen several ways it could be done).

Migrations

Migrations are still handled procedurally. The SQL statements themselves are declarative, but the actions of creating a table or migrating a database are procedural.

I think this is a nice compromise between procedural and declarative. As you said, going all the way and making the table fully declarative would probably require generating SQL diff for migrations which I find too automatic and a lot of work to implement correctly. I just want a way to write SQL without gluing strings together manually, but I still want to essentially write SQL.

Final Thoughts

Implicit Tables

It might be valuable to define columns with their tables:

static let table: TableNamed = "foo"
static let id = Column<Int64>(name: "id", table: table)
static let text = Column<String>(name: "text", table: table)

This is convenient later on when you want to do something like join a table and need to qualify the column with the table name. Without this, it would be up to the developer to remember to do this when they build the query. This wouldn't be such a big deal if I could find a more elegant syntax for doing that. It would also make foreign key references when creating a table simpler. You would only have to provide a Column, and the table would be implied.

The trade off here is the extra verbosity when defining your columns, but maybe it's worth it.

Expressions and Statements

To support things like AS <select_stmt> and CHECK <expr> we could take advantage of your existing _SQLSelectable and _SQLExpressible interfaces. To support this the builder interface would need to accept a StatementArguments param:

public protocol BuilderType {
    func build(_ arguments: StatementArguments) throws -> String
}

This allows us to avoid the inout StatementArguments pattern you relied on in _SQLSelectQuery (in fact, I played around with turning _SQLSelectQuery into a builder and it was quite successful, I think this approach is a bit cleaner).

Columns as a param to Create()

The columns(...) method could have just been another param of create(...) but I think the usage looks cleaner to separate them. Generally I tried to keep each builder 1-1 with the way the SQLite documentation groups the grammar (ie. create-table-stmt, table-constraint, foreign-key-clause each map to one of the builders) except for places where I felt it made the builder code or usage cleaner without compromising much (like the columns() method). Here's what the builder interface could look like in this case:

Foo.table
    .create(ifNotExists: true, columns:
        Foo.Column.id, 
        Foo.Column.int, 
        Foo.Column.text, 
        Foo.Column.data,
        Foo.Column.double,
        Foo.Column.bar)

Params vs Methods

As discussed in the previous section, there are many other places where one could just as easily implement part of the builder as a parameter to a builder method or as a separate builder method. I'm interested in which ones make the most sense.

groue commented 8 years ago

Many thanks @cfilipov. Please let me have a detailed reading.

groue commented 8 years ago

Three points:

I envision a builder pattern approach where each builder object is responsible for building a subset of the SQL CREATE grammar.

The builder pattern has indeed many advantages:

it is easily extensible, and this helps adding features according to their priority
it is "fluent", a very trending fashion

It also disadvantages, because of the abundance of ad-hoc types it requires:

it heavily pollutes the public interface. Given the poor state of Swift documentation generators, it is a real problem. To reduce this, the query interface currently uses many underscore-prefixed semi-public types (especially for helper types that I don't want to commit to maintain, and that are unlikely to be used by the user).
the abundance of types makes it harder to follow program execution (a real problem I have in my current work on joins)
once the doors are opened, it's easy to just throw new methods and types in. Problem is that the DSL becomes too wide. Good for demos, not good for users.

I appreciate that your sample code follows closely the SQL syntax.

I also changed GRDB's SQLColumn into a generic Column type so we can express the column type more succinctly.

SQLColumn is part of the query interface, and the query interface is untyped. I see a conflict in your change and I recommend that you revise your copy.

I'm against a typed query interface. It generates bad compiler errors that are hard to understand (StackOverflow is full of questions about SQLite.swift because of that). And it is unable to talk fluently to the weak typing of SQLite.

That's why in GRDB, types are pulled by the user:

let row = Row.fetchOne(db, "SELECT '2016-06-30'")!
let string: String = row.value(atIndex: 0)  // "2016-06-30"
let date: Date = row.value(atIndex: 0)      // Date
self.foo = row.value(atIndex: 0)            // depends on the type of property

SQLColumn and the query interface extend this pattern, not the opposite.

The problem with the previous code is that we aren't able to re-use those columns in our queries [...] Migrating would involve refactoring the code to rename Bar to Bar_v1

Here I don't follow you. To make your code nice-looking, you have to build a cathedral. You have already talked about it above.

Let me be clear: this pattern is much too much demanding, is representative of you current way of addressing your needs, and can be easily criticized. It is not a reliable and recommendable ground. Thus it is not a selling point for your table creation API. Actually, it's the opposite.

Maybe another approach would be to try to write documentation, instead of trying to convince me how brilliant the DSL is. Try to see it from the point of view of a regular developer, and help him instead of telling him how he should code. If the tool is inspiring, imaginative readers will be inspired.

cfilipov commented 8 years ago

The builder pattern has indeed many advantages:

There's more advantages:

It can closely follow the structure of the grammar, making it easy to understand and maintain.
It isolates the responsibilities of parts of the grammar so each builder is small, specific and simple.

As for disadvantages:

it heavily pollutes the public interface. Given the poor state of Swift documentation generators, it is a real problem. To reduce this, the query interface currently uses many underscore-prefixed semi-public types (especially for helper types that I don't want to commit to maintain, and that are unlikely to be used by the user).

I have to agree this is a good argument against them. You could hide the builders by making them private and having a wrapper type be returned to each builder, but then chaining would not longer be type-safe and it would be easier to construct incorrect statements. This would also still leak the wrapper.

That said. I think it's OK the have this be visible to the public interface. Perhaps there is part of the SQL grammar you don't want to support right now. Someone can extend the builders to add that support themselves.

the abundance of types makes it harder to follow program execution (a real problem I have in my current work on joins)

I think the abundance of types makes things easier to follow. A well-structured graph is much more clear than trying to figure out what a long set of instructions are tying to do. Your query code is the opposite of the builder. It is a single class responsible to building the query procedurally. I believe if it were a builder it would be easier to implement and reason about joins (and I don't just believe that. I actually re-wrote your query builder as a set of separate builders and saw several advantages as I mentioned earlier. I think the trade off was worth it).

once the doors are opened, it's easy to just throw new methods and types in. Problem is that the DSL becomes too wide. Good for demos, not good for users.

I believe this to be a very weak argument. It's easy to make any code unmanageable. If it's not appropriate to add a method, don't merge it. If the DSL is becoming too wide please tell me what parts you wish to be removed.

It also disadvantages, because of the abundance of ad-hoc types it requires:

I don't understand what the problem is with more types. What practical problem do they cause? Some people are deterred by declaring too many types, and often for no reason other than "there's too many". If you can solve your problem more clearly and with better correctness by using more types then declaring more types is appropriate. If you are writing longer functions which do more and are more complex just to avoid declaring more types what are you really gaining?

Will you accept a PR with a builder? Are you prompting for more debate? Will you suggest an alternative? I'm not sure how to proceed here.

I'm against a typed query interface. It generates bad compiler errors that are hard to understand (StackOverflow is full of questions about SQLite.swift because of that). And it is unable to talk fluently to the weak typing of SQLite.

I never proposed a typed query interface. The typed column does not change how queries are done, the type is only used for building tables.

public protocol ColumnType: _SpecificSQLExpressible {
    var name: String { get }
}

public struct Column<DataType>: ColumnType {
    public typealias ColumnType = DataType
    public let name: String
    public let table: Table?
    ...
}

Throughout the query code you would use ColumnType in places where you currently have SQLColumn, they are pretty much interchangeable. I understand it might be strange to have that type information in the column, but only use it in the create statements, which is why I proposed alternatives which you did not comment on.

Let me be clear: this pattern is much too much demanding, is representative of you current way of addressing your needs, and can be easily criticized. It is not a reliable and recommendable ground. Thus it is not a selling point for your table creation API. Actually, it's the opposite.

This is the pattern you are using in your own example code. If you provide the user with the ability to represent columns in Swift as you have, they will need a way to organize that code and use it in their queries. And as you said, if you give them a way to use column variables in their table creation code they will need a way to use those variables in migrations. You cannot ignore the way the code will be used when designing the API. I have provided several examples of how one could use this API, and you seem to be getting distracted by just one.

If you do not like one approach, please advice an alternative. I'm not in love with this pattern, it's just the only one I have found that works. I would love to hear your solutions for building sql in Swift while also handling migrations in Swift.

It seems you have been distracted by certain parts of this conversation and failed to address the actual API itself. You haven't provided any feedback on the create() DSL or the thoughts on implicit tables in the column type, the table constraint API or the merits of being able to define column constraints ahead of time or at table creation time. These are areas I was really looking forward to hearing your opinions.

Let me also be clear, I am not begging you to adopt my code. I got what I need working and I offer my free time to modify it in such a way that you might accept it as a PR. While we often disagree on many things, I am still delighted by the code you have produced, so I know that if I collaborate with you it could result in much better design than if I were to keep my own private fork as I have been so far. If you don't find value in any of this I am happy to back off and leave this as-is.

cfilipov commented 8 years ago

oops. I hit "Close and comment" instead of just "Comment"

groue commented 8 years ago

It seems you have been distracted by certain parts of this conversation and failed to address the actual API itself. You haven't provided any feedback on the create() DSL or the thoughts on implicit tables in the column type, the table constraint API or the merits of being able to define column constraints ahead of time or at table creation time. These are areas I was really looking forward to hearing your opinions.

Let's address your most urgent questions :-)

Your DSL is good. More or less like the one I was expecting, with good ideas indeed.

OK, typed columns are wrong, because the query interface would look odd not using theses types (and they won't). What's the point of repeating Int64 in Int64.fetchAll(db, Foo.select(SQLColumn<Int64>("id")))? I admit that this library is not always DRY, but this is going too far. What if I want to add an Int and a Double, a perfectly valid SQLite operation? Don't you sweat already? So really typed columns are not a good idea.

OK, implicit tables in the column type are nice, but don't fit well with my current work on joins. This is information you did not have. Hint: don't put table names in columns, GRDB will take care of scoping columns.

OK, it should be possible to use a custom collation as well as the ones that ship with SQLite.

These are all little issues that can be solved.

Now the two main, fundamental, real problems remain: the DSL readability (vs SQL), and migrations (code vs actual database schema).

Yet I was actually interested by your come back and your DSL proposal, because some time had elapsed since our last conversations. Maybe you would propose something that I did not expect?

And I have read a brilliant intellectual construction. This is why I answered: "Maybe another approach would be to try to write documentation". Documentation is not a discussion with a peer, as we're having now. It is an exercice in helping at solving problems, and a good way to find the shortcomings of a design. I don't say I'm good at writing doc, I say writing doc has enhanced GRDB a lot.

Don't be too disappointed. Thank you for your free time - I give plenty myself.

groue commented 8 years ago

@cfilipov Let me show how I would look at the table creation DSL.

A classic problem with ORMs is their inability to deal well with the fact that the database schema and the code may be out of sync. When I read code like the following:

Foo.table
    .create(ifNotExists: true, columns:
        Foo.Column.id, 
        Foo.Column.int, 
        Foo.Column.text, 
        Foo.Column.data,
        Foo.Column.double,
        Foo.Column.bar)

I know that we've failed, because code like above is a non goal, not to say an anti-pattern.

A good migration is a migration that does not change.

There are some apps that last a few years. A rapid development cycle can create many migrations.

Migrations that change are very difficult to test, and migration that don't change make it easier to maintain the application over time.

Given that, an immediate conclusion is that a good migration does not share any value with queries. No shared table objects, no shared column objects. There are people who will want to share, at least, string constants. That's their problem, but the migration DSL will not foster this practice.

For examples of frameworks that apply these sane rules, look at Core Data, and Active Record (the Ruby library).

OK now let's see how code that apply those principle would look like.

First, types used by the migration DSL must be difficult to use outside of the migration DSL:

// Wrong, because this creates the desire to use `table` for other purposes:
let table = Table("foos", ...)
table.create(db)

How do we prevent this?

// Better, but too complex
schema.createTable("foos", ...)

// Better, worth investigating
schema.createTable("foos") { table in
    table...    // column
    table...    // column
    table...    // primary key, index, whatever relevant
}

Just to be sure, let's make sure we support other kinds of schema changes:

schema.addTableColumn(...)
schema.createIndex(...)

And voilà, now there's no leakage of migration objects. The detailed API, the nature of schema, its relation to the current migration API, and other bikeshedding can start.

cfilipov commented 8 years ago

A good migration is a migration that does not change.

In my example, the migration does not change. The symbol name for Foo in Swift is renamed, but the migration is unaffected. This symbol rename can be done automatically and in a type-safe way. But I get your point, you are likely against even renaming the symbol Foo even if it has no effect on the generated SQL.

I know I have mentioned this before a few times, and you seem to still be unconvinced on the value: being able to share migration variables like this allows even greater safety in the app. I really like the fact that a breaking change in the schema will result in a build failure. This is a very obvious benefit you seem to be uninterested in.

Think about an very non-trivial application that has possibly hundreds of different queries. Some of these queries will be broken by a migration, some will not. Because of the decoupling there is no way to tell other than to exercise every possible path that executes these queries. Just because other libraries haven't tried to solve this problem doesn't make it less valuable.

Moving on, I'll continue the rest of this discussion with your decoupled migration as the strategy, and leave the coupling as a parallel conversation so it does not distract from the create API (I hope to convince you of the value, or be convinced of its lack of value).

One more point about sharing columns before I move on: assume I agree that we should not share column definitions between migrations and queries in the app code. It may be useful to still share columns within a migration. You might want to query an old table you are about to drop in a migration into a new table and want to do that in Swift.

cfilipov commented 8 years ago

Might also be useful to support some amount of PRAGMA (which is pretty simple):

Schema.pragma(.foreignKeys = true)
Schema.pragma(.cacheSize = 700000)

or using associated types enums

Schema.pragma(.foreignKeys(true))
Schema.pragma(.cacheSize(700000))

cfilipov commented 8 years ago

A couple changes to the API you proposed: it seems the most idiomatic way to do this in Swift 3 is to use the first parameter label to differentiate the different create functions. So instead of createTable() and createView() you would have create(table:) and create(view:) etc...

Schema.create(table: "foo", ...)
Schema.create(index: "foo_index", ...)
Schema.create(virtualTable: "bar", ...)
Schema.create(view: "foo_view", ...)
Schema.create(trigger: "foo_on_insert", ...)

Dropping tables:

Schema.drop(table: "foo", ...)
Schema.drop(index: "foo_index", ...)
Schema.drop(virtualTable: "bar", ...)
Schema.drop(view: "foo_view", ...)
Schema.drop(trigger: "foo_on_insert", ...)

Alter:

Schema.alter(table: "foo")
    .add(Column("new_col", affinity: .integer)) // no need for `column:` label because the type can be inferred

Schema.alter(table: "foo").rename(to: "foo2")

Full create example:

Schema.create(table: "foo", ifNotExists: true, columns:
    Column("id", affinity: .integer, notNull: true) 
        .primaryKey(autoIncrement: true),
    Column("int", affinity: .integer),
    Column("text", affinity: .string)
        .unique(onConflict: .rollback)
        .defaults(0) // unfortunately, `default` is a reserved keyword.
        .collate(.noCase),
    Column("data", affinity: .data),
    Column("double", affinity: .double),
    Column("bar_id", affinity: .integer)
        .onNull(.ignore)
        .references(
            table: "bar",
            onDelete: .cascade,
            onUpdate: .cascade
            deferrable: true,
            columns: Column("id")))

The affinity of a column is an optional part of the grammar, and defaults to NUMERIC if the type is omitted in the create statement. So it is possible to define a column as simply Column("name").

Even more descriptive, but also more verbose, using a name label for the column initializers:

Schema.alter(table: "foo")
    .add(Column(name: "new_col", affinity: .integer))

Also might make sense to name Schema as SQL and have all top level builders hang off that type:

migrator.registerMigration("v2") { db in
    db.execute(
        SQL.alter(table: "foo")
            .add(Column(name: "new_col", affinity: .integer)))
}

Alternately

migrator.registerMigration("v2") { db in
    SQL(db).alter(table: "foo")
        .add(Column(name: "new_col", affinity: .integer))
}

cfilipov commented 8 years ago

The examples bring up the question of how Column relates to SQLColumn. Having two separate column types for table creation and querying seems unnecessary and possibly confusing. But I would guess that you would be against sharing the column types between the two domains because that would allow one to define a bunch of column constrains while building a query (and that doesn't make sense).

The example you used, which is a mix of procedural and declarative by using a closure, would solve this by putting the root column builder behind a method:

SQL(db).create(table: "foo") { table in
    table.add(column: "id", affinity: .integer, notNull: true)
        .primaryKey(autoIncrement: true)
    table.add(column: "text", affinity: .string)
        .unique(onConflict: .rollback)
        .defaults(0) // unfortunately, `default` is a reserved keyword.
        .collate(.noCase)
}

This makes the builders a bit more complicated, but I haven't fully thought out how this implementation would look so maybe it's not that complicated.

It might tempting to do things like add an index to the table inside create() which I believe is wrong because that is a separate statement and should be kept separate in the DSL (less magic, closer to what the actual SQL would look like).

SQL(db).create(index: "foo_idx", on: "foo"...)...

Here we're being stringly again and might mis-type "foo". It might be useful to return a "table" instance and use that to create idexes etc...

let foo = SQL(db).create(table: "foo"...)...
foo.create(index: "foo_idx"...)...
foo.alter(table: "foo"...)...
foo.drop()

The foo variable would only be used in the migration in this case (thought wouldn't be useable across migrations, but I suppose you can offer a way to create a table by name to re-use across several statements).

let foo = SQL.table(named: "foo")

I would prefer to avoid the closure approach and just put the column definitions behind a builder method like this. Here there would be a _ColumnBuilder type but the user wouldn't be forced to declare it directly like the first example I used:

migrator.registerMigration("v2") { db in
    db.execute(SQL
        .create(table: "foo", ifNotExists: true)
            .column(name: "id", affinity: .integer, notNull: true) 
                .primaryKey(autoIncrement: true)
            .column(name: "int", affinity: .integer)
            .column(name: "text", affinity: .string)
                .unique(onConflict: .rollback)
                .defaults(0)
                .collate(.noCase)
            .column(name: "data", affinity: .data)
            .column(name: "double", affinity: .double)
            .column(name: "bar_id", affinity: .integer)
                .onNull(.ignore)
                .references(
                    table: "bar",
                    onDelete: .cascade,
                    onUpdate: .cascade
                    deferrable: true,
                    columns: "id"))
}

I think the above snippet is the cleanest and also most straight forward to implement so far.

cfilipov commented 8 years ago

My apologies for the bombardment of comments. I have been iterating on this idea a lot. I considered deleting my previous comments to make it easier to follow the discussion, but it might be helpful to see how I arrived at the current proposal so I will leave them. Feel free to skip down this this comment if it's too much to read.

I moved most of the builder methods into labeled parameters. Builder methods work better when there are mutually exclusive options in the grammar, but in most cases I was able to work around that with enums. This might be possible in all cases but so far I have not run into a case where it didn't work. Table constraints (not shown) would still be added via chained builder methods.

migrator.registerMigration("v2") { db in
    db.execute(SQL.create(table: "foo", ifNotExists: true)
        .column(
            name: "id", 
            affinity: .integer, 
            notNull: true, // default is `true`
            primaryKey: true), 
        .column(name: "int", affinity: .integer)
        .column(
            name: "text", 
            affinity: .text, 
            unique: .rollback, // can also be `true`/`false` or other enum values, `false` is default.
            default: 0, // `default` *can* be used as an argument label, but the xcode highlighter mistakenly marks it red like other reserved words
            collate: .noCase)
        .column(name: "data", affinity: .data)
        .column(name: "double", affinity: .double)
        .column(
            name: "bar_id", 
            affinity: .integer,
            onNull: .ignore,
            references: SQL.foreignKey(
                table: "bar",
                onDelete: .cascade,
                onUpdate: .cascade
                deferrable: true,
                columns: "id"))
}

I think that this is overall a nicer syntax than the previous examples up to this point, except for two potential problems:

The foreignKey reference at the end requires the use of a top level static method on SQL (or Schema, as you suggested). This could be simplified by making SQL an enum then you could simply write .foreignKey(...), but associated values in enums can't have default values which would make it required to always provide onDelete and onUpdate which someone should be allowed to leave out. This same problem occurs with the primaryKey parameter (where I experimented with using an enum).
The unique: .rollback param reads strangely because you lose some context from the builder method. Previously the builder method gave us the ability to call it with more words .unique(onConflict: .rollback). The enum cases could be renamed to .onConflictRollback, .onConflictAbort, etc... but this would then read poorly when used in the table constraints (see below).

The primaryKey: param takes an enum that is also a BooleanLiteralConvertible so this simple example works:

SQL.create(name: "foo", ifNotExists: true)
    .column(
        name: "id",
        primaryKey: true,  // non-auto-incrementing primary key
        affinity: .integer,
        notNull: true)

To define conflict resolution:

SQL.create(name: "foo", ifNotExists: true)
    .column(
        name: "id",
        primaryKey: .onConflict(.ignore), // non-auto-incrementing primary key
        affinity: .integer,
        notNull: true)

To specify auto-incrementing:

SQL.create(name: "foo", ifNotExists: true)
    .column(
        name: "id",
        primaryKey: .autoIncrement(order: .desc, onConflict: .abort),
        affinity: .integer,
        notNull: true)

Unfortunately, because I used an enum with associated values, you are forced to specify order and onConflict when using .autoIncrement because default values cannot be used. I am still trying to find a solution that reads well and is implementable in swift. There are quite a few options here.

Here's another example, this time with table constraints:

migrator.registerMigration("v2") { db in
    let fooId = SQLColumn("foo_id")
    let bazId = SQLColumn("baz_id")
    let a = SQLColumn("a")
    let b = SQLColumn("b")
    let c = SQLColumn("c")

    let createBar = SQL.create(table: "bar", ifNotExists: true)
        .column(name: fooId, affinity: .integer)
        .column(name: bazId, affinity: .integer)
        .column(name: a, affinity: .integer)
        .column(name: b, affinity: .integer)
        .column(name: c, affinity: .integer)
        .unique(columns: fooId, bazId, onConflict: .rollback) // this is legal in Swift, but would you prefer to have these parameters reversed? (I think it reads better in this order though)
        .primaryKey(columns: a, b, c, onConflict: .rollback) // runtime error if multiple `primaryKey()` used
        .foreignKey(
            columns: fooId, 
            onConflict: .rollback,
            references: SQL.foreignKey(
                table: "foo",
                columns: "id"))
        .foreignKey(
            columns: bazId, 
            onConflict: .rollback,
            references: SQL.foreignKey(
                table: "baz",
                columns: "id"))
        .check(a != b)
        .check(b != c)

    db.execute(createBar)
}

Above you see how the shorter _OnConflict enum cases like .rollback read better than .onConflictRollback.

groue commented 8 years ago

@cfilipov I think we're on the right track now :-). Don't code yet, because I want to discuss the fluent method chaining (although I bet on you this time), a few other minor points like the intermediate SQL generation step, and I'd also like to see if we have room for future remove/alter column migrations (the ones that SQLite does not support easily, and require foreign key disabling - they might (I don't know yet) have an impact on the API).

cfilipov commented 8 years ago

I'm happy to hear that our ideas are now converging @groue. Don't worry, I'm not trying to waste time writing code that won't get accepted, but I did write the bare minimum code (highly stubbed) needed to test out some of the ideas to make sure I wasn't getting away with impossible syntax.

Edit: By the way, my understanding is that you cannot drop or rename a column in SQLite. You have to create a new table with a temp name, migrate the data, then drop the old table and rename the new one.

groue commented 8 years ago

(Very long and detailed answer)

A good migration is a migration that does not change.

In my example, the migration does not change. The symbol name for Foo in Swift is renamed, but the migration is unaffected. This symbol rename can be done automatically and in a type-safe way. But I get your point, you are likely against even renaming the symbol Foo even if it has no effect on the generated SQL.

Yes. To put it in another way: it must be possible to avoid any change (up to the git diff). People can do otherwise, it won't be my problem.

I know I have mentioned this before a few times, and you seem to still be unconvinced on the value: being able to share migration variables like this allows even greater safety in the app. I really like the fact that a breaking change in the schema will result in a build failure. This is a very obvious benefit you seem to be uninterested in.

This looks like a benefit, but I stand by my position: it is an anti-pattern that threats application maintenance over time. When the code doesn't describe the database state, we have a big problem. What I call your "cathedral" of versioned enums and types is an inflated answer to a real problem. I stand that the problem should be avoided, not cured.

Think about an very non-trivial application that has possibly hundreds of different queries. Some of these queries will be broken by a migration, some will not. Because of the decoupling there is no way to tell other than to exercise every possible path that executes these queries. Just because other libraries haven't tried to solve this problem doesn't make it less valuable.

I feel sorry when I see people dive in the current trend for type safety and forget about higher level development priorities. It's a tool, not more.

Moving on, I'll continue the rest of this discussion with your decoupled migration as the strategy, and leave the coupling as a parallel conversation so it does not distract from the create API (I hope to convince you of the value, or be convinced of its lack of value).

All right.

One more point about sharing columns before I move on: assume I agree that we should not share column definitions between migrations and queries in the app code. It may be useful to still share columns within a migration. You might want to query an old table you are about to drop in a migration into a new table and want to do that in Swift.

OK we'll see that in the details of the DSL.

Might also be useful to support some amount of PRAGMA (which is pretty simple):
Schema.pragma(.foreignKeys = true)
Schema.pragma(.cacheSize = 700000)
or using associated types enums
Schema.pragma(.foreignKeys(true))
Schema.pragma(.cacheSize(700000))

GRDB uses the Configuration type for that:

var config = Configuration()
config.readonly = true
config.foreignKeysEnabled = true // The default is already true
config.trace = { print($0) }     // Prints all SQL statements

let dbQueue = try DatabaseQueue(
    path: "/path/to/database.sqlite",
    configuration: config)

A couple changes to the API you proposed: it seems the most idiomatic way to do this in Swift 3 is to use the first parameter label to differentiate the different create functions. So instead of createTable() and createView() you would have create(table:) and create(view:) etc...

True :-)

Schema.create(table: "foo", ifNotExists: true, columns:
    Column("id", affinity: .integer, notNull: true) 
The affinity of a column is an optional part of the grammar

OK let's talk about the definition of the types of columns. What are our options?

Affinity, as in your above sample code.

Plus side: it's very straight-forward, and the only information that SQLite actually cares about.

Minus side: unless you know SQLite pretty well, you don't even know what affinities are.
Strings: "INTEGER", "DECIMAL(10,5)", "VARCHAR(16)".

Plus side: total control.

Minus side: nobody cares. If you're up to provide SQL types as strings, you'll create your table with SQL anyway.
Swift types: Int.self, Bool.self, Date.self.

Plus side: the intent of usage of the column is crystal clear.

Minus side: this sends the message that the column type is locked. But people may want to read, for example, dates as strings, or Swift dates, or date components, depending on their use case. Reading ints as bools, using the fact that SQLite treats 0 as false, also comes to my mind.
An enum for general SQL types: .integer, .double, .date, .time, .datetime, .text, etc. There will be omissions (decimals, varchar(n)) because SQLite don't handle them at all.

Plus side: the intent of storage is crystal clear (when above, it was the usage intent).

Minus side: I don't know.

I'd vote for SQL types.

[...] and defaults to NUMERIC if the type is omitted in the create statement. So it is possible to define a column as simply Column("name").

Wouldn't BLOB be the best affinity for this case? Quoting SQLite doc:

A column with affinity BLOB does not prefer one storage class over another and no attempt is made to coerce data from one storage class into another.

Nevertheless, thanks for this really surprising suggestion. I foresee documentation problems, because I don't see myself starting by this case (it doesn't look very serious at first sight), and by the time users have understood typed columns, they won't be much impressed by this unexpected shortcut. What do you think?

Even more descriptive, but also more verbose, using a name label for the column initializers:
Schema.alter(table: "foo")
    .add(Column(name: "new_col", affinity: .integer))
Also might make sense to name Schema as SQL and have all top level builders hang off that type:
migrator.registerMigration("v2") { db in
    db.execute(
        SQL.alter(table: "foo")
            .add(Column(name: "new_col", affinity: .integer)))
}
Alternately
migrator.registerMigration("v2") { db in
    SQL(db).alter(table: "foo")
        .add(Column(name: "new_col", affinity: .integer))
}

OK let me try - first draft, heavily inspired by Active Record migrations and your own sample codes, with semi-realistic models:

db.schema.create(table: "persons") { t in
    t.column("name", .text, default: "Anonymous")   // defaults, defaulting?
    t.column("birthDate", .date)
    t.column("email", .text, unique: true, collate: .nocase)
}

db.schema.create(table: "countries", primaryKey: "isoCode") { t in
    t.column("isoCode", .text)
    t.column("name", .text, null: false)
}

db.schema.create(table: "citizenships", primaryKey: ["personId", "countryIsoCode"]) { t in
    t.column("personId", .integer, null: false)
        .references("persons", onDelete: .cascade, onUpdate: .cascade)
    t.column("countryIsoCode", .text, null: false)
        .references("countries", onDelete: .cascade, onUpdate: .cascade)
}

db.schema.create(table: "passports") { t in
    t.column("personId", .integer, null: false)
    t.column("countryIsoCode", .text, null: false)
    t.column("issueDate", .datetime)
    t.references("citizenships", from: ["personId", "countryIsoCode"])
}

A few notes on the attempt above:

There is not a single value type initializer in sight (it's impossible to even think about reusing any value outside of the migration - the best way to avoid frustration is not making any promise that can't stand)
Default "id INTEGER PRIMARY KEY" unless specified otherwise.
The primary key is defined next to the table name. I hope that is helps the reader.
unique and null have simple boolean versions that cover the very common use case which does not define explicit conflict clause.
Referenced columns default to the primary key of the target table, the most common use case.
The onDelete/onUpdate clauses may deserve improvements. Core Data may be an inspiration.

Table modification:

db.schema.alter(table: "persons") { t in
    t.addColumn("lastSyncDate", .datetime)
    t.removeColumn("birthDate")
    // add not null constraint
    t.changeColumn("email", .text, unique: true, null: false, collate: .nocase)
}

Removing columns and changing columns is for the future. But grouping the altering statements is important for the DSL to be able to generate the required SQL. See https://github.com/groue/GRDB.swift#advanced-database-schema-changes for an example.

A problem is that db.schema is available everywhere, when this construct has to be reserved to migrations.

Indexes:

db.schema.create(index: "personsOnEmail", on: "persons", columns: ["email"], unique: true)

I'd like to make the index name optional:

db.schema.create(indexOn: "persons", columns: ["email"], unique: true)

OK, back to you:

The examples bring up the question of how Column relates to SQLColumn. Having two separate column types for table creation and querying seems unnecessary and possibly confusing. But I would guess that you would be against sharing the column types between the two domains because that would allow one to define a bunch of column constrains while building a query (and that doesn't make sense).

Yep. That's why I have tried to get rid of any visible value type in the migration code.

The example you used, which is a mix of procedural and declarative by using a closure, would solve this by putting the root column builder behind a method:
SQL(db).create(table: "foo") { table in
    table.add(column: "id", affinity: .integer, notNull: true)
        .primaryKey(autoIncrement: true)
    table.add(column: "text", affinity: .string)
        .unique(onConflict: .rollback)
        .defaults(0) // unfortunately, `default` is a reserved keyword.
        .collate(.noCase)
}

Exactly. Our sample codes are very close here.

This makes the builders a bit more complicated, but I haven't fully thought out how this implementation would look so maybe it's not that complicated.

Let's help the API users before ourselves :-)

It might tempting to do things like add an index to the table inside create() which I believe is wrong because that is a separate statement and should be kept separate in the DSL (less magic, closer to what the actual SQL would look like).
SQL(db).create(index: "foo_idx", on: "foo"...)...

Looks sensible.

Here we're being stringly again and might mis-type "foo". It might be useful to return a "table" instance and use that to create idexes etc...
let foo = SQL(db).create(table: "foo"...)...
foo.create(index: "foo_idx"...)...
foo.alter(table: "foo"...)...
foo.drop()
The foo variable would only be used in the migration in this case (thought wouldn't be useable across migrations, but I suppose you can offer a way to create a table by name to re-use across several statements).
let foo = SQL.table(named: "foo")

I agree that the string variables can look quite ad-hoc:

let personsTable, name, birthDate, email = "persons", "name", "birthDate", "email"
db.schema.create(table: personsTable) { t in
    t.column(name, .text)
    t.column(birthDate, .date)
    t.column(email, .text, unique: true, collate: .nocase)
}
db.schema.create(indexOn: personsTable, columns: [email], unique: true)

I admit that the avoidance of value types has a cost.

migrator.registerMigration("v2") { db in
    db.execute(SQL
        .create(table: "foo", ifNotExists: true)
            .column(name: "id", affinity: .integer, notNull: true) 
                .primaryKey(autoIncrement: true)
            .column(name: "int", affinity: .integer)
            .column(name: "text", affinity: .string)
                .unique(onConflict: .rollback)
                .defaults(0)
                .collate(.noCase)
            .column(name: "data", affinity: .data)
            .column(name: "double", affinity: .double)
            .column(name: "bar_id", affinity: .integer)
                .onNull(.ignore)
                .references(
                    table: "bar",
                    onDelete: .cascade,
                    onUpdate: .cascade
                    deferrable: true,
                    columns: "id"))
}

I think the above snippet is the cleanest and also most straight forward to implement so far.

Let me try the same table:

migrator.registerMigration("v2") { db in
    db.schema.create(table: "foo", ifNotExists: true) { t in
        t.column("int", .integer)
        t.column("text", .string, default: "0", collate: .noCase)
            .unique(onConflict: .rollback)
        t.column("data", .blob)
        t.column("double", .double)
        t.column("bar_id", .integer)
            .notNull(onConflict: .ignore)
            .references("bar", onDelete: .cascade, onUpdate: .cascade, deferrable: true)
    }
}

The conflict clauses on unique and not null require a builder extension just like in your sample code.

Thoughts?

My apologies for the bombardment of comments. I have been iterating on this idea a lot. I considered deleting my previous comments to make it easier to follow the discussion, but it might be helpful to see how I arrived at the current proposal so I will leave them. Feel free to skip down this this comment if it's too much to read.

You're totally welcome.

I moved most of the builder methods into labeled parameters. Builder methods work better when there are mutually exclusive options in the grammar, but in most cases I was able to work around that with enums. This might be possible in all cases but so far I have not run into a case where it didn't work. Table constraints (not shown) would still be added via chained builder methods.

Hu hu you did just touch the limits of a pattern. I'm happy that you can feel when such problems arise.

migrator.registerMigration("v2") { db in
    db.execute(SQL.create(table: "foo", ifNotExists: true)
        .column(
            name: "id", 
            affinity: .integer, 
            notNull: true, // default is `true`
            primaryKey: true), 
        .column(name: "int", affinity: .integer)
        .column(
            name: "text", 
            affinity: .text, 
            unique: .rollback, // can also be `true`/`false` or other enum values, `false` is default.
            default: 0, // `default` *can* be used as an argument label, but the xcode highlighter mistakenly marks it red like other reserved words
            collate: .noCase)
        .column(name: "data", affinity: .data)
        .column(name: "double", affinity: .double)
        .column(
            name: "bar_id", 
            affinity: .integer,
            onNull: .ignore,
            references: SQL.foreignKey(
                table: "bar",
                onDelete: .cascade,
                onUpdate: .cascade
                deferrable: true,
                columns: "id"))
}
I think that this is overall a nicer syntax than the previous examples up to this point, except for two potential problems:

The foreignKey reference at the end requires the use of a top level static method on SQL (or Schema, as you suggested). This could be simplified by making SQL an enum then you could simply write .foreignKey(...), but associated values in enums can't have default values which would make it required to always provide onDelete and onUpdate which someone should be allowed to leave out. This same problem occurs with the primaryKey parameter (where I experimented with using an enum).

The unique: .rollback param reads strangely because you lose some context from the builder method. Previously the builder method gave us the ability to call it with more words .unique(onConflict: .rollback). The enum cases could be renamed to .onConflictRollback, .onConflictAbort, etc... but this would then read poorly when used in the table constraints (see below).

I agree it requires a litle juggling with builders and arguments until the API looks correct.

The primaryKey: param takes an enum that is also a BooleanLiteralConvertible so this simple example works:
SQL.create(name: "foo", ifNotExists: true)
    .column(
        name: "id",
        primaryKey: true,  // non-auto-incrementing primary key
        affinity: .integer,
        notNull: true)

I'm not sure my solution for primary keys is the best either (default "id INTEGER PRIMARY KEY", and listing the primary key columns in the create(table:primaryKey:ifNotExist:) method)

To define conflict resolution:

SQL.create(name: "foo", ifNotExists: true)
    .column(
        name: "id",
        primaryKey: .onConflict(.ignore), // non-auto-incrementing primary key
        affinity: .integer,
        notNull: true)

To specify auto-incrementing:

SQL.create(name: "foo", ifNotExists: true)
    .column(
        name: "id",
        primaryKey: .autoIncrement(order: .desc, onConflict: .abort),
        affinity: .integer,
        notNull: true)

It's been a while I wanted to tell you that the "auto-incrementing" word should not be used in our API for "INTEGER PRIMARY KEY" columns. SQLite does not recommend the usage of truly auto-incremented columns for performance reasons. Instead, "INTEGER PRIMARY KEY" columns guarantee automatic uniqueness (the most common goal), but do not guarantee incrementation (ids of deleted rows can be reused).

Still you remind me I did not handle conflicts and ordering of primary keys in my sample code. Duly recorded.

Here's another example, this time with table constraints:

I'll answer later on table constraints.

Conclusion of this message: I have tried to show an alternative. I did not cover as many cases as you, so I'm not sure yet it will stand. Thanks a lot for your involvement here, it is greatly appreciated.

groue commented 8 years ago

Side note - a glimpse on future joins. Joins belong to the query realm so they won't be used when creating table references. Yet it would be better if they looked somewhat similar.

Let's jump right into a non-trivial join problem, and take a person which has two parents and a birth country:

CREATE TABLE persons (
  id INTEGER PRIMARY KEY,
  name TEXT,
  fatherId INTEGER REFERENCES persons(id),
  motherId INTEGER REFERENCES persons(id)
  birthCountryId INTEGER REFERENCES countries(id),
)

Joins in the query interface aim at solving the following use cases:

consuming joined tables (give me all persons along with their parents and birth country)
filtering on joined tables (give me all persons whose mother is named "Eve")
nesting joins (give me all persons with their parents and grand parents)

OK let's dive in.

To use joins, you need to declare a relation which tells how to reach a joined table. Starting from persons:

// Current API for relation declaration - the one that we'll make akin to
// the reference creation API:
let father = ForeignRelation(named: "father", to: "persons", through: ["fatherId": "id"])
let mother = ForeignRelation(named: "mother", to: "persons", through: ["motherId": "id"])
let country = ForeignRelation(named: "birthCountry", to: "countries", through: ["birthCountryId": "id"])

To perform requests, you also need a starting point on which relations apply. Current limitation: you need a TableMapping type. From the starting point, you can include relations (the columns of the joined tables will be selected), or simply join them (the columns of the joined tables will not be selected):

// Give me all persons along with their parents and birth country
let request = Person.include(father, mother, country)

// SELECT persons.*, father.*, mother.*, countries.*
// FROM persons
// LEFT JOIN persons father ON father.id = persons.fatherId
// LEFT JOIN persons mother ON mother.id = persons.motherId
// LEFT JOIN countries birthCountry ON birthCountry.id = persons.birthCountryId
for person in request.fetch(db) {
    person.father
    person.mother
    person.country
}

At the row level:

for row in Row.fetch(db, request) {
    row.value(named: "id")                          // id of the person
    row.variant(named: "father").value(named: "id") // father's id
    row.variant(named: "mother").value(named: "id") // mother's id
    row.variant(named: "birthCountry").value(named: "id") // country id
}

(I'm not happy with the "variant" word here - figuring out the right API for row adapters is bloody hard.)

Loading nested relations:

let request = Person
    .include(
        father.include(father, mother),
        mother.include(father, mother))
// SELECT persons.*, father0.*, father1.*, mother0.*, mother1.*, father2.*, mother2.*
// FROM persons
// LEFT JOIN persons father0 ON father0.id = persons.fatherId
// LEFT JOIN persons father1 ON father1.id = father0.fatherId
// LEFT JOIN persons mother0 ON mother0.id = father0.motherId
// LEFT JOIN persons mother1 ON mother1.id = persons.motherId
// LEFT JOIN persons father2 ON father2.id = mother1.fatherId
// LEFT JOIN persons mother2 ON mother2.id = mother1.motherId
// LEFT JOIN countries birthCountry ON birthCountry.id =
for row in Row.fetch(db, request) {
    // The nesting of row variants follows the nesting of relations:
    row                                     // person
    row.variant("father")                   // person's father
    row.variant("father").variant("father") // person's father's father
    row.variant("father").variant("mother") // person's father's mother
    row.variant("mother")                   // etc.
    row.variant("mother").variant("father")
    row.variant("mother").variant("mother")
}

// Variant nesting fits well with the consumption of row variants by model types:
for person in request.fetch(db) {
    person.father.father
}

Joining table without selecting them:

// Give me all persons named "Bill" whose mother is named "Eve"
//
// SELECT persons.*
// FROM persons
// JOIN persons mother ON mother.id = persons.motherId AND mother.name = 'Eve'
// WHERE persons.name = 'Bill'
let request = Person
    .filter { $0["name"] == "Bill" }
    .join(required: true, mother.filter { $0["name"] == "Eve" })

// With SQLColumn:
let nameColumn = SQLColumn("name")
let request = Person
    .filter { $0[nameColumn] == "Bill" }
    .join(required: true, mother.filter { $0[nameColumn] == "Eve" })

The required argument turns LEFT JOIN into JOIN. Writing this example for you makes me doubt that the current default LEFT JOIN is a good idea (it requires good SQL skills not to forget the required: true above). I also now understand why ActiveRecord does not provide API for filtering on joined tables, but only helps building WHERE clauses - this lets them default to LEFT JOIN while letting people write conditions that do what is expected - hmm hmm hmm (a wave of doubt).... (potential solution: keep the default LEFT JOIN, apply the filter condition on joined tables in the WHERE clause, add another "expert" API to add an ON clause)

The filter closure is new. It allows GRDB to postpone the table aliases generation until the request is translated into SQL. A downside is that it only helps writing conditions on a single table:

// Still supported:
Person.filter(nameColumn == "Bill")
// When there are joins:
Person.filter { $0[nameColumn] == "Bill" }

Last, two examples of joins which are not supported by the query interface and require SQL snippets (and sometimes explicit table aliasing):

// All persons whose mother is named Eve or born in Europe:
Person.
    .join(mother, country)
    .filter(sql: "mother.name = ? OR country.continent = ?", arguments: ["Eve", "Europe"])

// All persons whose mother or grand-mother is named Eve:
Person.
    .join(mother.aliased("mother")
        .join(mother.aliased("grandmother")))
    .filter(sql: "mother.name = ? OR grandmother.name = ?", arguments: ["Eve", "Eve"])

At least it is possible to write such query - at one point eventually raw SQL becomes the best option anyway.

Writing this has shown me that I still have pretty big issues to solve.... Nevertheless, let's at least deal with table creation ;-)

cfilipov commented 8 years ago

GRDB uses the Configuration type for that:

I forgot about configuration. However, I'm not sure the configuration API is the appropriate way to expose the PRAGMA interface. Some pragmas, like enabling foreign key constraints, might be done as part of a migration, and the current API doesn't seem to allow for that.

OK let's talk about the definition of the types of columns. What are our options?

Affinity, as in your above sample code.

Plus side: it's very straight-forward, and the only information that SQLite actually cares about.

Minus side: unless you know SQLite pretty well, you don't even know what affinities are.

As you say, GRDB is for those who prefer to write SQL and work closer with it. In the absence of this table builder interface someone needs to know the column affinities to create a table anyway, and this DSL just wraps in Swift.

An enum for general SQL types: .integer, .double, .date, .time, .datetime, .text, etc. There will be omissions (decimals, varchar(n)) because SQLite don't handle them at all.

Plus side: the intent of storage is crystal clear (when above, it was the usage intent).

Minus side: I don't know.

I think the options .time, .datetime, etc.. are decieving because these are not types as far as SQLite is concerned. Putting them in the table builder interface would only be confusing to someone familiar with SQLite, and it's lying about what actual SQL gets generated.

I prefer to keep the DSL as close to the actual SQL code as possible. The idea is that you know SQL, which is why you would use GRDB, but don't want to use strings.

Anything other than .integer, .text, .double, .blob, .numeric as options would be obfuscating the API for someone who knows SQLite.

I'd vote for SQL types.

My vote goes strongly for affinity. Perhaps using the label type: instead of affinity: might make things more clear for some people, but that would still be lying because columns in SQLite don't have a type, they have an affinity for a type. Making this clear and calling it what it is can avoid confusion and may even help newer developers by giving them a hint that there is more to it than just a "type". As for the accepted input, I think the only correct option here is the actual SQLite affinities.

[...] and defaults to NUMERIC if the type is omitted in the create statement. So it is possible to define a column as simply Column("name").

Wouldn't BLOB be the best affinity for this case? Quoting SQLite doc:

I suggested NUMERIC simply because that's what SQLite will use as default if you omit a type. This is not actually something I was trying to enforce, but rather a lack of action.

A column with affinity BLOB does not prefer one storage class over another and no attempt is made to coerce data from one storage class into another.

Nevertheless, thanks for this really surprising suggestion. I foresee documentation problems, because I don't see myself starting by this case (it doesn't look very serious at first sight), and by the time users have understood typed columns, they won't be much impressed by this unexpected shortcut. What do you think?

If I leave out that param in the DSL I would expect the generated SQL to look like this:

CREATE TABLE foo;

Anything else would be a major surprise (especially blob, which is used for very specific purposes). For this reason, the affinity: param (and all params that are optional parts of the grammar) is optional (as in, Optional<T>). When an optional part of the grammar is nil in the DSL, it is left out in the generated SQL, because this is the way of least surprises.

In such case, SQLite's default will be used, which is NUMERIC. I think it will only make things more confusing to deviate from SQLite's own defaults.

Default "id INTEGER PRIMARY KEY" unless specified otherwise.

I'm strongly against defaulting to INTEGER PRIMARY KEY.

I am going to sound like a broken record here, but again I advocate deviating as little from SQLite as possible. It might be convenient for some cases, but in most cases it is just different that what someone would expect.

The primary key is defined next to the table name. I hope that is helps the reader.

I tried to do this, but remember that you also need to support the optional conflict clause when you specify a primary key. Doing this separate from the primary key param will allow you to specify a primary key conflict clause even if you have not specified a primary key.

unique and null have simple boolean versions that cover the very common use case which does not define explicit conflict clause.

Have a look at my examples again. You can conveniently specify boolean true or false values for these params, or provide a more complete conflict resolution clause instead. I would not want to leave out conflict resolution.

SQL.create(name: "foo", ifNotExists: true)
    .column(
        name: "id",
        affinity: .integer,
        notNull: true)

Perhaps nullable: would be a better name?

Referenced columns default to the primary key of the target table, the most common use case.

This makes assumptions about what the primary key is called. Some people would prefer to use country_id instead of id as the primary key. And again, I choose to lean toward explicitness and closeness to the actual SQL. You don't have to write table creation often, you aren't saving much by making it slightly less verbose.

It's been a while I wanted to tell you that the "auto-incrementing" word should not be used in our API for "INTEGER PRIMARY KEY" columns. SQLite does not recommend the usage of truly auto-incremented columns for performance reasons. Instead, "INTEGER PRIMARY KEY" columns guarantee automatic uniqueness (the most common goal), but do not guarantee incrementation (ids of deleted rows can be reused).

Indeed. I have seen that documentation. And I agree a developer should avoid using it, but that does not mean it should be excluded in the API.

In my example I also provided a way to do this without auto-increment:

SQL.create(name: "foo", ifNotExists: true)
    .column(
        name: "id",
        primaryKey: SQL.onConflict(.abort),
        affinity: .integer,
        notNull: true)

Note that ordering is not relevant unless you specify auto-increment.

I hope you can see that there are some philosophies that I am strongly following in the design:

The DSL should be as close to the SQLite grammar as possible (both structurally and semantically). When you see the generated SQL from your DSL, it should not surprise you.
If it's a default in SQLite, it should remain the default.
If you omit a clause or token in the DSL, it's the same as omitting it in the SQL. This also has the effect of keeping SQLite's defaults.
Don't be magical, don't surprise the user of the API.

When I came up with the DSL, I literally sat in front of the SQLite documentation and translated the grammar piece by piece into Swift. So far my proposal is a nearly 100% complete representation of the SQLite CREATE TABLE grammar.

I think we are moving a good direction. From what I can tell the biggest area of difference between us right now is nearly always about what to do by default. I am consistently against doing anything different from SQLite's default semantics, I hope my arguments above convince you this is the right approach.

cfilipov commented 8 years ago

Some more comments:

defaulting vs defaluts vs default. I like default the most, but the highlighter is annoying and makes it look incorrect. Unfortunately I don't see this getting fixed in any regex-based editor because of the nature of this special case so defaulting would be my next choice.

On the fluent API vs the param label: I think the later reads more clearly, and now that I have experimented with the nearly the whole grammar I can see that there were only those few cases that were difficult to figure out.

I notice you put create on db.schema but I think it would be useful to have it separate like you did with queries so you can just call them with db.execute().

You mentioned having schema methods like create only be available during migrations. You have good intentions but this is an example of how an API creator's good intentions can simply limit the API's usefulness. At some point I want to write a SQLite browser tool in swift that can also create tables. If you make this API only available in migrations such a tool could not use GRDB. This also goes back to my want for being closer to the "bare metal": SQLite will let you create a table at any point in your session, a good SQLite API should not restrict that just because the author did not see value in it.

cfilipov commented 8 years ago

This thread is getting long and there are two parallel conversations. I suggest closing this issue and opening separate ones for join and create discussions.

groue commented 8 years ago

Hello @cfilipov, we're diverging again!

OK let's talk about the definition of the types of columns. What are our options?

In the absence of this table builder interface someone needs to know the column affinities to create a table anyway, and this DSL just wraps in Swift.

[...]

Anything other than .integer, .text, .double, .blob, .numeric as options would be obfuscating the API for someone who knows SQLite.

SQLite column affinities are not well-known. They are designed to remain semi-obscure to regular SQLite users. SQLite column affinities are a tool used by SQLite to improve its compatibility with other database engines, and welcome a very wide gamut of developers by not punishing them if they lack SQLite expertise.

A great use case where affinities show they're not the right tool in designing our API is dates. It happens that GRDB stores dates as strings. It could have stored them as timestamps, or even Julian days, natively supported by SQLite. Choosing strings over timestamps and julian days has been a non-trivial debate. Choosing the precise string format has been a non-trivial debate (SQLite supports two). There are enough people and libraries out there that store dates in another way so that you can understand that "storing dates" is more easily said than done.

If we had only affinities in our table creation DSL, the user would hesitate as soon as he has to store a date. And users could not easily grasp the "storage intent" when reading the migration code:

t.column("date", .string)    // meh
t.column("date", .double)    // this guy didn't read GRDB documentation
t.column("date", .datetime)  // of course, ok let's move on

@cfilipov It's high time you to start criticizing your opinions from the point of view of the user.

and defaults to NUMERIC if the type is omitted in the create statement

I know understand that you did not propose this feature to solve a problem or answer a use case. You did propose this feature in your forced march to reimplement SQLite grammar in Swift.

My interest instantly dropped to zero (more on that below).

I hope you can see that there are some philosophies that I am strongly following in the design:

The DSL should be as close to the SQLite grammar as possible (both structurally and semantically). When you see the generated SQL from your DSL, it should not surprise you.

If it's a default in SQLite, it should remain the default.

If you omit a clause or token in the DSL, it's the same as omitting it in the SQL. This also has the effect of keeping SQLite's defaults.

Don't be magical, don't surprise the user of the API.

SQL is a much better medium to itself than a dry and verbose Swift API that painfully mimicks its target language. My interest for this scholar exercise is zero. My interest goes to designing an API that takes in account the reality of developers' tasks.

BTW I dislike magic just as much as you do.

db.schema.create(table: "persons") { t in
    t.column("name", .text, default: "Anonymous")
    t.column("birthDate", .date)
    t.column("email", .text, unique: true, collate: .nocase)
}

The code above is not that bad at readability. Better than your current proposal, quite obviously. Defaulting to "id INTEGER PRIMARY KEY" is maybe blunt: I'll look at other table creation DSLs to forge a stronger opinion.

db.schema.create(table: "citizenships", primaryKey: ["personId", "countryIsoCode"]) { t in
    t.column("personId", .integer, null: false)
        .references("persons", onDelete: .cascade, onUpdate: .cascade)
    t.column("countryIsoCode", .text, null: false)
        .references("countries", onDelete: .cascade, onUpdate: .cascade)
}

We're using schema inspection to infer the referenced primary keys above (no defaulting to id here). This makes code less verbose and more intent-oriented. The "magic" is quite limited, and does the right thing.

When I came up with the DSL, I literally sat in front of the SQLite documentation and translated the grammar piece by piece into Swift. So far my proposal is a nearly 100% complete representation of the SQLite CREATE TABLE grammar.

Studying the SQLite grammar is great. What you forgot in translating it into Swift is considering the users of your APIs. You had to spoil the mainstream use cases to leave room for less-used grammar corners. Your intellectual skills are already strong, now maybe the time to put them at designing APIs for other developers.

cfilipov commented 8 years ago

Hello @cfilipov, we're diverging again!

I suspect we actually agree closer than you think. Part of the creative process is putting out ideas that might end up being dumb, look at them with an open mind and try not to take everything I say as an attack on your opinion. Likewise I will assume the best intentions from you.

SQLite column affinities are not well-known. They are designed to remain semi-obscure to regular SQLite users. SQLite column affinities are a tool used by SQLite to improve its compatibility with other database engines, and welcome a very wide gamut of developers by not punishing them if they lack SQLite expertise.

I don't understand what you mean by obscure. Every time you create a table you are calling out affinities. Every SQLite tutorial, SQLite's own docs, everything I find mentions these affinities (though they might call them "types"). This is what a table create statement looks like, and I find nothing about it to be obscure:

CREATE TABLE person (
   id INT PRIMARY KEY NOT NULL,
   name TEXT,
);

I am proposing that the DSL follow the same options we are familiar with. Once again, we both agree that writing SQL directly is the best and more powerful way to work with SQLite. My goal with the DSL is to maintain this power but without resorting to working inside strings within Swift.

A great use case where affinities show they're not the right tool in designing our API is dates. It happens that GRDB stores dates as strings. It could have stored them as timestamps, or even Julian days, natively supported by SQLite. Choosing strings over timestamps and julian days has been a non-trivial debate. Choosing the precise string format has been a non-trivial debate (SQLite supports two). There are enough people and libraries out there that store dates in another way so that you can understand that "storing dates" is more easily said than done.

SQLite doesn't natively store dates, you can store dates as strings or an integers and SQLite comes with some built in functions to make them easier to work with. But they are just strings and integers (depending on what you choose). As you mentioned earlier, the storage and retrieval of dates is controlled by the query API. Marking this table as a date has no effect on how we actually import the data, and even worse, it lies to the user about what kind of CREATE TABLE statement gets generated.

If we had only affinities in our table creation DSL, the user would hesitate as soon as he has to store a date. And users could not easily grasp the "storage intent" when reading the migration code:
   t.column("date", .datetime) 

This will result in a create statement that generates a TEXT column, not a DATETIME column, unless I'm missing something from SQLite's docs. Even worse, perhaps the user wants to store the date as a unix epoch time and would prefer INTEGER, and we didn't let them express this.

I know understand that you did not propose this feature to solve a problem or answer a use case. You did propose this feature in your forced march to reimplement SQLite grammar in Swift.

The use case is for a developer that knows SQLite and wants to interface directly with SQL, but doesn't want to do it with strings. This isn't a forced march, it was simple and intuitive to implement, and I find it to be simple to use because of the similarity. I have used some SQL DSLs before and the ones I didn't like tried too hard to be original and fancy and diverged enough from SQL that you needed to re-learn everything in order to use them.

I think someone who's used to writing SQL should be able to easily use the DSL after learning the builder syntax, and not have to refer to documentation every time because it diverges from what SQL does for a similarly-named part.

SQL is a much better medium to itself than a dry and verbose Swift API that painfully mimics its target language. My interest for this scholar exercise is zero. My interest goes to designing an API that takes in account the reality of developers' tasks.

I am surprised that you described GRDB as a tool for those who want to interface directly with SQLite, who prefer to work with SQL, and then claim that staying too close to SQLite's semantics is the wrong approach. My intent is to make it usable and intuitive by following patterns one would expect from a SQL DSL.

db.schema.create(table: "persons") { t in
    t.column("name", .text, default: "Anonymous")
    t.column("birthDate", .date)
    t.column("email", .text, unique: true, collate: .nocase)
}
The code above is not that bad at readability. Better than your current proposal, quite obviously. Defaulting to "id INTEGER PRIMARY KEY" is maybe blunt: I'll look at other table creation DSLs to forge a stronger opinion.

I agree that this form is likely the right way to go. What I like about this is the fact that each column is a single statement compared to doing it without the closure (where the whole thing is a single expression and we'd rely on a method to return a column builder to start a new column).

db.schema.create(table: "citizenships", primaryKey: ["personId", "countryIsoCode"]) { t in
    t.column("personId", .integer, null: false)
        .references("persons", onDelete: .cascade, onUpdate: .cascade)
    t.column("countryIsoCode", .text, null: false)
        .references("countries", onDelete: .cascade, onUpdate: .cascade)
}

I still think that primary key should be done outside of the create() params so that handling conflict and other optional clauses can be done fluently:

db.schema.create(table: "citizenships") { t in
    t.column("personId", .integer, null: false)
        .references("persons", onDelete: .cascade, onUpdate: .cascade)
    t.column("countryIsoCode", .text, null: false)
        .references("countries", onDelete: .cascade, onUpdate: .cascade)
    t.primaryKey(columns: "personId", "countryIsoCode") // optionally specify conflict clause etc...
}

Studying the SQLite grammar is great. What you forgot in translating it into Swift is considering the users of your APIs. You had to spoil the mainstream use cases to leave room for less-used grammar corners.

You like to make many board generalizations in your critiques but I think it would help to be more specific and technical. As I said, I have been consistent in targeting the user who like SQL and want to continue to write SQL but avoid working in strings. To that end I have proposed an API that focuses on them.

Please be specific, what mainstream use cases am I spoiling? It's hard to argue general points when you do not call out a specific part of the API.

groue commented 8 years ago

SQLite affinities

Before SQLite was SQL, and the SQL standard. SQL defines various column types, that standard-compliant SQL implementors have to deal with, in a way or another:

CREATE TABLE persons (
    id INTEGER,          -- 1
    name VARCHAR(255),   -- 'Arthur'
    birthDate DATE       -- 1987-09-18
    lastUpdate DATETIME, -- 2016-07-05 19:43
    picture BLOB         -- <data>
)

INTEGER, DATETIME and BLOB are SQL types. There are many, and the list is extending. Database engines implement them in various and sometimes inconsistent ways. At the time SQLite was designed, the major databases where Oracle, SQLServer, Sybase, MySQL, PostgreSQL, and I surely overlook some important ones. Dwayne Richard Hipp, the author of SQLite, decided that it was important that his tiny SQL library would be mostly compatible with them. Quoting https://www.sqlite.org/datatype3.html:

Most SQL database engines (every SQL database engine other than SQLite, as far as we know) uses static, rigid typing. With static typing, the datatype of a value is determined by its container - the particular column in which the value is stored.

SQLite uses a more general dynamic type system. In SQLite, the datatype of a value is associated with the value itself, not with its container. The dynamic type system of SQLite is backwards compatible with the more common static type systems of other database engines in the sense that SQL statements that work on statically typed databases should work the same way in SQLite (emphasis mine).

And indeed the way SQLite deals with SQL types is funny. It is a mix of Storage classes, C types, and Affinities.

There are five storage classes: int64, double, text, blob, and null. These are the actual data stored in the SQLite file. A single column may contain values of multiple storage classes.

There are several supported C types: void*, double, int, int64, char*, and others. These are actual types of arguments in the SQLite C API.

There are five affinities: TEXT, NUMERIC, INTEGER, REAL, BLOB. They are the bridge between C types and storage classes: affinities define which storage class will be used depending on the C type of the value stored by the SQLite user.

For example, when you store the C int 1 into a column declared with the SQL type INT and thus of affinity INTEGER, SQLite stores the int64 1. So far, so good.

Store the C int 1 into a column declared with the SQL type DOUBLE PRECISION and thus of affinity REAL, SQLite stores the double 1.0.

Store the C int 1 into a column declared with the SQL type VARCHAR(255) and thus of affinity TEXT, SQLite stores the string "1".

The last example is odd, of course - yet it is real, even if I doubt it is much used. I wanted to show you the strange beasts affinities are. Not as concrete as storage classes and C types, affinities are algorithms.

For more details, see https://www.sqlite.org/datatype3.html. It's very interesting.

Back to our topic. Storage classes, affinities, and C types are how SQLite implement the SQL types defined by the SQL standard. An SQLite expert needs to know about them, of course. Still, the types that are designed to be user-facing are SQL types: SQLite gives us this list:

INT
INTEGER
TINYINT
SMALLINT
MEDIUMINT
BIGINT
UNSIGNED BIG INT
INT2
INT8
CHARACTER(20)
VARCHAR(255)
VARYING CHARACTER(255)
NCHAR(55)
NATIVE CHARACTER(70)
NVARCHAR(100)
TEXT
CLOB
BLOB
REAL
DOUBLE
DOUBLE PRECISION
FLOAT
NUMERIC
DECIMAL(10,5)
BOOLEAN
DATE
DATETIME

This list is much too much too long for our purpose, of course. Yet: when the user creates a table, we'll stick to the data types that the standard has designated as the types used to create a table. And those are SQL Types. Not affinities, not storage classes, not C types. Quoting our beloved Swift 3 API Design Guidelines: Don’t surprise an expert, don’t confuse a beginner.

Thus the list of types of our DSL could be: integer, text (maybe better than string), blob (maybe better than data), double, boolean (maybe better than bool), date, datetime, and maybe numeric (so that we can expose the numeric affinity, eventually).

End of the course about SQLite affinities ;-)

SQLite doesn't natively store dates, you can store dates as strings or an integers and SQLite comes with some built in functions to make them easier to work with. But they are just strings and integers (depending on what you choose). As you mentioned earlier, the storage and retrieval of dates is controlled by the query API. Marking this table as a date has no effect on how we actually import the data, and even worse, it lies to the user about what kind of CREATE TABLE statement gets generated.

And now you know that t.column("date", .datetime) will generate the date DATETIME sql. This is the SQL expected by experts, because DATETIME is the recommended SQL type for dates. This is the SQL that the beginners should see so that they gain (maybe) some useful experience for their future encounters with static database engines, and don't try to oversmart their database by storing text or timestamps instead (as a matter of fact, it would be stupid in most database engines).

The DATETIME SQL type has the affinity NUMERIC. That affinity is the most funny of all, in that it tries to convert everything to numbers, but preserves values that can't be converted: it will store 30000 for the string "3.0e+5", "foo" for "foo", and... "2016-07-05" for "2016-07-05". That means that it is compatible with the fact that GRDB stores dates as strings. But one could also store timestamps, no problem. Oh, SQLite!

You like to make many board generalizations in your critiques but I think it would help to be more specific and technical.

Yes, I wanted to shake you a little :-) But you're right, I'll do what you ask for the best I can. Plus you have made very valid observations. See you there in a couple of hours!

cfilipov commented 8 years ago

... For more details, see https://www.sqlite.org/datatype3.html. It's very interesting.

Thank you for the explanation, this clears things up a lot. I misunderstood the grammar as accepting an affinity instead of a type, when in reality SQLite accepts all the types you would expect in a SQL create statement. I think what makes this even more confusing is that many tutorials and stackoverflow answers encourage the use of only types that map directly to one of the SQLite storage classes (as in, sticking to the types "TEXT", "INTEGER", etc...). I've never seen anyone create a "DATETIME" column in SQLite, even though it's valid.

After reviewing the docs you linked I can see why I was wrong about exposing affinity and I now agree that SQL types would be the most appropriate. The question is, what types should be exposed? Everything (for completeness)? If not everything, then what subset would be appropriate?

Don’t surprise an expert, don’t confuse a beginner.

Very much agreed.

Question:

Several parts of the CREATE TABLE grammar accept an expression. One example of this is the CHECK clause. For (a contrived) example:

CREATE TABLE foo ( id INTEGER PRIMARY KEY CHECK ( id > 10 ) )

This can be represented in the DSL like so:

try db.create(table: "foo") { t in
    t.column(id, type: .integer)
        .check(id > 10)
}

Conveniently, I was able to re-use the _SQLExpressible type from your query builder. However, the query builder was designed with the intent of parameterizing all arguments so this is the generated SQL:

CREATE TABLE foo ( id INTEGER PRIMARY KEY CHECK ( id > ? ) )

Unfortunately, this can't be done in the create statement because you can't use parameters there:

SQLite error 1 with statement `CREATE TABLE foo ( id INTEGER PRIMARY KEY CHECK ("id" > ?) )`: parameters prohibited in CHECK constraints

Of course, the simple workaround is to just use inline SQL:

try db.create(table: "foo") { t in
    t.column(id, type: .integer)
        .check(sql: "id > 10")
}

I can see a few ways to make this work, but I suspect there may be problems. What are your thoughts?

cfilipov commented 8 years ago

Regarding your joins comment:

(I'm not happy with the "variant" word here - figuring out the right API for row adapters is bloody hard.)

I agree about the word variant. I can't think of a much better term. How about scope?

I like the way the API looks so far. What about selecting columns? Perhaps I want to have an aggregate column to return the max age of all grandparents, for example. This would of course require it to be done at the row level or perhaps even requiring one to define a TableMapping for that specific join result.

groue commented 8 years ago

Hello @cfilipov,

Let's compare our two present options, last time we attempted at writing actual code:

// fluent/builder/params
db.execute(SQL.create(table: "foo", ifNotExists: true)
    .column(
        name: "id", 
        affinity: .integer, 
        notNull: true,
        primaryKey: true), 
    .column(name: "int", affinity: .integer)
    .column(
        name: "bar_id", 
        affinity: .integer,
        references: SQL.foreignKey(
            table: "bar",
            onDelete: .cascade,
            onUpdate: .cascade
            deferrable: true,
            columns: "id"))
}

// closure, one statement by column
db.schema.create(table: "foo", ifNotExists: true) { t in
    t.column("name", .integer)
    t.column("bar_id", .integer)
        .references("bar", onDelete: .cascade, onUpdate: .cascade, deferrable: true)
}

I'm happy that you aggree that one statement per column is the right way to go. For me it's just because a huge expression looks impressive, and I don't like to write impressive code unless I really have to :-)

You were writing: "You mentioned having schema methods like create only be available during migrations. You have good intentions but [...]". I agree. I'll try below without even schema: db.create(table: ...).

OK, let's go:

Table creation

extension Database {
    func create(table tableName: String, temporary: Bool = false, ifNotExists Bool = false, body: (t: SQLTableBuilder) -> ())
}

db.create(table: "foo") { t in ... }
db.create(table: "foo", temporary: true, ifNotExists: true) { t in ... }

This gives us:

[X] CREATE TABLE name (...)
[X] CREATE TEMPORARY TABLE name (...)
[X] CREATE TABLE name IF NOT EXISTS (...)

I was planning to give the primary key columns here, and you answered "but remember that you also need to support the optional conflict clause when you specify a primary key" - OK no primary key columns in this method. Conflict clauses on primary keys are rare, but useful: let's keep them in the scope of our DSL.

Columns (basics)

struct SQLTableBuilder {
    func column(_ name: String, _ type: SQLType, ...)
}

enum SQLType: String {
    case text = "TEXT"
    case integer = "INTEGER"
    case double = "DOUBLE"
    case numeric = "NUMERIC"
    case boolean = "BOOLEAN"
    case blob = "BLOB"
    case date = "DATE"
    case datetime = "DATETIME"
}

db.create(table: "person") { t in
    t.column("name", .text)
    t.column("age", .integer)
}

Not naming the parameters of column(_:_:) contradicts a few Swift 3 API guidelines, but the very first one: clarity at the point of use.

[X] CREATE TABLE persons (name TEXT, age INTEGER)

Columns (nullability)

Let's look at various options:

column("name", .text, null: ...)
column("name", .text).null(...)
column("name", .text, constraints: [.notNull])

I'm not convinced by the options set at all, and the fluent way has more punctuation: I'll try with a parameter. Yet we'll maybe have to look back at this choice later.

The SQLite grammar accepts a conflict clause on a NOT NULL qualifier. Your idea of onNull option looks good:

struct SQLTableBuilder {
    func column(
        _ name: String,
        _ type: SQLType,
        null: Bool = true,
        onNull: SQLConflict = .abort
        ...)
}

enum SQLConflict : String {
    case rollback = "ROLLBACK"
    case abort = "ABORT"
    case fail = "FAIL"
    case ignore = "IGNORE"
    case replace = "REPLACE"
}

db.create(table: "person") { t in
    t.column("name", .text, null: false)
    t.column("age", .integer, null: false, onNull: .rollback)
}

The onNull conflict is ignored unless the null parameter is false. Since abort is the default conflict resolution algorithm, we can avoid generating a confict clause for this specific value:

[X] CREATE TABLE persons (name TEXT NOT NULL)
[X] CREATE TABLE persons (name TEXT NOT NULL ON CONFLICT ROLLBACK)

I give another option below, but I like your onNull extra parameter more.

t.column("name", .text, null: .allow)
t.column("name", .text, null: .disallow)
t.column("name", .text, null: .disallowWithConflict(.rollback))

Columns (unique)

Just as NOT NULL, UNIQUE is an optional qualifier with an optional conflict clause. We must be consistent with the null/onNull pair:

struct SQLTableBuilder {
    func column(
        _ name: String,
        _ type: SQLType,
        null: Bool = true,
        onNull: SQLConflict = .abort
        unique: Bool = false,
        onDuplicate: SQLConflict = .abort
        ...)
}

enum SQLConflict : String {
    case rollback = "ROLLBACK"
    case abort = "ABORT"
    case fail = "FAIL"
    case ignore = "IGNORE"
    case replace = "REPLACE"
}

db.create(table: "person") { t in
    t.column("name", .text, unique: true)
    t.column("age", .integer, unique: true, onNull: .rollback)
}

[X] CREATE TABLE persons (name TEXT NOT NULL)
[X] CREATE TABLE persons (name TEXT NOT NULL ON CONFLICT ROLLBACK)

Your initial idea about unicity was fluent: column(...).unique(onConflict: .rollback). Not bad, but we can't be fluent at some places, and use a parameter elsewhere, when dealing with the two very similar constructs that are NOT NULL and UNIQUE.

Should we eventually pick a fluent interface, the nullability should be updated accordingly:

db.create(table: "person") { t in
    t.column("name", .text).notNull().unique()  // default conflict: abort
    t.column("age", .integer.notNull(onConflict: .rollback).unique(onConflict: .rollback)
}

Why not, eventually? Let's keep on.

Columns (default value)

struct SQLTableBuilder {
    func column(
        _ name: String,
        _ type: SQLType,
        null: Bool = true,
        onNull: SQLConflict = .abort
        unique: Bool = false,
        onDuplicate: SQLConflict = .abort,
        default defaultValue: DatabaseValueConvertible,
        ...)
}

db.create(table: "person") { t in
    t.column("name", .text, default: "Anonymous")
}

Easier said than done, though: just like the CHECK constraints you have already studied, ? placeholders aren't accepted in default clauses :-(. Well, I guess we'll have to perform our own escaping... Do you see another option?

The actual SQLite grammar accepts CURRENT_TIME, CURRENT_DATE and CURRENT_TIMESTAMP as default values. OK, seriously, do we really have to support them, when inserting a date created in Swift has the same effect? Not supporting them lets us use the simple DatabaseValueConvertible protocol, and throw any value in. Should we support the special literals, we'd have to write t.column("date", .datetime, default: .value("Anonymous")) just for t.column("date", .datetime, default: .current_time) :-/

[X] CREATE TABLE persons (name TEXT NOT NULL DEFAULT 'Anonymous')

OK, a little pause. On the remaining list:

Columns:

[ ] name SQLType PRIMARY KEY
[ ] name SQLType PRIMARY KEY ASC/DESC
[ ] name SQLType PRIMARY KEY AUTOINCREMENT
[ ] name SQLType CHECK ...
[ ] name SQLType COLLATE ...
[ ] name SQLType REFERENCES ... ON ...

Table constraints:

[ ] PRIMARY KEY (...)
[ ] PRIMARY KEY (...) ON CONFLICT ...
[ ] UNIQUE KEY (...)
[ ] UNIQUE KEY (...) ON CONFLICT ...
[ ] CHECK
[ ] FOREIGN KEY ...

Other:

[ ] DROP TABLE
[ ] CREATE/DROP INDEX
[ ] ALTER TABLE ADD COLUMN
[ ] covering indexes

Table alterations:

[ ] ALTER TABLE ADD COLUMN ...
[ ] complex alterations (column dropping, column replacement)

Open questions:

[ ] raw SQL snippets?
[ ] decide on an eventual default id INTEGER PRIMARY KEY

groue commented 8 years ago

Regarding your joins comment:

(I'm not happy with the "variant" word here - figuring out the right API for row adapters is bloody hard.)

I agree about the word variant. I can't think of a much better term. How about scope?

That's an excellent suggestion!!! I should become a native English speaker some day.

If I try to rewrite the row adaper documentation, we get:

let authorMapping = ColumnMapping(["id": "authorID", "name": "authorName"])
let adapter = ScopedAdapter(scopes: ["author": authorMapping])

for row in Row.fetch(db, sql, adapter: adapter) {
    if let authorRow = row.scoped(on: "author") { ... }
}

Pretty cool! A big thank you! And it fits well the join context as well :-)

I like the way the API looks so far. What about selecting columns? Perhaps I want to have an aggregate column to return the max age of all grandparents, for example. This would of course require it to be done at the row level or perhaps even requiring one to define a TableMapping for that specific join result.

I don't have any answer yet.

About aggregates, I have bookmarked this link as food for thought. It reads:

I don't know about active record but with Django it's simply:
Post.objects.all().annotate(Count('comments'))
Which produces (simplified, Django would actually explicitly select each column):
SELECT posts.*, COUNT(comments.id) AS comments_count
FROM posts
LEFT OUTER JOIN comments ON (posts.id = comments.post_id) GROUP BY posts.id
Nice, easy and without 1+N queries.

This is my only contact so far with a DSL which deals with such a use case.

And selecting columns, even without aggregates, is still on the TODO list as well. It's still a long way!

groue commented 8 years ago

This thread is getting long and there are two parallel conversations. I suggest closing this issue and opening separate ones for join and create discussions.

Yes, maybe it is the right time for that, now.

groue commented 8 years ago

BTW - I'm having one week of vacations starting tomorrow - I won't be there much. Have fun on our migration DSL meanwhile :-)

cfilipov commented 8 years ago

Enjoy your vacation! I've been working on responses to both the join and create topics and will continue those discussions as new, separate issues. So I will close this one and refer to it in the new issues. I should have some replies waiting for you when you get back. Enjoy.

zmeyc commented 8 years ago

@cfilipov

defaulting vs defaluts vs default. I like default the most, but the highlighter is annoying and makes it look incorrect. Unfortunately I don't see this getting fixed in any regex-based editor because of the nature of this special case so defaulting would be my next choice.

Apple itself is using keywords such as 'in' in argument labels. For example, check "Refined API Naming" section on their website: https://developer.apple.com/swift/ The label 'in' in the sample code they posted is highlighted as keyword.