tempestphp / tempest-framework

The PHP framework that gets out of your way 🌊
https://tempestphp.com
MIT License
994 stars 71 forks

QueryBuilder features #103

Closed brendt closed 2 months ago

brendt commented 8 months ago

After the refactor for #80 and sleeping a night on the idea, I think we should discuss adding query builder functionality to the Query class, so that people who really don't want to write SQL in their migrations can fall back to using a simpler API.

The biggest issue to consider though is different SQL dialects, and how to deal with them. How much syntax do we want to support? The upside of migrations returning a Query is that we can easily fall back to SQL if we really need something specific to one dialect, meaning that our query builder doesn't have to be perfect. It can cover the 90% common use case, and already be of value.

Let's discuss this further.

yassiNebeL commented 8 months ago

Hey @brendt, a POC implementation could consist of Query, QueryBuilder, Syntax, and Connection classes. That would be a good starting point.

Let me know if this fits Tempest so that I can give it a decent shot!

brendt commented 8 months ago

@yassiNebeL appreciate that you're interested in tackling this! The Connection side should already be handled by the Database class, which is a wrapper around PDO.

I'm ok introducing QueryBuilder, I think that Dialect or Grammar are terms often used for what you mention as Syntax. Another important feature is DataType, which represents data types as described in eg. MySQL (https://dev.mysql.com/doc/refman/8.0/en/data-types.html) or SQLite: https://www.sqlite.org/datatype3.html

One final note: for convenience, I want the Query class to have the QueryBuilder methods. Query could keep an internal reference to a QueryBuilder, but I'd rather introduce a QueryBuilder interface with a trait called BuildsQueries, which the Query class can use and implement.
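A minimal sketch of that interface-plus-trait shape could look like this (the method names `where` and `build` are hypothetical, not a proposed API):

```php
<?php

// Hypothetical sketch of the interface + trait split described above.
// Method names (where, build) are illustrative only.
interface QueryBuilder
{
    public function where(string $condition, mixed ...$bindings): self;

    public function build(): string;
}

trait BuildsQueries
{
    private array $wheres = [];
    private array $bindings = [];

    public function where(string $condition, mixed ...$bindings): self
    {
        $this->wheres[] = $condition;
        $this->bindings = [...$this->bindings, ...$bindings];

        return $this;
    }
}

// Query both implements the interface and reuses the trait, so users
// get the builder methods directly on Query.
final class Query implements QueryBuilder
{
    use BuildsQueries;

    public function __construct(private string $sql = '') {}

    public function build(): string
    {
        return $this->wheres === []
            ? $this->sql
            : $this->sql . ' WHERE ' . implode(' AND ', $this->wheres);
    }
}
```

One nice property of this shape: a raw `new Query("…")` and a built query stay the same type.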

Does that make sense to you? Feel free to share more feedback :)

brendt commented 8 months ago

I also feel that we need to talk about what the public API of such a query builder should look like. What are your thoughts?

yassiNebeL commented 8 months ago

One final note: for convenience, I want the Query class to have the QueryBuilder methods. Query could keep an internal reference to a QueryBuilder, but I'd rather introduce a QueryBuilder interface with a trait called BuildsQueries, which the Query class can use and implement.

I can work with that, since this approach shouldn't break the process of delegating operations to the Query object (correct me if I'm wrong).

I also feel that we need to talk about what the public API of such a query builder should look like. What are your thoughts?

While the Builder will handle the relationships part, I'm thinking of something similar to this:


$books = Book::with(['author', 'chapters'])
    ->where('ISBN', $value)
    ->orderBy('name')
    ->get();

The real question here is: how do you deal with NoSQL databases like MongoDB?

brendt commented 8 months ago

I can work with that, since this approach shouldn't break the process of delegating operations to the Query object (correct me if I'm wrong).

Not sure I understand, can you elaborate?

While the Builder will handle the relationships part, I'm thinking of something similar to this:

I think the first step is to get it to work with queries, and then worry about model-specific support :)

The real question here is: how do you deal with NoSQL databases like MongoDB?

We don't :)

yassiNebeL commented 8 months ago

I want the Query class to have the QueryBuilder methods. Query could keep an internal reference to a QueryBuilder, but I'd rather introduce a QueryBuilder interface with a trait called BuildsQueries, which the Query class can use and implement.

Delegate unimplemented methods to the Query object with the __call magic method.

Model > Builder > Query. Take orderBy as an example: the Builder class doesn't implement it, but the Query class does.

But I think it should be the opposite: Query provides a trait which the Builder uses, so that the Builder is concerned with the model while the Query focuses on the grammar.
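The __call delegation described above could look roughly like this (a sketch with hypothetical classes, not Tempest's actual code):

```php
<?php

// Hypothetical sketch: Builder forwards methods it doesn't implement
// (e.g. orderBy) to the underlying Query via the __call magic method.
final class Query
{
    private array $orderBy = [];

    public function orderBy(string $column): self
    {
        $this->orderBy[] = $column;

        return $this;
    }
}

final class Builder
{
    public function __construct(private Query $query) {}

    public function __call(string $method, array $arguments): mixed
    {
        // Any method not defined on Builder is delegated to Query.
        return $this->query->{$method}(...$arguments);
    }
}
```

Note that with this naive forwarding the chain returns the Query, not the Builder; a real implementation would likely return `$this` to keep the fluent chain on the Builder.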

@brendt what if we summarize the methods first?

aidan-casey commented 8 months ago

@yassiNebeL - Keep in mind that we are not building an Eloquent equivalent here. The general philosophy of Tempest requires a slightly different approach that utilizes the typing PHP gives us.

@brendt - It seems like "query" in this context should actually be a separate object that is the representation of what the builder assembles. Then it can be passed off to something else to translate from objects to SQL. Builder in this sense just becomes a factory class.

This gets us back into DBAL territory where we just need to be really careful to plan this out well.

yassiNebeL commented 8 months ago

I didn't mention Eloquent in this issue, and yes, it uses the builder approach, as Doctrine does. Also, I bear in mind that a Tempest model looks like neither an Eloquent model nor a Symfony model.

@aidan-casey healthy debate is an essential component of a vibrant community.

mokhosh commented 8 months ago
  1. This was an interesting video, I recommend we all watch it: https://www.youtube.com/watch?v=qScVFyOGL_Y
  2. I suggest we brainstorm some ideas about the public API and discuss the pros and cons.
  3. I like the idea of not reimplementing Eloquent and other already available ORMs. As Brent has mentioned in the live streams, Tempest has the huge benefit of being free to explore ideas; we should put that to use.
  4. If we make it driver based, we can start with only supporting MySQL or SQLite, and then people can just create drivers, and they only have to pass the tests.
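The driver-based idea in point 4 could be sketched as a small dialect interface that each backend implements and that a shared test suite exercises (all names here are hypothetical):

```php
<?php

// Hypothetical sketch of a per-database driver: each backend implements
// the grammar hooks the builder needs, and new drivers only have to pass
// the shared test suite.
interface Dialect
{
    public function quoteIdentifier(string $name): string;

    public function autoIncrementPrimaryKey(): string;
}

final class SqliteDialect implements Dialect
{
    public function quoteIdentifier(string $name): string
    {
        return '`' . $name . '`';
    }

    public function autoIncrementPrimaryKey(): string
    {
        return 'INTEGER PRIMARY KEY AUTOINCREMENT';
    }
}

final class MysqlDialect implements Dialect
{
    public function quoteIdentifier(string $name): string
    {
        return '`' . $name . '`';
    }

    public function autoIncrementPrimaryKey(): string
    {
        return 'INT UNSIGNED AUTO_INCREMENT PRIMARY KEY';
    }
}
```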
brendt commented 8 months ago

It's great to see us brainstorming here, awesome!

Let's tackle this problem the same way I've been doing everything for Tempest: we start from the public API, and work our way backwards towards the framework. The framework will need to find a way to make it work, because we're not having our users pay for the framework's convenience.

That being said, I think we need to focus on two areas when considering a query builder:

Right now, we have a Query class that represents a data object with SQL and bindings: https://github.com/tempestphp/tempest-framework/blob/main/src/Database/Query.php

(For backwards compatibility reasons, this class also exposes the execute, fetch and fetchFirst methods that are piped to the underlying Database dependency, but we need to refactor those out. This is just a remnant from me building the very first iteration of model support.)

Let's look at migrations for a second. Right now, migrations should return a query object, which is essentially plain SQL. It's the easiest solution so that we don't have to worry about different databases.

Let's say we add a query builder; migrations could still return a query, whether it's manually defined or "built" by the query builder shouldn't matter:

public function up(): Query|null
{
    return new Query("CREATE TABLE Author (
        `id` INTEGER PRIMARY KEY AUTOINCREMENT,
        `name` TEXT NOT NULL
    )");
}

Or

public function up(): Query|null
{
    return QueryBuilder::create('Author')
        ->id()
        ->text('name')
        ->build();
}

Could both be possible.

(Note that the query builder syntax shown above doesn't represent any final API; I just had to come up with something.)

Taking it a step further, we could add convenience methods on models to do something like this:

public function up(): Query|null
{
    return Author::table()
        ->id()
        ->text('name')
        ->build();
}

But don't get distracted by this second step. We'll first need to decide what this query builder will look like. Afterwards we can discuss where we add convenience methods to work with it more easily.

So, what's on the table right now? The public API of the query builder class.

There are numerous possibilities, let me list a handful.

For table creation queries:

QueryBuilder::create('Book')
    ->id()
    ->text('name')
    ->dateTime('publishedAt', nullable: true)
    ->reference(Author::class)
    ->build();

QueryBuilder::new()
    ->create('Book')
    ->add(new IdDataType())
    ->add(new DateTimeDataType('publishedAt', nullable: true))
    ->add(new Reference(Author::class))
    ->build();

For alter queries:

QueryBuilder::new()
    ->alter('Book')
    ->dateTime('archivedAt', nullable: true)
    ->drop('publishedAt')
    ->build();

QueryBuilder::table('Book')
    ->alter()
    ->dateTime('archivedAt', nullable: true)
    ->drop('publishedAt')
    ->build();

For select queries:

QueryBuilder::select('*')
    ->from(Author::class)
    ->join(Book::class)
    ->where('Book.publishedAt > ?', new DateTimeImmutable('-5 days'))
    ->build();

QueryBuilder::new()
    ->select('*')
    ->from('Author')
    ->join('Book', 'Book.author_id', '=', 'Author.id')
    ->where('Book.publishedAt > ?', new DateTimeImmutable('-5 days'))
    ->build();

Ok, I think everyone gets the point, there are tons of possibilities here. We need to decide on the best one. Chances are "the best one" isn't in what I've listed above, we'll need to come up with something together.

And saving this for the last: I firmly believe that we need to pick one approach and stick with it, instead of trying to make this query builder smart enough to do everything. I encourage everyone to read this post I wrote a while ago: https://stitcher.io/blog/opinion-driven-design

I prefer one way of solving a problem instead of offering several options. As an open source maintainer, I realise that not everyone might like the solutions I come up with as much as I do; but in the end, if the job gets done, if my code is reliable, clear, and useful, there rarely are any complaints. So I started to prefer opinion-driven design when I realised that flexibility comes with a price that is often not worth paying.

We are starting from a blank slate. I want to urge everyone thinking about this problem to keep that in mind. We don't build what we already know, we build what we believe is the best solution. If we firmly believe one approach is the best one, then we'll find a way for the framework to follow.

brendt commented 8 months ago

PS: thanks @mokhosh for the video, I'll definitely watch it tomorrow 👍

mokhosh commented 8 months ago

here's an idea

what if we don't have migration files. we just have a migrate command that looks at your model and creates/alters the table to match your model.

that is so tempest

SanderMuller commented 8 months ago

here's an idea

what if we don't have migration files. we just have a migrate command that looks at your model and creates/alters the table to match your model.

that is so tempest

Symfony style? Great for development, scary for live databases.

brendt commented 8 months ago

Interesting idea :) In my experience though, this method quickly falls apart. Migrations are often used for more things than reflecting models in the database:

  • Adding carefully chosen indices for performance optimisations
  • Adding virtual columns and tables, or stored procedures to leverage your database capabilities
  • Dropping tables and columns over the course of the project's lifespan

So, auto-generating migrations won't suffice

mokhosh commented 8 months ago

Symfony style? Great for development, scary for live databases.

I think anything touching your live database is scary anyway. What do you think makes it more scary though? In one case you have to describe the changes, and in the other you just describe the end result. Either way your database has to change. And in both cases you will have to have tests and everything to make sure you're not breaking things.

Interesting idea :) In my experience though, this method quickly falls apart. Migrations are often used for more things than reflecting models in the database:

  • Adding carefully chosen indices for performance optimisations
  • Adding virtual columns and tables, or stored procedures to leverage your database capabilities
  • Dropping tables and columns over the course of the project's lifespan

So, auto-generating migrations won't suffice

That's a very good point you brought up, because I think we need to discuss that for other parts of Tempest too.

I think there are two ends of the spectrum. One is to make everything easy and simple and flexible and implicit and smart, which makes it great for getting up and running quickly, and is amazing for small to medium projects. The other end of the spectrum is to make everything verbose and explicit and raw, which makes it more suitable for larger projects where you might not care as much about getting up and running in a day and prototyping as you care about long term maintenance and new developer onboarding and avoiding implicit knowledge etc.

I don't think either way of defining migrations makes it fall apart; it's about how you structure your project and its development.

About the points you mentioned

Please do elaborate why you think this will fall apart quickly, maybe we can get around the issues and still offer a nice developer experience. Also, even if we have automatic migrations based on models, there's nothing stopping us from offering a more explicit and verbose way of defining migrations with full control. It's just a command.

(For backwards compatibility reasons, this class also exposes the execute, fetch and fetchFirst methods that are piped to the underlying Database dependency, but we need to refactor those out. This is just a remnant from me building the very first iteration of model support.)

I also have a philosophical note about this. I think everyone understands that Tempest is in its very early stages, and version number clearly indicates that. So, while it's thoughtful of you to think about backwards compatibility, I would say it's very early to let anything tie you down. I would say let's be wild, let's keep exploring, let's build things and tear them down without worry. Let's only settle on an API when no one can wish for anything more interesting.

SanderMuller commented 8 months ago

Symfony style? Great for development, scary for live databases.

I think anything touching your live database is scary anyway. What do you think makes it more scary though? In one case you have to describe the changes, and in the other you just describe the end result. Either way your database has to change. And in both cases you will have to have tests and everything to make sure you're not breaking things.

I once made an Entity (Model) change which caused the "auto migrate" feature to rename a column by dropping it and recreating it, resulting in an empty table. You will want to see what changes it will execute instead of only knowing what the new situation will be based on your Models.

We could still let the framework auto generate the migrations for us, so we can review the actual changes, similar to https://symfony.com/doc/current/doctrine.html#migrations-creating-the-database-tables-schema

Interesting idea :) In my experience though, this method quickly falls apart. Migrations are often used for more things than reflecting models in the database:

  • Adding carefully chosen indices for performance optimisations
  • Adding virtual columns and tables, or stored procedures to leverage your database capabilities
  • Dropping tables and columns over the course of the project's lifespan

So, auto-generating migrations won't suffice

That's a very good point you brought up, because I think we need to discuss that for other parts of Tempest too.

I think there are two ends of the spectrum. One is to make everything easy and simple and flexible and implicit and smart, which makes it great for getting up and running quickly, and is amazing for small to medium projects. The other end of the spectrum is to make everything verbose and explicit and raw, which makes it more suitable for larger projects where you might not care as much about getting up and running in a day and prototyping as you care about long term maintenance and new developer onboarding and avoiding implicit knowledge etc.

I don't think either way of defining migrations makes it fall apart; it's about how you structure your project and its development.

About the points you mentioned

  • We can have attributes for indices.
  • That's a very niche requirement; it's not even in the 10% or even the 1%, and even in Laravel you would have to write raw SQL for stored procedures. Trying to put this in the automatic discovery model would go against your philosophy.
  • Dropping tables and columns is included in the automatic discovery and diffing.

Please do elaborate why you think this will fall apart quickly, maybe we can get around the issues and still offer a nice developer experience. Also, even if we have automatic migrations based on models, there's nothing stopping us from offering a more explicit and verbose way of defining migrations with full control. It's just a command.

I do like the idea for attributes for indices / primary keys etc, again very similar to Symfony https://symfony.com/doc/current/doctrine.html#creating-an-entity-class

The Symfony-style Entities have always seemed superior to me compared to the Laravel Models (just like the Symfony Forms; luckily Laravel now has Filament for that).

brendt commented 8 months ago

@mokhosh

I think there are two ends of the spectrum. One is to make everything easy and simple and flexible and implicit and smart, which makes it great for getting up and running quickly, and is amazing for small to medium projects. The other end of the spectrum is to make everything verbose and explicit and raw, which makes it more suitable for larger projects where you might not care as much about getting up and running in a day and prototyping as you care about long term maintenance and new developer onboarding and avoiding implicit knowledge etc.

To be clear: Tempest is targeting small and medium projects, not big projects.

I also have a philosophical note about this. I think everyone understands that Tempest is in its very early stages, and version number clearly indicates that. So, while it's thoughtful of you to think about backwards compatibility, I would say it's very early to let anything tie you down. I would say let's be wild, let's keep exploring, let's build things and tear them down without worry. Let's only settle on an API when no one can wish for anything more interesting.

Another point of clarification: I meant that it was too much refactoring work for me to do in one hour, so I decided to do it somewhere next week. So either I had to hold off on pushing my changes, or do it in two steps :) That's the only thing I meant with "backwards compatibility"

About the attributes for indices both @mokhosh and @SanderMuller mention. In my opinion, model classes shouldn't be concerned with database configuration. I talked about it on stream here: https://www.youtube.com/live/jgN-sT4raiI?si=Y9gRg6NUKuXOLeeb&t=1625 (from 27:00 to 30:00). For me, this is a core design principle within Tempest that makes it stand out: models and database are disconnected (as much as possible). Models are classes, they don't care whether their data comes from a database, a JSON file, an API, … it doesn't matter. You're free to make your case explaining why I'm wrong in my thinking — I'm open for that feedback, but up until this point I haven't read a good argument to step away from that view :)

mokhosh commented 8 months ago

I once made an Entity (Model) change which caused the "auto migrate" feature to rename a column by dropping it and recreating it, resulting in an empty table. You will want to see what changes it will execute instead of only knowing what the new situation will be based on your Models.

This is an easy fix. We can simply have a --dry-run option to show you what will exactly change, before you actually migrate.
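A sketch of that flag (all names hypothetical): the command computes the planned statements either way, and the flag only decides whether they are printed or executed.

```php
<?php

// Hypothetical sketch of a migrate command with a --dry-run flag:
// the planned statements are computed either way; the flag decides
// whether they are printed or executed.
function migrate(array $statements, bool $dryRun): array
{
    $output = [];

    foreach ($statements as $sql) {
        if ($dryRun) {
            $output[] = "[dry-run] would execute: {$sql}";
        } else {
            // In a real command this would call the database, e.g. $database->execute($sql).
            $output[] = "executed: {$sql}";
        }
    }

    return $output;
}
```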

mokhosh commented 8 months ago

To be clear: Tempest is targeting small and medium projects, not big projects.

That's great. And it makes me wonder why we're worrying about things falling apart even more.

About the attributes for indices both @mokhosh and @SanderMuller mention. In my opinion, model classes shouldn't be concerned with database configuration. I talked about it on stream here: https://www.youtube.com/live/jgN-sT4raiI?si=Y9gRg6NUKuXOLeeb&t=1625 (from 27:00 to 30:00). For me, this is a core design principle within Tempest that makes it stand out: models and database are disconnected (as much as possible). Models are classes, they don't care whether their data comes from a database, a JSON file, an API, … it doesn't matter. You're free to make your case explaining why I'm wrong in my thinking — I'm open for that feedback, but up until this point I haven't read a good argument to step away from that view :)

I watched the video. I agree with your point. But that's more like when you have DTOs or VOs or repositories. It's very common for those to get their data from sources other than the database. But my models get their data from some form of database more than 99% of the time. So they are the database access layer, and it's natural for them to be somewhat aware of the database.

The big benefit here will be having one source of truth. You have also included validation in models, which could also be argued against, but it's a nice experience to have everything related to data in one place. Again, we are free to add ways of separating these concerns, but the default behaviour can be super convenient this way.

aidan-casey commented 8 months ago

@mokhosh it sounds to me like you use models as a means to perform CRUD actions. That's not a bad way to use them, but it's also not the only way.

On the other hand, I would be more likely to use models as a way to model my domain and enforce business rules. That doesn't map 1-to-1 with requests or tables.

The problem we face today is that Tempest does need to allow for both approaches. Now, we might expand on this in the future, maybe even before v1, but I'm leaning toward thinking it'd be easiest and most productive to stay on course with the SQL approach.

bwaidelich commented 8 months ago

Coming from years of ORM-based entities and all the pain that came with it, I strongly believe that separating domain models from their storage mechanism makes sense even if they come from the database 100% of the time.

The big benefit here will be having one source of truth

If you create your domain models in a way that invalid states are impossible (i.e. with validation at the DTO/VO level), that will become your source of truth, and IMO a much more reliable one.

Personally I came to love hexagonal architecture and go for something like this:

namespace The\Core;

interface BookRepository {
    public function findByAuthorId(AuthorId $authorId): Books;

    public function setup(): SetupResult;
}

and some downstream implementation like

namespace The\Adapters;

final readonly class DbalBookRepository implements BookRepository {

    public function __construct(private Connection $dbal) {}

    public function findByAuthorId(AuthorId $authorId): Books {
        // build SQL
        return self::convertRowsToBooks(...);
    }

    public function setup(): SetupResult {
        // determine schema diff and execute required SQL statements, if any
    }

    private static function convertRowsToBooks(array $rows): Books {
        // ...
    }
}

I like this approach because it makes the core models become rock solid and very easy to test and because it leaves all database specifics centralized in a single adapter class.

I tend to use doctrine/dbal (no ORM, no DQL) because it comes with a decent QueryBuilder. So the above implementation of the finder method could look something like:

$this->dbal->createQueryBuilder()
  ->select('b.*, a.*')
  ->from('books', 'b')
  ->innerJoin('b', 'authors', 'a', 'a.id = b.author_id')
  ->where('a.id = :authorId')
  ->setParameter('authorId', $authorId->value)
  ->execute()
  ->fetchAllAssociative();

It also comes with useful Schema creation and diffing tools. The API is a bit quirky but by specifying the schema like this:

$schema = new Schema();
$tableBooks = $schema->createTable('books');
$tableBooks->addColumn('id', Types::INTEGER);
$tableBooks->addColumn('title', Types::STRING, ['length' => self::maxLength(BookTitle::class)]);
// ...

...I can create an idempotent setup method that compares the schema to the current database and creates required SQL statements automatically. No need for migrations at all.
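The idempotent-setup idea can be sketched independently of DBAL's tooling: diff the desired columns against the introspected ones and emit only the missing statements (table and column names here are illustrative):

```php
<?php

// Hypothetical sketch of idempotent schema setup: compare the desired
// column list against the current one and emit only the ALTER statements
// that are actually needed, so running setup twice is a no-op.
function diffColumns(string $table, array $current, array $desired): array
{
    $statements = [];

    foreach ($desired as $name => $type) {
        if (! array_key_exists($name, $current)) {
            $statements[] = "ALTER TABLE {$table} ADD COLUMN `{$name}` {$type}";
        }
    }

    return $statements;
}
```

A first run adds the missing columns; a second run over the updated schema produces no statements at all.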

And note that the constraints (here: the length of the title varchar) can be derived from the domain model, not the other way around.

Having written all this (sorry for the long post), I'm not sure whether this is a relevant approach for a framework and/or how it could be applied. But maybe it sparks some new ideas anyway ;)