silverstripe / silverstripe-framework

Silverstripe Framework, the MVC framework that powers Silverstripe CMS
https://www.silverstripe.org
BSD 3-Clause "New" or "Revised" License
721 stars 821 forks source link

[2012-06-07] Refactor ORM to use prepared statements #1486

Closed silverstripe-issues closed 10 years ago

silverstripe-issues commented 11 years ago

created by: @sminnee (sminnee) assigned to: @tractorcow (tractorcow) created at: 2012-06-07 original ticket: http://open.silverstripe.org/ticket/7429


This is a change that can be made while keeping the outward-facing ORM API the same. In short, the arguments passed to filter() can be turned into prepared statements, rather than manually escaped strings.

tractorcow commented 11 years ago

I would self-assign this to me if I could.

tractorcow commented 11 years ago

Related pull request at https://github.com/silverstripe/sapphire/pull/1360

silverstripe-issues commented 11 years ago

comment by: @sminnee (sminnee) created at: 2012-06-07


Note that named parameters might make this easier and are supported by PDO: http://php.net/manual/en/pdo.prepare.php

It's a toss-up whether the time to port to PDO will make the task shorter overall than just using "?"s.

silverstripe-issues commented 11 years ago

comment by: @nyeholt (marcus) created at: 2012-06-25


I'm going to vote for the extra time for PDO work, as that may provide for better cross db capability.

silverstripe-issues commented 11 years ago

comment by: @sminnee (sminnee) created at: 2012-06-26


The only thing is that PDO is unlikely to be universally applicable. In particular, Microsoft recommend their own plugin for SQL Server access. So then the question becomes "do we use prepared statements in that system?"

silverstripe-issues commented 11 years ago

comment by: @sminnee (sminnee) created at: 2012-09-19


Given this improves the security of SilverStripe it's important.

silverstripe-issues commented 11 years ago

comment by: @halkyon (sharvey) created at: 2012-09-20


sqlsrv drivers from Microsoft do have a PDO adapter now, by the way.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2012-09-24


Given that the behaviour of PDO based connections differs in several ways to server specific connector libraries, wouldn't it be more ideal to extend the individual connector classes? E.g. SS_Database > MySQLDatabase > PDOMySQLDatabase. This would give users the ability to specify in their database connection configuration which connector they prefer to use.

We could provide a basic non-pdo wrapper in SS_Database for queries with named parameters, and provide any mysql specific parameter handling in MySQLDatabase (for PDO-less servers). PDOMySQLDatabase could then provide PDO driven implementations of all database functions.

Due to the level of code duplication that would arise from individual implementations of PDO for each database connector, PDODatabase could be implemented to encapsulate this PDO code and use dependency injection to mark this as a dependency for each of the PDOxxxxDatabase connectors.

Existing code would not need to be re-written, and the PDO and non-PDO connectors could be maintained as separate concerns. Maintaining PDODatabase as a single entity across all db connectors will ensure consistency and simplify maintenance.

The second alternative is to rewrite each class to interleave PDO and non-pdo handling code within each function, which I believe is contrary to good principles of OO design (separation of concern and single responsibility). MySQLDatabase should focus on a single mechanism, and PDOMySQLDatabase another.

A third alternative is to use PDODatabase as a base from which connector specific connectors are extended. E.g. SS_Database > PDODatabase > PDOMySQLDatabase. Any MySQL specific code would have to be re-written and couldn't share any code with MySQLDatabase. It probably is not appropriate to inject a MySQLDatabase instance here, as the creation of additional database connections mixing PDO / regular mysql is probably going to be a mess. This implementation could work if the cost of maintaining two sets of MySQL specific functions is acceptable.

I personally prefer the first/third approach, but it seems that others chatting here are looking at the second. Opinions?

I hope I've accurately captured the options here, so at least we can agree on a direction moving forward.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2012-10-16


I've created an extremely basic implementation of PDOMySQLDatabase at https://github.com/tractorcow/sapphire/tree/3.0-pdo-database.

At the moment this does nothing more than provide a working alternative to MySQLDatabase using the PDO adaptor, but doesn't take advantage of prepared statements at all.

My next step is to implement the above features using this as a base. Before I accept this ticket could someone please review my work so far and give me some feedback?

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2012-10-18


I'm going to work through another possible implementation based on some of the feedback I've received from Sam. I'm still keen on further feedback from anyone else in the mean time as I build new concepts. :)

See https://github.com/tractorcow/sapphire/commit/d9a4a628f1e8f6d348c534eca4d9f114ea7d0199 for some discussion on the current build.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2012-11-06


I have still in mind a few implementation ideas for this ticket and am working slowly through them. However, my time at the moment has unfortunately become a little sparse to devote to Silverstripe contribution. If someone is available and willing to reassign this ticket to themselves then by all means do so, otherwise I'll return to this at my next available moment. I will likely be busy for the next week or two at least.

I apologise for neglecting this item, but by no means do I wish to see it stagnate. Thanks!

silverstripe-issues commented 11 years ago

comment by: @chillu (ischommer) created at: 2012-12-12


Sam, I know it (probably) won't change public facing APIs, but that sounds like quite a big change to introduce post beta. Should we unassign the milestone?

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-01-30


Now that I have time again to work on this I have been working on another implementation of a database structure taking in mind the above feedback.

I have created separate classes for handing each database implementation:

Initiation of the database is handled via injector with various configurations available to user code via the below yaml file. By default requesting a MySQLDatabase will return one with a PDO Connector, unless MySQLiDatabase is requested.

---
name: databaseconnectors
---
Injector:
  MySQLDatabase:
    class: 'MySQLDatabase'
    properties:
      connector: %$PDOConnector
      schemaManager: %$MySQLSchema
  MySQLiDatabase:
    class: 'MySQLDatabase'
    properties:
      connector: %$MySQLiConnector
      schemaManager: %$MySQLSchema

The DB::connect function looks like the below, with separate steps for creation and initialisation.


// Using Injector->get allows us to use registered configurations
// which may or may not map to explicit objects
$conn = Injector::inst()->get($dbClass);
$conn->connect($databaseConfig);
self::setConn($conn);

Also included is a huge amount of refactoring in database into separate files, as well as cleanup of existing code. There seems to be a huge amount of redundant and poorly defined code, which I've attempted to cut down through refactor, deletion, and either addition or changing of PHPdoc to better define each method.

I'm still in the process of testing and squashing my commits.

The actual implementation of prepared statements is yet to come, once I have finished this refactor and verified that all pass test cases.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-01-31


I've run a full test suite and pushed my working tree up to https://github.com/tractorcow/sapphire/tree/3.1-pdo-connector

Next step is to begin working on parameterised queries on the DBConnector classes

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-01-31


I think that instead of using ordered parameters we should probably use named parameters. It would make parameter references a bit more meaningful. SQLQuery could enforce uniqueness of parameter names by throwing errors when duplicate parameter keys are encountered.

Anyone have any thoughts on the matter?

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-01-31


It looks like sqlsrv_query supports parameterised queries natively, but only ordered parameters (using ?, as does PDO), not named parameters. Perhaps best to cater to the lowest common denominator. :)

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-01-31


If it's ok, I'm going to reassign this back to myself.

Timeline for the rest of this ticket:

silverstripe-issues commented 11 years ago

comment by: @chillu (ischommer) created at: 2013-01-31


Dear sir, you rock! :) Too busy with some security-related stuff to dive into the topic, but generally thrilled that we're making progress.

silverstripe-issues commented 11 years ago

comment by: @chillu (ischommer) created at: 2013-01-31


Oh, maybe a note on the estimate field: I've added this to all 3.1 ticket to get a feel for how far we're off. Its "hours remaining" rather than "total effort", and particularly for this ticket a pure gut feel. So feel free to revise the estimate.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-02-01


Thanks for the kind encouragement. :)

I've already implemented preparedStatement for both the MySQLiConnector and PDOConnector, and since PostgreSQL, MS SQL and SQL Lite each seem to support prepared statements I might leave preparedQuery as an abstract method.

How does this look as a potential API for SQLQuery?

    /**
     * Set a WHERE clause.
     * 
     * Note that the database will execute any parameterised queries using
     * prepared statements whenever available.
     *
     * There are several different ways of doing this.
     *
     * <code>
     *  // the entire predicate as a single string
     *  $query->setWhere("Column = 'Value'");
     *
     *  // multiple predicates as an array
     *  $query->setWhere(array("Column = 'Value'", "Column != 'Value'"));
     * 
     *  // multiple predicates with parameters
     *  $query->setWhere(array("Column = ?" => $column, "Name = ?" => $value)));
     * 
     *  // Shorthand for the above
     *  $query->setWhere(array("Column" => $column, "Name" => $value)));
     * 
     *  // Multiple predicates, each with multiple parameters.
     *  $query->setWhere(array(
     *      "ColumnOne = ? OR ColumnTwo != ?" => array(1, 4),
     *      "ID != ?" => $value
     *  ));
     * </code>
     * 
     * Note that if giving multiple parameters for a single predicate the array
     * of values must be given as an indexed array, not an associative array.
     * 
     * Also should be noted is that any null values for parameters may give unexpected
     * behaviour. array('Column' => NULL) is shorthand for array('Column = ?', NULL), and
     * will not match null values for that column, as 'Column IS NULL' is the correct syntax.
     * 
     * If it's necessary to force the parameter to be considered as a specific data type
     * by the database connector's prepared query processor any parameter can be cast
     * to that type by using the following format.
     * 
     * <code>
     *  // Treat this value as a double type, regardless of its type within PHP
     *  $query->setWhere(array(
     *      'Column' => array(
     *          'value' => $variable,
     *          'type' => 'double'
     *      )
     *  ));
     * </code>
     *
     * @param string|array $where Predicate(s) to set, as escaped SQL statements.
     * @return SQLQuery
     */
    public function setWhere($where) {
        $this->where = array();
        return $this->addWhere($where);
    }
silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-02-01


When passing predicates and parameters as a pair to various filter functions it doesn't simply work to have one array of strings and one array of parameters; Sometimes a single condition may be a string that contains multiple parameters, so it's not always a 1-to-1 mapping, and there are various areas around the code where individual conditions need to be added/updated/removed, and hence need to be identifiable alongside the relevant parameters. E.g. DataQuery::removeFilterOn

I hope that the above solution is ideal for those using the code.

I'm currently working my way through the various search filters and replacing the escaped queries with paramaterised queries.

I also found a few issues implementing DataQuery_SubGroup with the new api. I don't think the toString mechanism in place is going to work anymore.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-02-04


I just realised that the way I've implemented this is going to cause key conflicts if multiple queries use the same predicate.

$query->addWhere(array('ID != ?' => 4));
$query->addWhere(array('ID != ?' => 5));

The query would only save one of the above since they have duplicate keys.

Easily solved by nested each query in its own array, but it seems more complicated.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-02-07


I've pushed up an in-progress version of the prepared statement implementation up to the following branches:

https://github.com/tractorcow/sapphire/tree/3.1-pdo-connector-working

https://github.com/tractorcow/silverstripe-cms/tree/3.1-pdo-connector-working

There seems to be a huge amount of work involved here, but I feel that I've broken the back of it so far.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-02-07


Updated API for addWhere as below:

    /**
     * Adds a WHERE clause.
     * 
     * Note that the database will execute any parameterised queries using
     * prepared statements whenever available.
     *
     * There are several different ways of doing this.
     *
     * <code>
     *  // the entire predicate as a single string
     *  $query->addWhere("Column = 'Value'");
     *
     *  // multiple predicates as an array
     *  $query->addWhere(array("Column = 'Value'", "Column != 'Value'"));
     * 
     *  // Shorthand for the above using argument expansion
     *  $query->addWhere("Column = 'Value'", "Column != 'Value'");
     * 
     *  // multiple predicates with parameters
     *  $query->addWhere(array("Column = ?" => $column, "Name = ?" => $value)));
     * 
     *  // Shorthand for simple column comparison (as above), omitting the '?'
     *  $query->addWhere(array("Column" => $column, "Name" => $value)));
     * 
     *  // Multiple predicates, each with multiple parameters.
     *  $query->addWhere(array(
     *      "ColumnOne = ? OR ColumnTwo != ?" => array(1, 4),
     *      "ID != ?" => $value
     *  ));
     * </code>
     * 
     * Note that if giving multiple parameters for a single predicate the array
     * of values must be given as an indexed array, not an associative array.
     * 
     * Also should be noted is that any null values for parameters may give unexpected
     * behaviour. array('Column' => NULL) is shorthand for array('Column = ?', NULL), and
     * will not match null values for that column, as 'Column IS NULL' is the correct syntax.
     * 
     * Additionally, be careful of key conflicts. Adding two predicates with the same
     * condition but different parameters can cause a key conflict if added in the same array.
     * This can be solved by wrapping each individual condition in an array. E.g.
     * 
     * <code>
     * // Multiple predicates with duplicate conditions
     *  $query->addWhere(array(
     *      array('ID != ?' => 5),
     *      array('ID != ?' => 6)
     *  ));
     * 
     * // Alternatively this can be added in two separate calls to addWhere
     * $query->addWhere(array('ID != ?' => 5));
     * $query->addWhere(array('ID != ?' => 6));
     * 
     * // Or simply omit the outer array
     * $query->addWhere(array('ID != ?' => 5), array('ID != ?' => 6));
     * </code>
     * 
     * If it's necessary to force the parameter to be considered as a specific data type
     * by the database connector's prepared query processor any parameter can be cast
     * to that type by using the following format.
     * 
     * <code>
     *  // Treat this value as a double type, regardless of its type within PHP
     *  $query->addWhere(array(
     *      'Column' => array(
     *          'value' => $variable,
     *          'type' => 'double'
     *      )
     *  ));
     * </code>
     *
     * @param mixed $where Predicate(s) to set, as escaped SQL statements or paramaterised queries
     * @param mixed $where,... Unlimited additional predicates
     * @return SQLQuery
     */
    public function addWhere($where) {
        $where = $this->normalisePredicates(func_get_args());

        // If the function is called with an array of items
        $this->where = array_merge($this->where, $where);

        return $this;
    }
silverstripe-issues commented 11 years ago

comment by: @vanlawrence (clawrence) created at: 2013-02-07


Legendary effort tractorcow, we are totally stoked to have you working on this. You coming to the meetup?

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-02-08


I sure hope so, will confirm later. :)

Have comitted and pushed my changes to the working branch before the weekend. Seems to work at a casual test run, but need to do massive testing.

I'm just about done converting all the queries to use the prepared statement model. There are a few SQL strings around the place I haven't spotted.

...not a single test case run yet. >_>

https://github.com/tractorcow/sapphire/compare/silverstripe:3.1...tractorcow:3.1-pdo-connector-working

https://github.com/tractorcow/silverstripe-cms/compare/silverstripe:3.1...tractorcow:3.1-pdo-connector-working

I have no idea how I'm going to eventually send this pull request. I might need to enlist the help of some dedicated core testers to help, once we're ready to begin beta testing that is. :)

I haven't yet got onto the Postgres / Mysql / MS Sql updates yet, but I don't think they'll be too difficult once we get the core nice and robust.

I'm accepting style and methodology critique at the moment, but bear in mind that the code doesn't work.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-02-12


I have some further test case runs and various fixes.

There is a huge amount of work involved in migrating all the hard coded SQL strings into prepared queries. I've knocked off most of the DataObject::get instances, and am working through the direct DB::query calls at the moment. It's quite exhausting to say the least...

Does anyone have a good recommendation for how to tackle database manipulations? I haven't quite figured out a good pattern for migrating these to prepared statements without breaking existing code. It seems to be quite a complicated part of the database code.

I also need to schedule some time to look at Postgres, then maybe ms sql and sql lite after that.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-02-12


After spending some time looking into DB::manipulate, I'm wondering if the premise for how data is updated, and the API in this area needs to change, as well as the way that DBField prepares values for inserting into the database.

Rather than have the fields themselves take care of the value they wish to enter, they should really be preparing a part of an insert/update statement including a piece of sql and an optional parameter. This is what manipulate should expect, forcing user code to adhere to a paramaterised insert/update convention rather than manually escaping all values.

I'll have a play around with this and see what I can come up with.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-02-12


Maybe it's better to look at implementing the SQL builder class(es) that were mentioned earlier? Currently SQLQuery supports select and delete, but insert and update are both performed via manipulations. Perhaps these should be refactored into their own classes, e.g.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-02-13


Some of the previous comments (throughout the code, as well as this thread) seem to make a lot more sense to me as I investigate the codebase. There really is a need for a database-overridable query generation object.

I've refactored out the sql*ToString functions from the SS_Database class into a DBQueryBuilder class, and have started working on a more complete SQL generation methodology.

Using the above classes, we should at least be able to better and more consistently execute each of the different types of queries, while maintaining database specific SQL code.

Hopefully we can phase out the DB::manipulate code once the update/insert queries are working properly.

The dependencies for the database is now starting to look like the below:

---
name: databaseconnectors
---
Injector:
  MySQLDatabase:
    class: 'MySQLDatabase'
    properties:
      connector: %$PDOConnector
      schemaManager: %$MySQLSchema
      queryBuilder: %$DBQueryBuilder
  MySQLiDatabase:
    class: 'MySQLDatabase'
    properties:
      connector: %$MySQLiConnector
      schemaManager: %$MySQLSchema
      queryBuilder: %$DBQueryBuilder
silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-02-15


I've done additional refactoring around the query interface and have added/refactored the following:

There is a bit of a class hierarchy here, as delete, select, and update all have 'where' conditions and tables to select from, so these share a common parent class SQLConditionalExpression

SQLConditionalExpression and SQLInsert both inherit from SQLExpression

DBQueryBuilder Handles all of the query generation

DB::manipulate has been redeveloped to use the above instead of raw queries.

The code still doesn't work... I seem to have broken a few things, as I do. :)

I admittedly have done some code cleanup, which perhaps I should have avoided.

The links above for the CMS and Framework are where I've been putting my "workspace", but they'll be pushed to a cleaned up branch once I've written test cases, tested, and bugfixed.

Hopefully I'm at the end of the API breaking stage and can get back to tidying up and testing.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-02-18


I have done quite a few major bugfixing and have pushed up a rebased/squashed and reasonably bug free branch at the below:

https://github.com/tractorcow/sapphire/tree/3.1-pdo-connector

https://github.com/tractorcow/silverstripe-cms/tree/3.1-pdo-connector

And here are the links for comparison against the current 3.1 branches on silverstripe

https://github.com/tractorcow/sapphire/compare/silverstripe:3.1...tractorcow:3.1-pdo-connector

https://github.com/tractorcow/silverstripe-cms/compare/silverstripe:3.1...tractorcow:3.1-pdo-connector

Yes, I have indeed changed more than 12,000 lines. :) I think that might be inflated slightly due to the fact that I moved the database code to a new directory prior to the refactor (Git treats that as a delete and addition, not a move).

There are still a few test cases that aren't running properly, some that need to have some tests rewritten to consider the new database methodology (such as DBFieldTest), and there are many tests failing due to string comparison issues with SQL.

One of my issues is that I've taken it upon myself to write the DBQueryGenerator class in such a way that it semi-formats the resulting SQL query with indentation and newlines. This also means that test cases that were written with the old SQL generator code no longer pass, even if the SQL is correct in many cases (although there will be still cases where it's not correct).

Here is my question; Should I revert to unformatted SQL generation (no newlines, indentation, etc), should I rewrite the test cases to use the new format, or thirdly, should I write a SQL text comparison class that compares SQL strings accounting for things such as parameters, formatting, case difference, newlines, indentation, and the like?

I'm keen on your feedback. Check out /framework/model/connect for the database connection layer code, and /framework/model/queries for the query representation code.

Once the test cases are all fully passing, then I'll re-direct my attention to attacking the Postgres MSSql and SQLLite modules.

Speaking of which, is there anyone willing to assist me with testing and bugfixing?

silverstripe-issues commented 11 years ago

comment by: @sminnee (sminnee) created at: 2013-02-18


Awesome work, I'm reviewing this now. :-)

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-02-18


Thanks Sam, we're discussing this on the google groups at https://groups.google.com/forum/?fromgroups=#!topic/silverstripe-dev/jLk3bgeCebI

I'm going to write some additional configuration around formatting of SQL queries, and add a testing extension for comparing parameterised SQL queries. Hopefully then I'll be able to update the failing test cases.

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-03-12


This has been rebased against the 3.2 branch.

Below are the various modules I've developed in preparation for this deployment:

https://github.com/tractorcow/sapphire/tree/3.2-pdo-connector https://github.com/tractorcow/silverstripe-cms/tree/3.2-pdo-connector https://github.com/tractorcow/silverstripe-postgresql/tree/3.2-pdo-connector https://github.com/tractorcow/silverstripe-sqlite3/tree/3.2-pdo-connector

MS SQL and Installer modules coming up

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-03-28


MS SQL driver updated at https://github.com/tractorcow/silverstripe-mssql/tree/3.2-pdo-connector

Note that mssql has been dropped, as it doesn't support parameterised queries (outside of stored procedures).

silverstripe-issues commented 11 years ago

comment by: @tractorcow (tractorcow) created at: 2013-04-03


Pull request for this here https://github.com/silverstripe/sapphire/pull/1360

simonwelsh commented 10 years ago

This has been done with framework, mssql, postgres and sqlite. Only CMS is remaining, and that's fairly close. Closing this ticket now.