qcubed / qcubed

The official QCubed framework git repo
https://qcubed.eu/

ORM Discussion #1064

Open spekary opened 8 years ago

spekary commented 8 years ago

This is a place to further discuss a standalone ORM.

Goals

Here are some of my goals, some of which have been articulated by others:

See #1063 for a proposal on what the namespace and directory structure should be.

The main dependency issues here are the QWatcher and QCache classes. Those should be easy to deal with using configuration settings, so that the resulting code can optionally not use them.

Intermediate DB Definition

Currently we allow two ways to generate code. One is directly from the database, and one is a kind of hybrid, where the database provides the table structure but the XML settings file provides the foreign key dependencies. The first method is for InnoDB-style SQL, and the second is for SQL engines that don't have strong foreign key support.

I propose to separate this. I noticed some Go ORMs that use a table definition file, which is then used to drive both ORM creation and SQL creation, and it got me thinking that this is part of a good strategy. The thought here is to do the ORM generation in two separate steps:

Advantages include:

Adding some hooks into the templates for user-level code generation additions should fix this.
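
As a rough illustration of the intermediate-definition idea above, a definition file could look something like the sketch below. The file location, keys and format here are assumptions for illustration only, not a proposal for the final syntax:

<?php
// codegen/schema/person.php (hypothetical location)
// A database-independent description of one table. One codegen step would
// produce files like this from an existing database (or they could be written
// by hand); a second step would generate the ORM classes and/or CREATE TABLE
// statements from them.
return [
    'name'    => 'person',
    'columns' => [
        'id'         => ['type' => 'integer', 'primaryKey' => true, 'autoIncrement' => true],
        'first_name' => ['type' => 'string', 'size' => 50, 'required' => true],
        'last_name'  => ['type' => 'string', 'size' => 50, 'required' => true],
        'login_id'   => ['type' => 'integer'],
    ],
    'foreignKeys' => [
        // Declared here even if the underlying engine has no real FK support.
        'login_id' => ['table' => 'login', 'column' => 'id'],
    ],
];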

Abstraction

This is mostly about providing helpers in the stub files, and documentation in the examples, to guide users toward thinking of the ORM as a data-abstraction layer only. Also, maybe there are improvements or shortcuts to the QQuery language that we can add to make it easier.

For example, in a typical datagrid binder, you might see:

$cond = QQ::OrCondition(
    QQ::Equal(QQN::Person()->FirstName, $this->txtFirstName->Text),
    QQ::Equal(QQN::Person()->LastName, $this->txtLastName->Text)
);
$clauses[] = QQ::Expand(QQN::Person()->Project);
$clauses[] = QQ::OrderBy(QQN::Person()->LastName, QQN::Person()->FirstName);
$clauses[] = QQ::Select(QQN::Person()->LastName, QQN::Person()->FirstName, QQN::Person()->Project->Name);
$dtgPeople->DataSource = Person::QueryArray($cond, $clauses);

Instead, we need to do (or at least encourage) something like:

$dtgPeople->DataSource = Person::GetPeopleWithProjects($this->txtFirstName->Text, $this->txtLastName->Text);

and then have the QQuery code listed above inside the Person class.

Also, the code inside the Person class could look more like:

function GetPeopleWithProjects($strFirst, $strLast) {
    $cond = \QQ\Or(\QQ\Eq(Person::FirstName, $strFirst), \QQ\Eq(Person::LastName, $strLast));
    $clauses[] = \QQ\Expand(Person::Project); // Person::Project codegen will need to watch for name collisions here
    $clauses[] = \QQ\Sort(Person::LastName, Person::FirstName);
    $clauses[] = \QQ\Select(Person::LastName, Person::FirstName, Person::Project::Name);
    return Person::QueryArray($cond, $clauses);
}

Also, when someone uses a \QQ\Select clause, we should set a bit that records whether a field actually contains valid data, so that if someone tries to read a field that was not in the QQ::Select, we throw an exception.
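
A minimal sketch of that tracking idea; the property and method names here are made up purely for illustration and do not exist in the current generated code:

<?php
// Hypothetical fragment of a generated model class, for illustration only.
class PersonGen
{
    /** @var string|null */
    protected $strLastName;

    /** @var bool[] Records which columns were actually populated by the query. */
    protected $blnSelected = [];

    // The query builder would call this for every column in the select list.
    protected function markSelected($strColumn)
    {
        $this->blnSelected[$strColumn] = true;
    }

    public function getLastName()
    {
        if (empty($this->blnSelected['last_name'])) {
            throw new \RuntimeException('last_name was not part of the QQ::Select for this query.');
        }
        return $this->strLastName;
    }
}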

These changes will require changes in the UI code generation, hopefully just in the model connector generators. The problem is that the generators currently encourage injection of custom conditions and clauses, so the above abstraction may not be completely achievable. Filtering and datagrid sorting really require a strong connection between the UI and the data definition, but at least with some of these changes we can abstract away portions of it.

Thoughts?

spekary commented 8 years ago

PSR Caveats

With respect to the PSR-1 coding standard, there is one rule that really bugs me:

"Class constants MUST be declared in all upper case with underscore separators".

This creates some pretty ugly code. All upper case implies shouting in prose, and code should be readable. It's also harder to type, and we should always encourage the use of class constants. I suggest we ignore this one and use StudlyCaps instead, as we are currently doing. It's obvious that something is a constant when it is preceded by a double colon (::) and does not have parentheses after it.
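
For instance, compare the two styles in an illustrative snippet (not generated code):

<?php
class Person
{
    // PSR-1 style: all upper case with underscores.
    const FIRST_NAME_FIELD = 'first_name';

    // Current QCubed style: StudlyCaps.
    const FirstNameField = 'first_name';
}

echo Person::FirstNameField; // still obviously a constant: double colon, no parentheses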

spekary commented 8 years ago

I realized that:

\QQ\Sort(Person::LastName, Person::FirstName);

This will not work, because sometimes nodes and constants are interchangeable and we detect whether we have a node class or a constant value. Plus, we will need namespacing. I think the best alternative to

QQN::Person()->LastName

should be:

\Project\Node\Person()->LastName()

with potentially a

use \Project\Node as QQN

so that it can be referred to as

\QQN\Person()->LastName()

I think referring to this using a method call, ->LastName(), rather than a __get override is important so that the type-safety features of PHP 7 and HHVM can be utilized. It is also slightly more performant.
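
A rough sketch of what such generated node code might look like; the class layout and internals here are assumptions made only to illustrate the typed-method idea:

<?php
namespace Project\Node;

// Hypothetical generated node code, for illustration only.
class ColumnNode
{
    /** @var string */
    public $strTable;
    /** @var string */
    public $strColumn;

    public function __construct(string $strTable, string $strColumn)
    {
        $this->strTable = $strTable;
        $this->strColumn = $strColumn;
    }
}

class PersonNode
{
    public function LastName(): ColumnNode
    {
        // A declared return type lets PHP 7 and HHVM enforce that a node comes back,
        // which a magic __get override cannot do.
        return new ColumnNode('person', 'last_name');
    }
}

function Person(): PersonNode
{
    return new PersonNode();
}

Application code could then alias the namespace (use Project\Node as QQN;) and call QQN\Person()->LastName().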

vaibhav-kaushal commented 8 years ago

@spekary You could add one more point to that: sometimes we just need to increment a field in a row. Think of a view count. For example, if we are building a blog with QCubed as the backend and we want to increase the view count on each access, we should not need a full Save() call, because that would write the entire row and, even though it has never happened, a developer mistake could break the row. We should allow our ORM to run an UPDATE over the row in the UPDATE posts SET view_count = view_count + 1; style.

Such updates are transactional by default and are quite optimised inside the database (at least inside PostgreSQL's source).
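
As a concrete target for such an API, a hypothetical field-level increment could look like the sketch below. Post::Increment() does not exist today; the name and the adapter call are assumptions used only to illustrate the kind of call being asked for:

<?php
// Hypothetical API sketch: nothing like Post::Increment() exists yet.
// The point is that the ORM issues one atomic statement,
//   UPDATE posts SET view_count = view_count + 1 WHERE id = ?,
// instead of loading the row, changing one property and calling Save().
class Post
{
    public static function Increment($strColumn, $intId)
    {
        // A real implementation would go through the database adapter,
        // escape the identifier and use a prepared statement.
        $strSql = sprintf(
            'UPDATE posts SET %1$s = %1$s + 1 WHERE id = %2$d',
            $strColumn,
            $intId
        );
        // Post::GetDatabase()->NonQuery($strSql); // assumed adapter call, for illustration
    }
}

// Desired usage: Post::Increment('view_count', $intPostId);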

vaibhav-kaushal commented 8 years ago

In addition, I also had in mind making QCache an interface over other underlying caching techniques.

Thinking of architecture:

We can have all kinds of caching systems under the QCache umbrella. A developer could then declare multiple caches (PHP-script cache, Memcached, Redis, file-based caching, etc.) and use a particular kind of cache according to the requirement. So, for example, he/she could use a file-based cache for images, single-object caches backed by Memcached, and range-query caches backed by Redis.

So, as @lackovic10 has proposed in another thread - we can use dependency injection in this case.

Examples of various cases:

Getting PHP object from memory

Something like this could be done (of course, the code could be a lot better; we need to think a little more about this):

$objCacher = QCache::GetMemCacher();
$objUser = $objCacher->getUser(1);
if(!$objUser) {
    $objUser = User::Load(1);
    $objCacher->saveUser(1, $objUser);
}

Getting a file from File cache

$strFilePath = '/path/to/header/image/for/a/blog/post.png';
$objCacher = QCache::GetFileCacher();
$objFileData = $objCacher->getFileByPath($strFilePath);
if(!$objFileData) {
    $objFileData = file_get_contents($strFilePath); // file_get_contents returns the contents; readfile would only output them
    $objCacher->saveFileData($strFilePath, $objFileData);
}

...and so on!

vaibhav-kaushal commented 8 years ago

I was thinking of handling DB procedures as well.

Being able to just call a DB procedure and ask it to verify whether a user with a given email and password combination exists is a lot more secure and convenient at the application level, and we could support that. But QCubed (and, for that matter, any ORM right now) does not. If done properly, this could cover a very advanced use case and be helpful to advanced users.

The base work that would go into this can also help us tackle some work with other NoSQL databases (My 5.5th sense says so 😉).

spekary commented 8 years ago

Regarding incrementing, there is math support now. What we need is better update control. There are lots of situations where you need to do something to just one field in a record, and we could handle that pretty easily.

Regarding caching, PSR-6 is a caching interface standard. We should probably move our caching architecture to fit that interface if possible. However, if PSR-6 doesn't fit well with your cache ideas, there is no need to follow it.
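
For reference, the memcached example above re-expressed against the PSR-6 interfaces would look roughly like this; $objPool stands for any PSR-6 pool implementation wired up elsewhere, which is assumed here:

<?php
use Psr\Cache\CacheItemPoolInterface;

// $objPool can be backed by Memcached, Redis, the filesystem, etc.
function loadUserCached(CacheItemPoolInterface $objPool, $intId)
{
    $objItem = $objPool->getItem('user.' . $intId);
    if (!$objItem->isHit()) {
        $objUser = User::Load($intId);   // fall back to the ORM
        $objItem->set($objUser);
        $objItem->expiresAfter(300);     // cache for five minutes
        $objPool->save($objItem);
        return $objUser;
    }
    return $objItem->get();
}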

vaibhav-kaushal commented 8 years ago

@spekary Where is the math support? 😕

Also, will go through PSR-6 and will see if we can do something about it.

spekary commented 8 years ago

QQ::Add, QQ::Sub, etc. These are nodes, so you can select on them. But you can't update on them yet. That would take some improvements to QQuery.

olegabr commented 8 years ago

@spekary great plan!

spekary commented 8 years ago

There is also a somewhat difficult bug to fix, #753. This one is important to anyone needing to create a datagrid that has data that is part of an array expansion.

spekary commented 8 years ago

NoSQL Support

A possible strategy for this is to restructure QQuery so that, instead of BuildQueryStatement being embedded in every ORM object, a query structure is built up and then passed to each database adapter to unwrap and turn into a query. All the SQL database adapters could subclass a standard SQL adapter, so that each SQL adapter only needs to implement its special cases. A NoSQL adapter can then try to execute the query its own way.

The intermediate DB definition I mentioned above will help here. Not only will it let us know the structure of each entity, it will also let us define foreign keys within a NoSQL database.
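
A bare-bones sketch of that adapter layering; every class and method name below is invented purely for illustration:

<?php
// Hypothetical query structure built by QQuery instead of raw SQL text.
class QuerySpec
{
    public $strTable;
    public $objCondition;    // condition tree
    public $arrClauses = []; // expand / order by / select / limit clauses
}

// Each backend implements this; the generated models only ever see QuerySpec.
interface DatabaseAdapter
{
    /** @return array rows as associative arrays */
    public function executeSelect(QuerySpec $objSpec);
}

// Shared SQL behaviour lives in one base class...
abstract class SqlAdapter implements DatabaseAdapter
{
    public function executeSelect(QuerySpec $objSpec)
    {
        $strSql = $this->buildSelectSql($objSpec);
        return $this->run($strSql);
    }

    abstract protected function run($strSql);

    protected function buildSelectSql(QuerySpec $objSpec)
    {
        // Generic, standard SQL generation; subclasses override only their
        // non-standard corner cases.
        return 'SELECT ...';
    }
}

// ...while a NoSQL adapter interprets the same structure in its own way.
class DocumentStoreAdapter implements DatabaseAdapter
{
    public function executeSelect(QuerySpec $objSpec)
    {
        // Translate the condition tree into a document query instead of SQL.
        return [];
    }
}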

A few things to think about:

vaibhav-kaushal commented 8 years ago

@spekary I see what you are trying to do. I would advise against it, or rather, I would advise that we approach the two in different ways.

NoSQL is very different from SQL. I mean, there are limits on indexes, they do not understand SQL, and there is hardly any generalisation (a little among document-based NoSQL solutions: Dynamo, Mongo and Couch are similar in the way they accept and present data, but their bindings, querying, etc. are different). That makes it a bad idea to stuff this into QQuery. We should keep NoSQL handling separate from RDBMS handling altogether. They just won't fit.

About the SQL adapter part: I think that's a noble idea and I like it. I had been thinking along those lines myself (although not as well planned out as yours, apparently). You can look into Drupal's code for that; they have a nice way of getting that done.

spekary commented 8 years ago

If we can borrow from other ORMs, great.

Re: NoSQL structure, it very much depends on how you use it. Not all NoSQL databases are schema-less, and even the ones that are benefit from having some structure in most cases.

In the end, QQuery is not supposed to do everything, but to solve some common problems. By building it such that a NoSQL database could be used to process most QQuery requests (except maybe aggregate functions on NoSQL DBs that can't do a map-reduce), and also building the templates to always use QQuery, we make it quite possible to use a NoSQL database as the backend for a QCubed code-generated application, including Google Cloud Datastore. We have to do that anyway if we want to be more SQL-database agnostic, but now we can even support most NoSQL databases. Obviously we have to impose some structure to do that with NoSQL, but it's still possible, and it opens up more options for how to build web apps, including using Google App Engine and Amazon Web Services as hosts.

vaibhav-kaushal commented 8 years ago

@spekary Thanks. But I think we should keep the two types separate. For example, think of Redis: it's great as a cache. It can be used to store documents, and you can run a pattern search over the keys to get more than one object in a query. You can use it as a huge datastore to power something as big as Reddit.

DBs like these, with no single well-defined use case (because they are so capable at the simple tasks they handle), can cause big headaches when you try to accommodate their functionality in QQuery. If you still insist, let's create a chart of which NoSQL DB can do what, and maybe we can come to a conclusion.

spekary commented 7 years ago

See: http://sailsjs.org/documentation/concepts/models-and-orm/query-language. There are a couple of node.js ORMs that try to be database agnostic. I think doing it this way will be really good for the architecture.

Yes, NoSQL can be completely random in structure, but many of the NoSQL databases (MongoDB and Firebase, to name a few) encourage you to create a set structure so that you can create indexes on the data. Once you have indexes, you easily get to foreign keys, and now you have something that looks a lot like a relational database. The only difference is the lack of aggregation functions, but that can be handled by the adapters, since each NoSQL database has its own range of capabilities regarding aggregation.

It will mean the following:

By doing things this way, you get the following:

vmakre commented 7 years ago

Look at this: https://storm.canonical.com/Tutorial. Main points:

  1. You create the database from a single class-like file (it can also be XML), and you modify this file, not the database. Look at this tutorial: http://propelorm.org/documentation/02-buildtime.html ; that system uses an XML schema generator from the DB schema.
  2. You can codegen from this file instead of from the database (better than before if you have incompatible table columns); you define everything here, including relationships, so there is no need for naming prefixes.
  3. It uses a "Store" object as the backend for keeping objects in memory and executing DB queries; at the end you call $store->commit(), or $store->rollback() if something went wrong.
  4. You can have many objects and execute them all with a single $store->commit().
  5. Look at this insert logic:
    $myobject = new Person(); // table name: person
    $myobject->name = 'new name';
    // here we have an object shaped like the person table, but without a primary id yet
    $store->add($myobject);   // insert, but $myobject is still in memory and doesn't have the last insert id
    $store->flush();          // now $myobject is refreshed from the inserted row (like a bind method)
    echo $myobject->id;       // we have the last insert id and can use it for another relational query
  6. A select example:
    $oneperson = $store->find(Person, Person->name == "Tom Thomas")->one(); // SELECT FROM person WHERE name = "Tom Thomas" LIMIT 1
    echo $oneperson->name; // the result is an object
  7. Good for NoSQL databases. I think QCubed has all these things, but this idea looks simpler and has better logic, because QCubed's dynamic query engine is sometimes strange to use and hard to remember. We just need to reorganize the mosaic pieces.
lackovic10 commented 7 years ago

@vmakre This looks similar to Doctrine. I think the Qcodo ORM is great; it just needs some love. I personally prefer creating the database myself rather than generating it from code. From the book High Performance MySQL by Baron Schwartz, Peter Zaitsev and Vadim Tkachenko:

Beware of Autogenerated Schemas

We’ve covered the most important data type considerations (some with serious and others with more minor performance implications), but we haven’t yet told you about the evils of autogenerated schemas.

Badly written schema migration programs and programs that autogenerate schemas can cause severe performance problems. Some programs use large VARCHAR fields for everything, or use different data types for columns that will be compared in joins. Be sure to double-check a schema if it was created for you automatically.

Object-relational mapping (ORM) systems (and the “frameworks” that use them) are another frequent performance nightmare. Some of these systems let you store any type of data in any type of backend data store, which usually means they aren’t designed to use the strengths of any of the data stores. Sometimes they store each property of each object in a separate row, even using timestamp-based versioning, so there are multiple versions of each property!

This design may appeal to developers, because it lets them work in an object-oriented fashion without needing to think about how the data is stored. However, applications that “hide complexity from developers” usually don’t scale well. We suggest you think carefully before trading performance for developer productivity, and always test on a realistically large dataset, so you don’t discover performance problems too late.
vmakre commented 7 years ago

@lackovic10 Why do you think this?
This is from Propel ORM:

<?xml version="1.0" encoding="UTF-8"?>
<database name="bookstore" defaultIdMethod="native">
  <table name="book" phpName="Book">
    <column name="id" type="integer" required="true" primaryKey="true" autoIncrement="true"/>
    <column name="title" type="varchar" size="255" required="true" />
    <column name="isbn" type="varchar" size="24" required="true" phpName="ISBN"/>
    <column name="publisher_id" type="integer" required="true"/>
    <column name="author_id" type="integer" required="true"/>
  </table>
  <table name="author" phpName="Author">
    <column name="id" type="integer" required="true" primaryKey="true" autoIncrement="true"/>
    <column name="first_name" type="varchar" size="128" required="true"/>
    <column name="last_name" type="varchar" size="128" required="true"/>
  </table>
  <table name="publisher" phpName="Publisher">
   <column name="id" type="integer" required="true" primaryKey="true" autoIncrement="true" />
   <column name="name" type="varchar" size="128" required="true" />
  </table>
</database>

Nothing new or special here; everything is the same as SQL. An XML structure to generate SQL for different engines, or for NoSQL databases: why not? Yes, beware of something like this: <column name="name" type="varchar" /> (it needs a default value for the varchar size). I think QCubed is a great system, but the DB ORM needs improvements, and new ideas are good if you want to keep up with the others. If we clearly separate the App from the DB ORM, we will be riding high: a lot of app code will not depend on database codegen, and then you don't care what type of DB resource you are using, whether SQL, NoSQL or SOAP. https://en.wikipedia.org/wiki/Multitier_architecture

lackovic10 commented 7 years ago

Can you list the benefits of this approach, and why it is better? I know domain-driven design relies on this approach and some quite experienced programmers promote it. I personally don't prefer it, because I see the database as the foundation of the application and I don't want to trust a third-party tool to create it for me. I've also always coded that way. I also think that database design should come first when designing a large-scale application, because one bad thing can take your site down easily.

spekary commented 7 years ago

@lackovic10 Good points. I agree. However, there is an approach that can accommodate both. I like Propel's API, but I don't like being forced to generate from a schema. I found some Go frameworks that do it in two steps: generate a schema from the database, and then generate code from the schema. In addition, you can go backwards, generating the database from the schema. The advantage of the schema is that you can check it in to a source control system. However, if you are creating migration SQL, you would also need to check that in. I find myself having to create migration PHP as well as SQL, so both would need a way of being checked in and related to versions.

Since you can go backwards from the schema, you could potentially use this system to migrate to other databases. As I mentioned earlier, there is an ORM called Sails in node.js. They have an interesting approach of being able to use either SQL or NoSQL databases from the same schema. @vaibhav-kaushal pointed out that NoSQL is not really designed for that kind of use, but I have found a number of NoSQL databases that have found the need to build in relational features, and those get us close enough to be useful for QCubed, since QCubed really isn't doing anything particularly difficult in the SQL that couldn't also be done in NoSQL. This doesn't prevent someone from using a NoSQL database in a more traditional sense as a key-value store, but it opens the possibility of being able to migrate from a SQL to a NoSQL database over the life of a project, if that is what the project needs at the time. It could even be done in parts, splitting up the application into objects that are stored in NoSQL and other objects stored in SQL.

The two-step process also lets us create a PHP object version of the database for doing code generation. Basically, we do that now, but we generate the objects straight from the database rather than from a schema.

A current QQuery need is to be able to use conditions and clauses for updates and deletes, not just selects. The problem is that different SQL databases get particularly non-standard when trying to do this. To do it through QQuery, more of the intelligence needs to move to the database adapter so that it can handle it in a database-specific way.
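
As a concrete target, the kind of call this would enable might look like the hypothetical sketch below; Person::QueryUpdate() does not exist, and the shape of the value map is an assumption:

<?php
// Hypothetical usage sketch of a condition-driven update through QQuery.
$cond = QQ::Equal(QQN::Person()->LastName, 'Smith');

// The adapter, not the model, would render this as its database's own flavour
// of UPDATE ... SET ... WHERE ..., keeping the non-standard SQL out of the models.
Person::QueryUpdate($cond, [
    'LastName' => 'Smythe',
]);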

Another issue is the current bug in how limits interact with array expansions. To really do it correctly requires two queries: one to generate the keys of the objects without the array expansion, and another to query for all the data pointed to by those keys and assemble the arrays.
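
To make the two-query idea concrete, here is a rough sketch with hand-written SQL; the table and column names are invented for the example:

<?php
// Step 1: apply the LIMIT to the root objects only, without the array expansion,
// so that "10" means 10 people, not 10 joined rows.
$strKeyQuery = '
    SELECT id
    FROM person
    ORDER BY last_name
    LIMIT 10';

// Step 2: fetch the full data, including the expanded project rows, for just
// those keys, then group the rows by person.id in PHP to assemble the arrays.
$strDataQuery = '
    SELECT person.*, project.*
    FROM person
    LEFT JOIN project ON project.manager_id = person.id
    WHERE person.id IN (/* ids from step 1 */)
    ORDER BY person.last_name';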

Finally, there is the question of how to do some things that SQL does well but NoSQL does not. Some things that come to mind are:

So, here is my proposal:

Changes to Codegen

Changes to QQuery Itself

There are a ton of benefits to this, including being able to hand off SQL queries to microservices, use middleware to do logging, use external data stores, and perhaps even do it all asynchronously (oooh). We can probably present substantially the same API to the application. In other words, the actual model interface does not need to change much.

@lackovic10 I did want your thoughts on the generated models. You had some opinions about avoiding static code, and there is copious use of static code there. I can't think of a way around it, but then again, I'm not sure it's bad. The static code in the models mostly represents different ways to construct objects and arrays of objects out of the database, and static object constructors are used successfully in all kinds of languages and systems. But if you can suggest another way, let's hear it.

The other issue is the application object, or maybe more specifically the database adapters. Do you see a better way than doing what we currently do, calling a static GetDatabase function that then gets the adapter out of globals?

lackovic10 commented 7 years ago

Wow, this seems like a lot of changes in your plan. I'm not even sure I can weigh in; I don't have experience with NoSQL databases, and I'm also quite short of time at the moment to dive deep into this. One note: when I say schema, I refer to the database schema (SQL). The static methods in generated models are definitely not good and are the cause of a LOT of bad code, at least in my experience. Changing this would require a huge effort and getting away from the active record pattern, I think. I think that the generated code should be decoupled into

This is just a hint of what I have in mind. When decoupling code into smaller classes with clearly defined dependencies, a dependency injection container is a must. I still need to get back to our discussion about it; I never had the time so far to work on a solution like this. For the project I'm working on, I've added wrapper methods which are used instead of the static ones, to be able to have clearly defined dependencies. Here is an example:

class UserRepository
{
    public function load($id)
    {
        return User::Load($id);
    }
}

Now I inject the UserRepository into class constructors, and use the UserRepository::load() method instead of the User::Load() method. The benefits are clear:

Now the queries go into these repository classes, while the other logic from the User class goes into different services, and step by step we expect the User class to have fewer than 1000 lines; it had over 10,000 if I remember correctly. This works for us because we have an internal agreement inside our team that we will create wrappers and not use the static methods, but that is not the way to go for a framework. As I said, it's a very big project. If I ever have time to help with this, I'd love to, but currently it's not possible :(

spekary commented 7 years ago

Don't worry too much about the internals for now. We are all busy. It's the external interface I am mostly concerned about.

Where do you create a UserRepository? What object owns it? At some point you have to have a static method to create an object; when you start your app, there are no objects yet. Using the "new" keyword is just a shortcut for calling a static constructor.

For example, this is typical boilerplate code from QCubed:

$intId = QApplication::QueryString('intId');
if ($intId) {
    $objPerson = Person::Load($intId);
} else {
    $objPerson = new Person();
}

Lots of statics here. How would you rewrite that using dependency injection?

lackovic10 commented 7 years ago

The UserRepository is created by the Container, and it is either retrieved from the container or injected into services by the Container. So in QForms and in Controllers (controllers are used for the frontend, QForms for the admin), I just do $this->container->getService('repository.user')->load($id); for the services that use the UserRepository, it is injected by the container, as the container is the only class that worries about managing dependencies. Here is a small example:

<?php

class RegisterManager
{
    private $userRepository;

    public function __construct(UserRepository $userRepository)
    {
        $this->userRepository = $userRepository;
    }
}

class Container extends AbstractContainer
{
    /** @var array $services */
    protected $services = array(
        'repository.user' => null,
        'manager.register' => null,
    );

    protected function getRepository_UserService($id, $newInstance)
    {
        if ($newInstance || !$this->services[$id]) {
            $this->services[$id] = new \UserRepository();
        }

        return $this->services[$id];
    }

    protected function getManager_RegisterService($id, $newInstance)
    {
        if ($newInstance || !$this->services[$id]) {
            $this->services[$id] = new \RegisterManager(
                $this->getService('repository.user', array(), $newInstance) // inject the UserRepository dependency
            );
        }

        return $this->services[$id];
    }
}

So again, in a controller or QForm I do:

$this->container->getService('manager.register')->register($user);

And the container is the only place that worries about managing dependencies. If a new dependency is added to the RegisterManager, I just add it to the class and manage it inside the container, and that's it. It is very easy to decouple code this way.

lackovic10 commented 7 years ago

As for the code above:

<?php

class PersonRepository
{
    public function load($id)
    {
        return Person::Load($id);
    }
}

class PersonManager
{
    private $personRepository;

    public function __construct(PersonRepository $personRepository)
    {
        $this->personRepository = $personRepository;
    }

    public function updatePerson($id)
    {
        $person = $id ? $this->personRepository->load($id) : new Person();
        ...

        return $person;
    }
}

// inside a controller or QForm
$this->container->getService('manager.person')->updatePerson(QApplication::QueryString('intId'));

I'm planning to introduce a library that implements PSR-7 (HTTP message interfaces) and to use Request and Response objects.
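
As a rough illustration against the PSR-7 interfaces, the query-string lookup from the earlier boilerplate could become something like this; how the request object gets wired into the QForm or controller is assumed here:

<?php
use Psr\Http\Message\ServerRequestInterface;

// Instead of QApplication::QueryString('intId'), the id comes from a PSR-7
// request that is injected wherever it is needed.
function resolvePersonId(ServerRequestInterface $request)
{
    $arrQuery = $request->getQueryParams();
    return isset($arrQuery['intId']) ? (int) $arrQuery['intId'] : null;
}

// e.g. $this->container->getService('manager.person')->updatePerson(resolvePersonId($request));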

lackovic10 commented 7 years ago

I recommend that you try coding this way; you'll see right away how easy it is to write smaller classes. You can use the container I've added in PR 1138, or one of the popular containers out there.