symphonycms / symphony-next


Database Discussion #12

Open designermonkey opened 11 years ago

designermonkey commented 11 years ago

With the new project, and the fact we're using a new way of modelling the database (Eloquent), the question has come up about how we structure our tables.

Now, currently, we have a super efficient, super scary abstract structure which, without a lot of experience staring at it and trying to decode the sorcery that gave birth to its arcane form, is very, very hard to manage outside of Symphony. With Eloquent, we will be relinquishing some of that abstraction to the ORM, which from a programming and storage point of view is a very good idea.

This doesn't mean, though, that we need to fully leave our abstract structure behind us, and it's with that in mind that I've been doing some playing and thinking about what would be the easiest, safest and most efficient way of utilising our method of working (Sections, Fields, Entries etc.) with a less abstract data model.

With Eloquent (if you've not read about it, I suggest you do, it's really good) we would be building a Model for every table in the DB, and would also have our fields do the same.

Currently, we make an entry in the fields table for each Field, and then have the field_fieldtype table for meta data for that Field. Then, we add a data table for every field instance, using an ID for reference in the name. While very abstract, this is going to be a nightmare in the Next project.
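
To make the current layout concrete, a single Select Box Link field typically ends up spread over tables something like these (names are illustrative of a default install; the numeric suffix is the field's ID):

    sym_fields                     -- one row per Field instance
    sym_fields_selectbox_link      -- per-type meta data for each Field instance
    sym_entries_data_42            -- one data table per Field instance, named by its ID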

Now, I'm thinking that with the new Models we should continue to have a fields table, but instead of a single Field instance table and many tables for data, we would have a Field schema table and a Field data table. Like so:

tbl_fields
tbl_fields_schema_selectbox_link
tbl_fields_data_selectbox_link

In the schema and data tables there would be an id column, for uniqueness, and a field_id column to reference the relevant Field, which would allow multiple value fields to store their multiple values in the same table. The data table would also, obviously, have an entry_id. I've added some SQL statements below to illustrate this in practice (although I hear we may not need SQL in Laravel?)...

    CREATE  TABLE IF NOT EXISTS `sym_fields_schema_selectbox_link` (
        `id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT ,
        `field_id` INT(11) UNSIGNED NOT NULL ,
        `allow_multiple` ENUM('yes','no') CHARACTER SET 'utf8' COLLATE 'utf8_unicode_ci' NOT NULL DEFAULT 'no' ,
        `show_association` ENUM('yes','no') CHARACTER SET 'utf8' COLLATE 'utf8_unicode_ci' NOT NULL DEFAULT 'yes' ,
        `related_field_id` INT(11) UNSIGNED NOT NULL ,
        `limit` INT(4) UNSIGNED NOT NULL DEFAULT 20 ,
        PRIMARY KEY (`id`) ,
        INDEX `field_id` (`field_id`)
    )
    ENGINE = MyISAM
    DEFAULT CHARACTER SET = utf8
    COLLATE = utf8_unicode_ci;

    CREATE  TABLE IF NOT EXISTS `sym_fields_data_selectbox_link` (
        `data_id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT ,
        `field_id` INT(11) UNSIGNED NOT NULL ,
        `entry_id` INT(11) UNSIGNED NOT NULL ,
        `related_entry_id` INT(11) UNSIGNED NULL DEFAULT NULL ,
        PRIMARY KEY (`data_id`) ,
        KEY `entry_id` (`entry_id` ASC) ,
        KEY `relation_id` (`related_entry_id` ASC)
    )
    ENGINE = MyISAM
    DEFAULT CHARACTER SET = utf8
    COLLATE = utf8_unicode_ci;

creativedutchmen commented 11 years ago

@allen I do appreciate the point about data corruption, and I think it should be a very important factor indeed.

I am not so sure I agree with your points about performance. Right now the queries that every DS produces are so complex I can hardly understand why they are so slow sometimes. Rewriting the query from scratch normally fixes these problems, but this is a huge pain as I have to go back and forth between tables to see which tables I have to join.

My point being: because the structure is complex it is hard to write a DS generator that produces efficient queries. For example: the SBL (Select Box Link) suffers from the dreaded N+1 syndrome, but a fix for this is incredibly hard (I think, otherwise I am sure it would be fixed by now). I am convinced that - even though it is very good - Eloquent will suffer from the same kinds of problems.

designermonkey commented 11 years ago

Can you elaborate on the N+1 syndrome? You lost me there.

creativedutchmen commented 11 years ago

@designermonkey sure, it's where you create a query to fetch the children of a parent, then you query each child for its data (1 query to get the children, then N queries to get the data of each child).
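
To picture it in SQL (the SBL tables are the ones proposed above; sym_entries_data_1 is made up purely for illustration), the pattern and the single-round-trip alternative look roughly like this:

    -- 1 query: get the related entry IDs an SBL field points at
    SELECT related_entry_id FROM sym_fields_data_selectbox_link WHERE entry_id = 42;

    -- then N queries, one per related entry, to get its data
    SELECT value FROM sym_entries_data_1 WHERE entry_id = 101;
    SELECT value FROM sym_entries_data_1 WHERE entry_id = 102;
    -- ...one more for every related entry

    -- the same data in a single round trip via a join
    SELECT d.value
    FROM sym_fields_data_selectbox_link AS l
    JOIN sym_entries_data_1 AS d ON d.entry_id = l.related_entry_id
    WHERE l.entry_id = 42;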

This is the reason I rewrite quite a few of my datasources: getting the number of queries down from 150+ for each datasource to one (or a handful, when the query gets too complex).

At least, I suspect the SBL to be the main cause of this, as I noticed that datasources that relied on an sbl field were the worst in this case. It could be that the SBL itself is not the culprit but merely shows the effects better, but still. Does that clarify what I mean?

By the way: before you object saying these queries are small, fast and cached; they might be, but when you introduce a bit of latency between your webserver and your database you'll feel the pain.

designermonkey commented 11 years ago

It certainly does now, thanks.

designermonkey commented 11 years ago

> By the way: before you object saying these queries are small, fast and cached; they might be, but when you introduce a bit of latency between your webserver and your database you'll feel the pain.

Me? Object?? ;o)

I think we may find that Eloquent and Fluent will be worse in this respect whatever we do. We may end up having to make commits to those classes to help make them more efficient, IMO, but I haven't tried yet, so I will shut up there.

kmeinke commented 9 years ago

Some years old... how is it coming along? Just a comment...

When designing a database, with an ORM or without, imagine the queries you will run against it.

Example: building the Add / Edit Entry view. In designermonkey's schema you would need to query:

A1) the entries table to get one row - the entry
A2) the sections table to get one row - the section
A3) the fields table to get all fields of that section

You can join those calls: entryID > n:1 > sectionID > 1:n > field rows. Then you've got your entryID, sectionID and a list of fieldIDs / handles.

Then you query:

B1) for every field, the fields_schema table to get the meta data for that field - one row
B2) for every field, the fields_data table to get its data - one row

You can't join those - the SQL server can't even prepare the queries - because you've got dynamic table names.

Basically, you hit the database all over the place to get very, very little data out of it. Because you use many dynamic table names, the server can't prepare the queries very well. Worse: you hit basically the same tables every time this section is used on the frontend.
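
To make that concrete, the A queries could in principle collapse into one prepared statement, while the B queries stay per-field (a rough sketch; table and column names are illustrative, not the actual schema):

    -- A1-A3 as a single join: entry -> section -> fields of that section
    SELECT e.id AS entry_id, s.id AS section_id, f.id AS field_id, f.handle
    FROM sym_entries AS e
    JOIN sym_sections AS s ON s.id = e.section_id
    JOIN sym_fields   AS f ON f.section_id = s.id
    WHERE e.id = ?;

    -- B1/B2 cannot be joined or prepared once, because the table name changes per field type
    SELECT * FROM sym_fields_schema_selectbox_link WHERE field_id = ?;
    SELECT * FROM sym_fields_data_selectbox_link   WHERE field_id = ? AND entry_id = ?;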

An ORM lets you build schemas on the fly. I would suggest using the field meta information to build one entries table for every section. The O means Object: Symphony's objects should be the Blog Post, the Comment, the Page, the User, the Product, etc. - and not the abstract thing the system uses to manage all that stuff.
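
For example, a Blog Posts section could end up with a generated table like this (purely a hypothetical sketch - the column names would come from the section's field definitions):

    CREATE TABLE IF NOT EXISTS `sym_section_blog_posts` (
        `entry_id`  INT(11) UNSIGNED NOT NULL AUTO_INCREMENT ,
        `title`     VARCHAR(255) NOT NULL ,
        `author_id` INT(11) UNSIGNED NOT NULL ,
        `body`      TEXT NOT NULL ,
        PRIMARY KEY (`entry_id`) ,
        INDEX `author_id` (`author_id`)
    )
    DEFAULT CHARACTER SET = utf8
    COLLATE = utf8_unicode_ci;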

As an alternative you could use views (if the ORM provides such a thing) - in that case you would have:

A view_blogposts joining the tables field_author, field_title and field_text by sectionID & entryID. This way you could keep the fields in separate tables but still have one entries table per section. Extensions could provide their own field meta data to extend the tables by extending the view.

This way you could also provide different views on the same objects. Basically, you could model many datasources as SQL views.
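
A rough sketch of what such a view might look like, assuming one table per field keyed by entryID and sectionID (all names are made up for illustration):

    CREATE OR REPLACE VIEW `view_blogposts` AS
    SELECT
        t.entry_id,
        t.value AS title,
        a.value AS author,
        x.value AS text
    FROM `field_title`  AS t
    JOIN `field_author` AS a ON a.entry_id = t.entry_id
    JOIN `field_text`   AS x ON x.entry_id = t.entry_id
    WHERE t.section_id = 1;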