grinnellplans / grinnellplans-php

Automatically exported from code.google.com/p/grinnellplans
Other
7 stars 7 forks source link

Remove raw SQL queries from functions*.php #241

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Purpose of code changes on this branch:

Access to data should probably be abstracted into methods on model objects, 
rather than raw queries hardcoded in various functions. This helps to separate 
the details of data storage (SQL schema, etc.) from the details of its 
presentation (Widget lists, etc.) Most of the major work has been done, but 
there remain a lot of mundane details to be ported out of the functions*.php 
grab bag.

The two patches attached are a modest start. They reimplement two functions, 
setUpdatedTime and get_autoread, using the ORM machinery.

Notes (esp. re: the second patch, which is more complicated):

 - It's not clear to me whether model objects belong in /inc, /db/models, or a combination of both. At present, /inc is used for many important data-retrieval tasks (e.g., authentication). On the other hand, /db/models is used for the most important data-storing tasks (storing the plan and converting to HTML). Stuff in /inc seemed a little bit higher-level, so I decided to attach my new getAutofinger method to the 'User' class there.

 - Everything on the 'User' class is a static method, even for things which seem like they should be instance methods. getAutofinger should really be an instance method, I think, but I didn't see a constructor anywhere, so I just implemented it as a static method for now. It works, just not very OO.

 - the 'Autofinger' model in /db/models seems to mis-describe the table I have in my development database - it claims that the 'autofinger' table has a single primary key called 'id' - when the database has no such column, but instead a composite primary key between 'owner' and 'interest'. I changed this so that the ORM can work with autofinger lists. If the production database does have an 'id' on that table after all, we should not need to commit those changes.

 - the new code in populate_page makes only one database query to get the autoreads, instead of three. I'm not sure if that's faster or slower, but it made more sense to me. If there's somewhere else in the code where we only want one level at a time, it would be easy to modify the getAutoread method to accept that as an argument.

tk

Original issue reported on code.google.com by umhecbaa on 23 Jul 2012 at 4:56

Attachments:

GoogleCodeExporter commented 9 years ago
Clarification on note #3: By "I changed this" I mean that I changed the 
Doctrine model, not the database. I made absolutely no changes to the database 
schema.

Original comment by umhecbaa on 23 Jul 2012 at 5:14

GoogleCodeExporter commented 9 years ago
The setUpdatedTime patch LGTM. I'll commit in 24 hours, as usual.

I'm not entirely sure User.php is the right place to put getAutofinger. The 
stuff in there now is mostly account-related - passwords, email addresses, etc. 
- and it doesn't seem to fit to me. But that's just a gut feeling - if you feel 
that's the best place, I won't be hard to convince. Apart from that, LGTM.

We probably should make more effort to define what goes where - the present 
code puts everything everywhere. Part of the problem is that we've got some 
things like the User class sitting on top of the Accounts table - leaky 
abstractions galore!

I agree completely re. User class - ideally, almost none of the methods in 
there should be static, and either the User::get() or the User::login() method 
should be the constructor. 

There's no "id" column in the autofinger table on production. Good catch. BTW, 
the database dump at /trunk/documents/db-schema accurately reflects the 
production database; I've maintained it when I've made schema changes recently.

re: populate_page(): In general, I think fewer queries are better. I'd actually 
caught that a couple weeks ago when people were talking about whether Plans 
Plus's autoread-refreshing feature would adversely affect the server, but 
hadn't had time to patch it.

Thanks for your hard work!
AC

Original comment by a...@alexcohn.com on 23 Jul 2012 at 5:26

GoogleCodeExporter commented 9 years ago
Awesome! Using the ORM more is great, and as you discovered there are some 
pieces that still need some TLC.

I'd vote for putting almost everything in db/models. inc/ seems to be mostly 
for configuration and session stuff (Ian A may comment more on the exact 
purpose). Plus, as you noticed, the User class is a little weird - for now 
let's limit it to authentication stuff, and eventually we can probably replace 
it.

Side note: the most common way I've seen to handle users is to keep two 
separate models: User, backed by the db table, and UserSession, which 
represents the authentication session. You log a user in by creating a new 
UserSession, and log them out by destroying it.

Original comment by ian.gree...@gmail.com on 23 Jul 2012 at 4:42

GoogleCodeExporter commented 9 years ago
Ok, I thought about it and I now agree completely. User is the wrong place.

Ian's distinction between a model for db data and a model for cookie data makes 
a lot of sense to me. That makes the Account/User scheme seem a lot less 
schizophrenic - Account is for db data, User is for cookie data.

It also clarifies why the User class manages to be functional with nothing but 
static methods. Any instance variables it might have had are living in 
$_SESSION, where they can persist just as well. In fact, they persist there 
longer than the User instance would last, since the cookie persists over many 
pageviews.

I still think Alex has a good point, User is weird that way, maybe we should 
work on it. But neither User::login() or User::get() strikes me as a very good 
constructor for User. As implemented now, these methods are factory functions 
for Account.

Anyway, I reimplemented the getAutoread patch as a /db/models method, but that 
comes at a cost: in order to call it, code (such as populate_page) needs a 
reference to the user Account. So I needed to add another call to User::get().

User::get() has, in my opinion, big problems. All it needs to do is return a 
reference to the user Account, but it does this by running a query. Lacking any 
knowledge of what data is eventually needed, it takes a kitchen sink approach:

SELECT a.userid AS a__userid, a.username AS a__username, a.created AS 
a__created, a.password AS a__password, a.email AS a__email, a.pseudo AS 
a__pseudo, a.login AS a__login, a.changed AS a__changed, a.poll AS a__poll, 
a.group_bit AS a__group_bit, a.spec_message AS a__spec_message, a.grad_year AS 
a__grad_year, a.edit_cols AS a__edit_cols, a.edit_rows AS a__edit_rows, 
a.webview AS a__webview, a.notes_asc AS a__notes_asc, a.user_type AS 
a__user_type, a.show_images AS a__show_images, a.guest_password AS 
a__guest_password, a.is_admin AS a__is_admin FROM accounts a WHERE a.userid = 
'2523'

The DB gets hit with this every time we call User::get(), which is a lot. It 
happens twice in a row whenever anybody logs in. And I just added one more. 
Once we get the code straightened out, we might be able to get this down to 
once every page load, but it's still sloppy.

It's an exceptional shame in this case, since all we ever need to get the 
autoreads is a user id, which we have from the cookie in the first place. We 
request all this data and literally use none of it - it's just so that we can 
have getAutofinger attached to the Account, and not the User. I'm not sure how 
to optimize things without SELECTing for only the necessary data when getting 
the reference, but that breaks the whole object abstraction.

I hate to add a nasty query to the code, but maybe it is the best compromise 
for now?

Original comment by umhecbaa on 30 Jul 2012 at 8:39

Attachments:

GoogleCodeExporter commented 9 years ago
One small to your setUpdatedTime patch: why not go all-in on ORM and use 
Doctrine_Expression('NOW()') instead of the timestamp function in functions.php?

On the subject of the gigantic query: Ideally, at the start of a page load, we 
would load all needed current-user-related data (i.e. username, stylesheet, 
view preferences, etc) with one big query. In the future, it would be great to 
replace the $idcookie global with a User (or maybe UserSession?) global, which 
gets loaded once per pageview. Additionally, if there are columns we know we 
never use (e.g. accounts.poll and accounts.spec_message, which IIRC are used 
nowhere), we could add a ->select() statement to the Doctrine query to not 
include them.

I'd like to point out, though, that get() is finding an Account by primary key, 
it's loading at most one row, it's a fixed-size row, and it's at most a 
kilobyte or two. While they're not pretty, these are fast queries.

Original comment by a...@alexcohn.com on 30 Jul 2012 at 9:05

Attachments:

GoogleCodeExporter commented 9 years ago
Also, Doctrine supports non-equal nest relationships: 
http://docs.doctrine-project.org/projects/doctrine1/en/latest/en/manual/defining
-models.html#self-referencing-nest-relations

I'm not sure if this would be a better fit than the solution you took - I don't 
know if it'll let you access autofinger's columns (e.g. an autofinger entry's 
priority or read state) rather than of either of the two Accounts - but just 
wanted to make sure you'd seen it. 

Original comment by a...@alexcohn.com on 30 Jul 2012 at 9:17

GoogleCodeExporter commented 9 years ago
My first crack at things did use autofinger as a join table for a many-to-many 
Doctrine relationship, but you're right, I couldn't figure out how to get at 
the extra columns. A couple web searches seemed to imply that this requires 
treating the join table as an entity, so I went that route.

One thing about the getAutofinger() method is that it doesn't use this relation 
going forwards ($this->Interests) - it constructs a new query on the autofinger 
table, using a WHERE clause on the owner. Maybe this isn't the ORM ideal but it 
seems to make efficient queries. When I tried it the other way, I recall 
something going funny with joining the (single) Account to the Autofinger on 
owner, and then joining that to the Account again on the interest side... So I 
cut out one level of joining by doing the WHERE. 

Cool, I didn't know about NOW(), that is better.

Good to know that the ugly query isn't slow. I bet that is helped a lot by 
having put the plan text in another table... Still, looking forward to the day 
when we can call it once and then pass that reference around instead of `$dbh`s 
and `$idcookie`s.

Original comment by umhecbaa on 30 Jul 2012 at 3:48

GoogleCodeExporter commented 9 years ago
I've moved two more queries down to the model level. Notes on these:

- These should be applied after two of the patches discussed here in July: 
setUpdatedTime2.patch and getautoread2.patch. Starting now, I'll name patches 
sequentially.

- I implemented these queries as methods of Account. They both have to do with 
display options, however, so they might be better suited as methods of Display. 
I hesitate to put them there, though, because I don't really understand why 
Display is a separate model from Account in the first place.

Original comment by umhecbaa on 23 Oct 2012 at 2:50

Attachments:

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r619.

Commit patch setUpdatedTime2.patch from 
http://code.google.com/p/grinnellplans/issues/detail?id=241#c5

Original comment by a...@alexcohn.com on 25 Oct 2012 at 6:28

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r620.

Commit patch getautoread2.patch from 
http://code.google.com/p/grinnellplans/issues/detail?id=241#c4

Original comment by a...@alexcohn.com on 25 Oct 2012 at 6:50

GoogleCodeExporter commented 9 years ago
Implementing getStylesheet and getInterface as methods of Account makes sense 
to me. 

Quick question re. getStylesheet.patch: why not add a 
$this->hasAccessor('stylesheet', 'getStylesheet');
line to db/models/Accounts.php? It would allow functions-display.php's 
populate_page() to do  "$user->stylesheet" instead of 
"$user->getStylesheet();", which I think is a bit neater. The same change could 
probably be made to getInterface, I think?

This is a wild guess, but I think Display is separate from Account because when 
the base models were generated from the database schema, Account and Display 
were separate tables. There's a lot of stuff that probably could be in the 
Accounts table that isn't, e.g. read-only status, which is stored in the 
two-column "perms" table.

Sorry about the delay in committing your first two patches - I thought I'd done 
it months ago!

Original comment by a...@alexcohn.com on 25 Oct 2012 at 7:30

GoogleCodeExporter commented 9 years ago
Good idea - I prefer the accessor method too. Here's a patch that'll add it.

Original comment by umhecbaa on 26 Oct 2012 at 9:05

Attachments:

GoogleCodeExporter commented 9 years ago
It looks like the timestamp() function is no longer used, and mysql_timestamp() 
is in only one place. This patch eliminates them.

I grep'd around and am pretty sure that they're nowhere else, but please 
double-check.

Original comment by umhecbaa on 26 Oct 2012 at 10:55

Attachments:

GoogleCodeExporter commented 9 years ago
One additional place we get timestamps on the web server: processing [date] and 
[dnew]. I can't think of an obvious way to fix that one, apart from a 'SELECT 
NOW();' query. 

Original comment by a...@alexcohn.com on 27 Oct 2012 at 1:05

GoogleCodeExporter commented 9 years ago
Good point. If web and db servers are kept within a few seconds, it probably 
isn't worth another query - but only because this one is written out to 
plaintext purely for human reference.

Ideally though, you're right, our application should share a clock. I have no 
idea how this type of thing is usually handled in the web dev world - anyone 
know?

Original comment by umhecbaa on 29 Oct 2012 at 4:49

GoogleCodeExporter commented 9 years ago
FWIW we nominally make sure our servers are within a minute or so of each other 
on every deploy and (try) to keep them in sync via NTP.  Nothing we've got is 
so sensitive that it must be closer than that.

If you really want a shared clock for the [date] replacements you may be able 
to work up something with MySQL's concat() and NOW() functions on the update 
statement, but that seems like a lot of work to avoid one (easy) sql statement 
to get a (slightly) more accurate timestamp.

Original comment by mabo...@gmail.com on 29 Oct 2012 at 1:29

GoogleCodeExporter commented 9 years ago
AFAIK this is usually handled in the web dev world by doing all time 
calculations application-side. Not that there's anything wrong with this 
solution for the current app - the [date] stamps in particular do not need to 
be highly accurate.

Original comment by ian.gree...@gmail.com on 29 Oct 2012 at 4:24

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r626.

Commit patch 3_getStylesheet.patch from 
https://code.google.com/p/grinnellplans/issues/detail?id=241#c8

Original comment by a...@alexcohn.com on 14 Jul 2013 at 4:18

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r627.

Commit balance of patches from Issue 241. Also, small patch: use 
Doctrine-loaded is_admin value instead of loading it from the database.

Original comment by a...@alexcohn.com on 14 Jul 2013 at 6:59

GoogleCodeExporter commented 9 years ago
Thanks for the commits back in June. I wrote three more changes that bring us 
closer.

#7: This patch adds a method on Account which counts a user's unread secrets. 
Doctrine was actually quite nice here in the case of someone who has never read 
secrets: we just skip adding the where clause when building the query, rather 
than the previous approach of putting in the year zero using string 
substitution.

To get an instance of $user to work with, I had to add an extra call to 
User::get(), which is temporary. This change is backed out in patch #9.

#8: This adds a many-to-many relationship between the accounts table and the 
avail_links table in the ORM, using opt_links as the join table, and uses it to 
populate plans pages.

The opt_links table was previously mis-described by the existing Doctrine 
class, which specified a nonexistent 'id' column as the primary key. There is 
in fact no primary key on this table, but doctrine needs one, and a dual 
primary key on the two remaining columns would be appropriate. I made this 
change in the schema, and added a database migration that will convert the 
database.

In my testing it seems like the migration is completely unnecessary so long as 
you lie to Doctrine in the schema and say that both columns are primary keys. 
This may not be true for the real database.

I watched the MySQL general query log after applying this patch to confirm that 
there is no change in the total number of queries made when loading a plans 
page. However the query that retrieves the links retrieves one column (the link 
description string) which is unused. I don't think this is a huge deal 
performance wise, but correct me if I am wrong.

#9: Functions private to functions-display.php have previously passed $idcookie 
among themselves, as an integer. Now that the raw SQL queries have been 
replaced by calls to methods on the Accounts model, it makes more sense to pass 
around a $user instance. This enables me to remove a superfluous call to 
User::get() that I added earlier.

Original comment by umhecbaa on 2 Sep 2013 at 12:17

Attachments: