Use Postgres instead of JSON

mplewis commented 10 years ago

Hey @tylrtrmbl! This is halfway there.

Things we need to do:

Move redirects to Postgres
Update the redirects controller to grab data from Postgres
Move server connection stuff to config.py
???
Profit!

mplewis commented 10 years ago

Psst. This closes #143.

taylortrimble commented 10 years ago

Big Big Question: now that we are moving to a DB, are we still going to cache on our side? I think if we're the only client of the DB then it's okay, but we would need to be upfront about THIS WEBSITE IS HOSTED FROM ONE APP ONLY OKAY??.

I bet we'd see performance boost (aka stay good), but we should be sure it doesn't make consistency ugly.

mplewis commented 10 years ago

I think caching is good. How do you feel about a cron job to recache every 60 seconds? Or 300, etc.

taylortrimble commented 10 years ago

Like a timer inside the app? Personally, I'd rather hit it every time or guarantee in-memory and in-db consistency

mplewis commented 10 years ago

How about: Since we're using in-app editors for DB data now, just hit an internal recache link whenever data is updated?

taylortrimble commented 10 years ago

Bingo

taylortrimble commented 10 years ago

I honestly still haven't been through this yet. Update to the control panel, etc, coming soon?

taylortrimble commented 10 years ago

Wait, this isn't the control panel PR is it? Just the rest of the data and config stuff (which might need to become config.json, I think we mentioned?)

Cool. I'll maybe test soon if I need break from Fido.

mplewis commented 10 years ago

Correct on all counts!

taylortrimble commented 10 years ago

This code and the dump I have are out of sync to a point that's annoying to fix. Can I get a recent dump?

mplewis commented 10 years ago

Added some commits. Please review and :sushi: or don't.

Tested and works on my machine.

Remind me of the current state of config.py please?

mplewis commented 10 years ago

Current (as of this second) Postgres dump is available at http://z.kesdev.com/code/1C1547083147

taylortrimble commented 10 years ago

I like everything but the stuttery ids! Not a fan of leader_id in the leader table, can just be id.

Do you also have the blank database generator, your plain text hand written db setup SQL, pushed somewhere?

On Mon, Oct 28, 2013 at 10:56 AM, mplewis notifications@github.com wrote:

Current (as of this second) Postgres dump is available at http://z.kesdev.com/code/1C1547083147

Reply to this email directly or view it on GitHub: https://github.com/teslaworksumn/teslaworks.net/pull/150#issuecomment-27224461

mplewis commented 10 years ago

Here's the current create SQL script: http://z.kesdev.com/code/2X070u2E381J I don't have this pushed somewhere yet. Suggestions as to where I should push this?

Re: stutter IDs: This is actually a good thing for semantics IMO. Check out twnet-create.sql above.

CREATE TABLE project_needs (
    project_need_id serial,
    project_id integer
        NOT NULL
        REFERENCES projects
        ON DELETE CASCADE,
    need_text text,
    display_order integer,
    PRIMARY KEY (project_need_id)
);

See the REFERENCES projects line? Since the project_needs column name project_id matches the projects column name project_id, we can shorten the command from the following:

        REFERENCES projects (project_id)

(or, if we were using just id)

        REFERENCES projects (id)

to:

        REFERENCES projects

I also like it because there's only one project_id in the database, not a couple project_ids and some ids that go with them. So the name is DB consistent.

Further reading: [http://www.postgresql.org/docs/8.2/static/ddl-constraints.html#DDL-CONSTRAINTS-FK](Postgres docs)

taylortrimble commented 10 years ago

Here's the current create SQL script: http://z.kesdev.com/code/2X070u2E381J I don't have this pushed somewhere yet. Suggestions as to where I should push this?

Let's make it part of the repo!

Re: stutter IDs: This is actually a good thing for semantics IMO.

Yeah, I'm on board with this now too.

I'll more thoroughly review now!

mplewis commented 10 years ago

Let's make it part of the repo!

Done.

taylortrimble commented 10 years ago

I'm getting missing images for Ultralight, ODI. I'm guessing this is because we haven't merged in master and you got the data from the deployed site, right?

taylortrimble commented 10 years ago

We've still got some JSON lurking around: can we move the redirects to the db too?

taylortrimble commented 10 years ago

A few ideas I want to know if you're open to:

Object Key Name <==> Table Column Name

PROJECTS_KEY_ORDER = ['id', 'name', 'slug', 'description', 'photo_url', 'past_project']
GET_PROJECTS_QUERY = 'SELECT project_id, name, slug, description, photo_url, past_project FROM projects ORDER BY display_order;'

Becomes

PROJECTS_KEY_ORDER = ['project_id', 'name', 'slug', 'description', 'photo_url', 'past_project']
GET_PROJECTS_QUERY = 'SELECT %s FROM projects ORDER BY display_order;' % ', '.join(PROJECTS_KEY_ORDER)

I think it makes things a little DRYer, easy to add columns.

Generalizing object mapping from SQL rows

This happens twice:

for attr_num in xrange(len(project_raw)):
                    key_name = PROJECTS_KEY_ORDER[attr_num]
                    project_data[key_name] = project_raw[attr_num]

And this is a little hard to understand at first (but cool):

needs = [text_holder[0] for text_holder in cur.fetchall()]

We could have something like object = make_object(obj_raw, key_order) instead that generalizes to all those cases.

Same as above, and works for collections

object = make_object_collection(objs_raw, key_order)

We could actually do this for projects, leaders, needs, AND photos. Do it for projects first, then add leaders and needs and photos to it after it's been make_object_collectioned first.

projects_data = make_object_collection(projects, key_order)
for project in projects_data:
    id = project['project_id']
    leaders_raw = # get the leaders by project id
    needs_raw = # get the leaders by project id
    photos_raw = # get the leaders by project id
    project['leaders'] = make_object_collection(photos_raw, PHOTOS_KEY_ORDER)
    project['needs'] = make_object_collection(photos_raw, PHOTOS_KEY_ORDER)
    project['photos'] = make_object_collection(photos_raw, PHOTOS_KEY_ORDER)

taylortrimble commented 10 years ago

Data storage

Past projects will need a URL to redirect to upon completion (usually a blog post). We could perhaps get rid of past_project then and test for the existence of the redirect URL.
Can we enforce that the display_order columns have unique values for each containing object for, say, project_photo relationships? I.e. Robomural doesn't have [1, 1, 3]?
create-tables.sql is my new favorite file in this repo. Pretty.

Database connection

Let's open the connection to the database and keep it open forever. Opening connections like that each time ain't good for performance.

Writing

I don't see writing to the database in this guy.

A Bug and a Can of Worms (that look like snakes)

:bug:: If we don't have any past projects, we'll hit the db every time. The check if not self.past_projects will fail and we'll reload.

:snake:: What if we didn't do caching? We are making the assumption we are the only client on this, which is true, but it's kinda bad architecture. We're basically using the db as a backup of RAM.

It also solves our bug, and with an open connection I don't think the performance would be too bad! We could A/B of course.

mplewis commented 10 years ago

Object Key Name <==> Table Column Name Generalizing object mapping from SQL rows Generalizing collection mapping from SQL rows

:+1: These are all great ideas. I'm loving DRY. What's the best way to document esoteric utility/generalizing functions like this?

:bug: No Past Projects

Handled in commit 4258475.

:snake: Caching

Check out this SO discussion.

Normally I'm in favor of caching but I don't think we'll actually run into enough speed issues to have to cache. If we cache wrong, we end up hurting our performance instead. How do you feel about putting off caching until we scale?

(Alternately, you could write a caching engine. I don't mind either way ;)

DB Writing

There isn't any yet :(

Is that still in the scope of this PR? I'm moving existing functionality to Postgres.

Redirect Slugs

I think this problem is complex enough to warrant its own PR. I'll iMessage before iCode.

mplewis commented 10 years ago

Can we enforce that the display_order columns have unique values for each containing object for, say, project_photo relationships?

I don't think it matters all that much. ORDER BY is fine with duplicates. If we write our editor right, we should never run into a situation in which we submit [1, 1, 3], etc.

I want to avoid constraints whenever not absolutely necessary. I don't like it as a dev when software tells me I can't do something because the last developer didn't think it made sense.

mplewis commented 10 years ago

Re: opening and closing a connection: We have to make the object properly destroyable.

http://stackoverflow.com/questions/865115/how-do-i-correctly-clean-up-a-python-object

I'm going to start using this pattern.

mplewis commented 10 years ago

Hey, I think making GET_PROJECTS_QUERY = 'SELECT %s FROM projects ORDER BY display_order;' may open us to injection. Can you think of a vector by which this might occur?

I'm not saying it's possible, I'm just saying using string interpolation and concatenation inside query strings is a Bad Idea.

mplewis commented 10 years ago

You know, despite those suggestions making the code DRYer, I think they complicate the code. Here's why:

We have a couple of uses of these cases. 2 of each:

attr array --> attr dict
single attr container array --> single attr array

In addition, the attr array --> attr dict is different in both cases:

for project_raw in projects: sorts projects into a dict by slug keys
for leader_raw in leaders_raw: puts leader dicts into an array, ordered by display_order

So, it's hard to DRY the attr array --> attr dict function because they're really two functions.

In addition, the single attr container array --> single attr array is only two lines, and there are only two of them. And my personal guideline for DRY is:

3 or more = DRY it
2 or less = maybe

Wahtchu think?

Ima keep plugging at this. I think there are breakthroughs to be made.

mplewis commented 10 years ago

On XOR slug constraints: http://stackoverflow.com/questions/960556/sql-how-to-enforce-that-only-a-single-column-is-set-in-a-group-of-columns

taylortrimble commented 10 years ago

Normally I'm in favor of caching but I don't think we'll actually run into enough speed issues to have to cache.

I agree, right now I think we are caching though and I want to nuke it! We "load". I think we should query the each time! Otherwise, we're unintentionally cacheing. Whoops.

taylortrimble commented 10 years ago

Is that still in the scope of this PR?

Nope, nvm

mplewis commented 10 years ago

I'm down for killing caching. I like querying each time.

taylortrimble commented 10 years ago

Okay, #161 takes care of caching

taylortrimble commented 10 years ago

Our green merge button is back! Also, you have to pull. Thanks.

taylortrimble commented 10 years ago

After #158 and #163 I consider this ready to go. Good, good work! Thanks for getting it started. :beers:

mplewis commented 10 years ago

here goes nothing. :key:

teslaworksumn / teslaworks.net