ninya-io / ninya.io

Find StackOverflow users near you by tags and reputation
MIT License
60 stars 13 forks source link

Migrate to postgres #13

Closed cburgdorf closed 10 years ago

cburgdorf commented 10 years ago

Postgres is the new (old!) shiny kid from the block and it combines best of both worlds (NoSQL, relational).

We decided to migrate over.

cburgdorf commented 10 years ago

Our user objects look like this:

{
   "_id": "999820",
   "_rev": "1-c3a12e530e99162328bd0892a2f12db3",
   "user_id": 999820,
   "user_type": "registered",
   "creation_date": 1318878509,
   "display_name": "Marcio Simao",
   "reputation": 1748,
   "email_hash": "1e4804720842cb4cb6181e91d80a5b1c",
   "last_access_date": 1382130557,
   "website_url": "",
   "location": "Brazil",
   "about_me": "<p>Web developer interested in the following technologies: <br />\nPHP, PostgreSQL and JQuery</p>\n",
   "question_count": 29,
   "answer_count": 86,
   "view_count": 98,
   "up_vote_count": 470,
   "down_vote_count": 4,
   "accept_rate": 100,
   "association_id": "637eb981-4f71-4d04-a1aa-66d3e1f1e5ce",
   "user_questions_url": "/users/999820/questions",
   "user_answers_url": "/users/999820/answers",
   "user_favorites_url": "/users/999820/favorites",
   "user_tags_url": "/users/999820/tags",
   "user_badges_url": "/users/999820/badges",
   "user_timeline_url": "/users/999820/timeline",
   "user_mentioned_url": "/users/999820/mentioned",
   "user_comments_url": "/users/999820/comments",
   "user_reputation_url": "/users/999820/reputation",
   "badge_counts": {
       "gold": 0,
       "silver": 8,
       "bronze": 21
   },
   "top_tags": [
       {
           "tag_name": "php",
           "question_score": 14,
           "question_count": 9,
           "answer_score": 54,
           "answer_count": 62
       },
       {
           "tag_name": "javascript",
           "question_score": 5,
           "question_count": 7,
           "answer_score": 28,
           "answer_count": 11
       },
       {
           "tag_name": "css",
           "question_score": 1,
           "question_count": 2,
           "answer_score": 27,
           "answer_count": 17
       },
       {
           "tag_name": "html",
           "question_score": 3,
           "question_count": 2,
           "answer_score": 25,
           "answer_count": 16
       },
       {
           "tag_name": "mysql",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 9,
           "answer_count": 10
       },
       {
           "tag_name": "google",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 9,
           "answer_count": 1
       },
       {
           "tag_name": "positioning",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 9,
           "answer_count": 1
       },
       {
           "tag_name": "jquery",
           "question_score": 16,
           "question_count": 11,
           "answer_score": 8,
           "answer_count": 11
       },
       {
           "tag_name": "string",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 7,
           "answer_count": 5
       },
       {
           "tag_name": "types",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 6,
           "answer_count": 1
       },
       {
           "tag_name": "phpjs",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 5,
           "answer_count": 1
       },
       {
           "tag_name": "session",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 5,
           "answer_count": 4
       },
       {
           "tag_name": "inline",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 5,
           "answer_count": 2
       },
       {
           "tag_name": "compare",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 5,
           "answer_count": 1
       },
       {
           "tag_name": "div",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 4,
           "answer_count": 4
       },
       {
           "tag_name": "html5",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 4,
           "answer_count": 1
       },
       {
           "tag_name": "cookies",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 3,
           "answer_count": 2
       },
       {
           "tag_name": "forms",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 3,
           "answer_count": 3
       },
       {
           "tag_name": "css3",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 3,
           "answer_count": 2
       },
       {
           "tag_name": "multipage",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 3,
           "answer_count": 1
       },
       {
           "tag_name": "http",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 3,
           "answer_count": 1
       },
       {
           "tag_name": "setcookie",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 3,
           "answer_count": 1
       },
       {
           "tag_name": "appendto",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 2,
           "answer_count": 1
       },
       {
           "tag_name": "append",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 2,
           "answer_count": 1
       },
       {
           "tag_name": "postgresql",
           "question_score": 7,
           "question_count": 5,
           "answer_score": 2,
           "answer_count": 2
       },
       {
           "tag_name": "button",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 2,
           "answer_count": 2
       },
       {
           "tag_name": "ajax",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 2,
           "answer_count": 3
       },
       {
           "tag_name": "sql",
           "question_score": 5,
           "question_count": 5,
           "answer_score": 2,
           "answer_count": 3
       },
       {
           "tag_name": "overflow",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 2,
           "answer_count": 1
       },
       {
           "tag_name": "bug-tracking",
           "question_score": 0,
           "question_count": 0,
           "answer_score": 2,
           "answer_count": 1
       }
   ]
}

I now created a users table with only one field (JSON) user. I created a UNIQUE INDEX for the _id property of the user.

Now without introducing any additional fields on the user table we can already run our most complex search like this:

SELECT "user"->>'_id' as _id, "user"->>'location' as location, "user"->>'top_tags' as top_tags from users WHERE 
    (
        'layout' in (Select value->>'tag_name' FROM json_array_elements("user"->'top_tags')) OR
        'android' in (Select value->>'tag_name' FROM json_array_elements("user"->'top_tags'))
    ) AND
    (
        "user"->>'location' LIKE '%Germany%' OR 
        "user"->>'location' LIKE '%WI%' 
    )

image

However, this isn't tuned for performance yet. @basteln3rk already invested a bit deeper into that field and ended up using a dedicated property on the user table for the tags.

cburgdorf commented 10 years ago

I was able to speed up the query by using the technique described here:

http://stackoverflow.com/questions/18404055/index-for-finding-an-element-in-a-json-array

So, the query now looks like this:

image

What's interesting is that I'm still entirely schemaless. All I have is just one single json column :)

cburgdorf commented 10 years ago

It's migrated. It's fast. I'm happy :)