jbholden / cdcpool_google


Need to modify database to use a strong consistency model #23

Closed jbholden closed 9 years ago

jbholden commented 10 years ago

We currently make the database use a strong consistency model by passing the following command line option to Google App Engine.

--datastore_consistency_policy consistent

However, I don't believe this will work on the actual production website. The production website uses a "High Replication" model. I believe this means that it is optimized for reading data and for not losing data, at the cost of recently written data not always being available right away.

The code needs to be changed to support strong consistency (write followed by read gets back data written) as described in the following article.

https://developers.google.com/appengine/docs/python/datastore/structuring_for_strong_consistency

I made an attempt to implement this in the branch brent_develop, but when I tried to load the database it got hung in some sort of lock scenario.

A couple of possible fixes I have thought of that might work:

blreams commented 10 years ago

Question: Does the change to strong consistency policy only break when you are doing the old style database loads? IOW, if I use the new load database http, I won't get the hangs that we've seen with consistent policy?

jbholden commented 10 years ago

The current code should behave in the same way as before. You need to specify the --datastore_consistency_policy consistent argument. This is true for both the new way and old way of loading the database. You should not encounter any hangs with this code.

The problem is when the site is deployed to cdcpooltest.appspot.com. The production website does not support the argument given above, and therefore the code must be changed to work on the deployed site.

In summary, you can continue in the same way when running the local development server, but you will run into issues once the code is deployed to the site.

I created a test branch to try and see if I could fix the strong consistency issue for the main site, but ran into a hang while trying to load the database. This branch has not been merged with the main branch, therefore you should not see a hang.

jbholden commented 9 years ago

I believe that I have a fix for this and will document what was done here. The basic idea for how to implement strong consistency is to provide a parent when creating a datastore entity and to include this parent in database queries (using ANCESTOR).

Attempt 1: Single Root (did not work)

My first attempt was to create a single root and have all of the other entities use it as their parent. This turned out to perform very badly. As the number of entities in the database grew, accessing the data became dramatically slower.

The main culprit here was the number of Picks. For 2013 there were 550 Picks per week, for a total of about 7150 picks per year. If 2012 had about 7150 picks too, then with a single root, there would be 14300 picks under this root.

Attempt 2: Use multiple roots (appears to work)

In this attempt, I used multiple roots, which prevents the number of entities under any one root from getting too large, so the performance issues were not seen.

Using this method, there would be 550 picks under a 2013 week 1 root, 550 picks under a 2013 week 2 root, 550 picks under a 2013 week 3 root, etc. This reduces the number of picks under a root to 550, instead of the 14300 described in attempt 1.
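To make the numbers above concrete, here is a small pure-Python sketch (not App Engine code) that counts how many picks land under each root for the two attempts. The root naming scheme, the 13-week season (inferred from 7150 / 550), and the assumption that 2012 matches 2013 are illustrative only, not taken from the committed code.

```python
from collections import Counter

PICKS_PER_WEEK = 550   # figure quoted above for 2013
WEEKS_PER_YEAR = 13    # assumption: 7150 / 550 = 13 weeks
YEARS = (2012, 2013)   # assumption: 2012 has the same number of picks


def single_root_name(year, week):
    """Attempt 1: every pick shares one root, whatever its year and week."""
    return "root"


def weekly_root_name(year, week):
    """Attempt 2: one root per (year, week). The naming scheme is hypothetical."""
    return "picks_%d_week_%d" % (year, week)


def group_sizes(root_name_for):
    """Count how many picks end up under each root for a given scheme."""
    sizes = Counter()
    for year in YEARS:
        for week in range(1, WEEKS_PER_YEAR + 1):
            sizes[root_name_for(year, week)] += PICKS_PER_WEEK
    return sizes


single = group_sizes(single_root_name)
weekly = group_sizes(weekly_root_name)

print(max(single.values()))               # 14300
print(len(weekly), max(weekly.values()))  # 26 550
```

With per-week roots, an ancestor query only has to look at the 550 picks of one week, and adding another season adds more roots rather than making any existing root larger.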

What you need to know

The following method should be used going forward

Examples

from google.appengine.ext import db
from models.root import *

# create a player entity, notice the parent parameter
player = Player(name=name, years=years, parent=root_players())
player.put()

# perform a query, notice the ANCESTOR parameter
players_query = db.GqlQuery('select * from Player where ANCESTOR is :ancestor', ancestor=root_players())

# create a key from the ID, notice the parent parameter
player_key = db.Key.from_path('Player', player_id, parent=root_players())

Commit

The code was committed to GitHub with hash c35501f5176600c4e11c9606f2dda61170767235

jbholden commented 9 years ago

One other note: you can now remove this command line option: --datastore_consistency_policy consistent