

23andMe data import is still too large for DB #174

Closed syntheticzero closed 10 years ago

syntheticzero commented 10 years ago

The chunking code for 23andMe data is still using chunks that are too large for the default MySQL settings.

I'll go in and change the chunk size myself.
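As a rough sketch of the kind of fix being described (the class and method names here are hypothetical, and the 1 MB figure assumes the old MySQL default for max_allowed_packet), the chunking could look like:

```groovy
// Hypothetical sketch, not the actual project code: split a raw 23andMe
// response into chunks that stay safely under MySQL's max_allowed_packet
// (historically 1 MB by default on older MySQL versions).
class ChunkedDataService {
    // Keep each chunk well below the assumed 1 MB packet limit.
    static final int CHUNK_SIZE = 512 * 1024

    static List<String> chunk(String raw) {
        def chunks = []
        for (int offset = 0; offset < raw.length(); offset += CHUNK_SIZE) {
            chunks << raw.substring(offset,
                    Math.min(offset + CHUNK_SIZE, raw.length()))
        }
        return chunks
    }
}
```

Each chunk is then stored as its own row, so no single INSERT exceeds the packet limit.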

@SAgrawal14 @visheshd

syntheticzero commented 10 years ago

I fixed the chunk size, but now there's a unique constraint violation. Again: come on, guys, we need to always test code.

visheshd commented 10 years ago

It may not seem like it, but we do test things. I agree completely that the process could be improved, but to be fair, some of these issues are impossible to detect in a development environment because it is not the same as prod.

I would suggest we set up a cloud server for prod and clone it to set up a staging server for final testing.


Best, Vishesh


syntheticzero commented 10 years ago

I agree, but the unique constraint violation would have happened no matter what the MySQL config. This means the code never worked even once. I understand the temptation to push code when it seems like it "must" be right, but we have to stop doing this. Any code that hasn't run at least once should not be submitted for review.

However, you are right that we need a QA server; I'll work on that.

syntheticzero commented 10 years ago

In this case, for instance, even though exercising a real 23andMe interaction in a dev environment is difficult, we should at least have tested the service method by simulating some data larger than the chunk size and running it through an integration test. We have to start doing this: going forward, EVERY piece of code needs to run at least once before we commit.
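A sketch of the kind of integration test meant here, using Spock (the usual Grails test framework); the service name, method name, and profile id are all hypothetical, not this repo's actual API:

```groovy
import spock.lang.Specification

// Hypothetical integration test: feed the import service a payload larger
// than one chunk so the chunking code path actually executes.
class Twenty3AndMeDataServiceIntegrationSpec extends Specification {

    def twenty3AndMeDataService  // injected by Grails in an integration test

    void "import handles data larger than the chunk size"() {
        given: "a simulated payload of ~1.2 MB, like the real API returns"
        String payload = 'A' * (1200 * 1024)

        when: "the service imports it for some profile"
        twenty3AndMeDataService.importData("test-profile-id", payload)

        then: "no constraint violation or other exception is thrown"
        notThrown(Exception)
    }
}
```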

sagrawal31 commented 10 years ago

There was a unique constraint on profileId in the Twenty3AndMeData domain, which was removed during the chunking feature implementation. Kindly have a look here.

For testing in my local environment, I removed the unique constraint from my local database manually and missed writing a migration for it later. I tested chunking with 1 MB and 0.5 MB chunks, since the API returns about 1.2 MB of data.

That's why this happened.

syntheticzero commented 10 years ago

Okay. The rule we can adopt from this is: never manually remove constraints from the DB. Always use a migration, so that you can be certain the migration actually gets written.
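For reference, dropping the constraint through the Grails database-migration plugin (a Liquibase-based Groovy DSL) would look roughly like this; the table and constraint names below are guesses for illustration, not the real schema names:

```groovy
// Hypothetical changelog entry: drop the unique constraint via a
// migration instead of editing the database by hand.
databaseChangeLog = {
    changeSet(author: "sagrawal31", id: "drop-unique-profile-id") {
        dropUniqueConstraint(tableName: "twenty3_and_me_data",
                constraintName: "unique_profile_id")
    }
}
```

That way every environment (dev, staging, prod) picks up the same schema change automatically.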

sagrawal31 commented 10 years ago

Okay, I'll definitely follow this.

syntheticzero commented 10 years ago

Cool. Another thing I noticed: we should be adding indexes to tables for the columns we use in lookups. For instance, I want an index on profile_id to speed up lookups of data.

In general, when creating a new domain class, you should add an index on all fields used for searches/lookups.

The convention I'm using is:

static mapping = {
    profileId column:'profile_id', index:'profile_id_index'
}

i.e. name the index '..._index'
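For a domain class with more than one searchable field, the same convention would look like this (the class and field names are illustrative, assuming the convention above):

```groovy
// Hypothetical domain class following the '<column>_index' convention:
// every field used in searches/lookups gets a named index.
class Twenty3AndMeData {
    String profileId
    Date dateCreated

    static mapping = {
        profileId column: 'profile_id', index: 'profile_id_index'
        dateCreated column: 'date_created', index: 'date_created_index'
    }
}
```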

Thanks.

syntheticzero commented 10 years ago

Deployed, tested, works.