Closed philippbayer closed 10 years ago
Most importantly: user_snps. Just for the record. :)
Added in 8499c6a912 for user_snps - now all SNPs have an extra user_snps_count column, which is updated whenever a user-snp is added or removed. Note that to calculate the total number of user_snps we currently still have to iterate over all SNPs.
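In Rails this is typically declared with `counter_cache: true` on the `belongs_to` side (the exact code in 8499c6a912 may differ). A pure-Ruby sketch of the bookkeeping it performs:

```ruby
# Sketch (not the actual openSNP code) of what Rails' counter_cache does.
# In the app this would be roughly:
#   class UserSnp < ActiveRecord::Base
#     belongs_to :snp, counter_cache: true  # maintains snps.user_snps_count
#   end
# Simulated here without a database:
Snp = Struct.new(:name, :user_snps_count)

snp = Snp.new("rs123", 0)
snp.user_snps_count += 1  # a UserSnp is created  -> counter incremented
snp.user_snps_count += 1
snp.user_snps_count -= 1  # a UserSnp is destroyed -> counter decremented

puts snp.user_snps_count
```

The point is that reading the count never needs a `COUNT(*)` over user_snps anymore.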
Not sure how to do the same for genotypings. counter_cache is only really useful when you have a one-to-many relationship where the "many" are truly many; since the relationship between users and genotypings is mostly one-to-one, using counter_cache there makes little sense.
I haven't run the migration on the server yet, even on my development machine with one full genotyping it took about 10 minutes to finish the migration :P
Edit: Just realized that on creation of SNP from the parsing job, the user_snps_counter is not incremented. Possibly due to the UserSnp.import trick? Fixed in 8f20d91293
We could first load a hash of { snp_name => count }, which would require only one query (even if a long one), and then update the counter caches manually. Maybe we should stop Sidekiq for the duration, so the counts don't change while we're counting and we don't overload the DB.
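A rough sketch of that backfill idea, simulated in plain Ruby (hypothetical; with ActiveRecord the hash would come from something like `UserSnp.group(:snp_name).count`):

```ruby
# Hypothetical backfill sketch: one grouped query yields per-SNP counts,
# and the counter caches are then written from the resulting hash.
# Simulated here with an in-memory array instead of the user_snps table:
user_snps = [
  { snp_name: "rs123" },
  { snp_name: "rs123" },
  { snp_name: "rs456" },
]

counts = Hash.new(0)
user_snps.each { |row| counts[row[:snp_name]] += 1 }

# counts now maps snp_name => count; each SNP's user_snps_count
# would then be set from this hash in a single pass over the SNPs.
puts counts.inspect
```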
For the genotypings we would either have to implement something ourselves or find something that already does it, I guess.
explain select snp_name, count(*) from user_snps group by snp_name;
Tells me that it would be an index-only scan, meaning it might not be that slow after all.
I'm also not sure whether we really should use Snp.find_each
here, or at least increase the batch size. With the default batch size of 1000 this would be 2000 queries.
On the other hand we save two million records individually... So, who cares about 2000 more or less. :D
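For scale, the number of queries find_each issues is just the row count divided by the batch size (`batch_size:` is a real find_each option; the ~2,000,000 rows below are taken from the numbers above):

```ruby
# Back-of-the-envelope: find_each issues one SELECT per batch, so the
# query count is ceil(rows / batch_size). Assuming ~2,000,000 SNP rows:
total_rows = 2_000_000
queries = ->(batch_size) { (total_rows.to_f / batch_size).ceil }

puts queries.call(1_000)   # default batch size
puts queries.call(10_000)  # e.g. Snp.find_each(batch_size: 10_000)
```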
Can't currently run the migrations: openSNP is growing by roughly 3 genotypings per day, and Sidekiq just decided to die (no workers, had to restart). I don't think we can wait for an empty queue; the parse queue currently has 200 entries and I don't expect that to slow down.
Apparently, adding a column without a default value doesn't take very long on Postgres, as it doesn't have to rewrite the whole table. So if we remove the :default => 0
here, we should be fine. 0 is a wrong value anyway.
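The adjusted migration would then look roughly like this (a sketch, not the actual committed migration; table and column names are assumed from the discussion, and it needs Rails to run):

```ruby
# Sketch: no :default, so Postgres only updates the catalog instead of
# rewriting the whole snps table. The actual migration may differ.
class AddUserSnpsCountToSnps < ActiveRecord::Migration
  def change
    # note: deliberately no :default => 0 here, to keep ALTER TABLE cheap;
    # the real counts get backfilled separately
    add_column :snps, :user_snps_count, :integer
  end
end
```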
Done in 254f52515d82529f0a7940b7e0fc0c64eb406f44
All that's missing is the deployment
Related to #102
We should use counter_caches where we count things to improve speed, for example on the genotypes-index