openSNP / snpr

The sources of the openSNP website
http://opensnp.org
MIT License
174 stars 46 forks source link

"Other users" count of SNPs is broken #289

Closed philippbayer closed 8 years ago

philippbayer commented 8 years ago

See for example https://opensnp.org/snps/i6000211#users where it says -10

In the database:

Snp.find_by_name("i6000211").users.length => 616

So it's not broken there.

gedankenstuecke commented 8 years ago

Okay, I spent an hour or so investigating this and now I'm thoroughly confused. As @philippbayer points out:

Snp.find_by_name("i6000211").users.length
=> 616

when ran from the Rails console. While the front-end displays a boring -10. But if you look at the complete object in the console:

Snp.find_by_name("i6000211")
=> #<Snp id: 2515200, name: "i6000211", position: "4277", chromosome: "MT", genotype_frequency: {}, allele_frequency: {"A"=>0, "T"=>0, "G"=>0, "C"=>0}, ranking: 0, number_of_users: 0, mendeley_updated: "2011-08-24 03:44:32", plos_updated: "2016-05-28 10:27:43", snpedia_updated: "2011-08-24 03:44:32", created_at: "2014-09-11 06:21:55", updated_at: "2014-09-11 06:21:55", user_snps_count: -10>

Note that there's an attribute for user_snps_count: -10, and in the view app/views/snps/show.html.erb we're using @snp.user_snps.size. If you run

Snp.find_by_name("i6000211").user_snps.size
=> -10

Which is weird. I couldn't exactly see how the two are connected so far. But I remembered that we introduced this behaviour as a way to cache the numbers to make the access to the pages faster. This was done with db/migrate/20140221060607_add_user_snps_count.rb There we also do the initial calculations. Unfortunately I couldn't find out…

  1. how we increase the numbers in our daily routines,…
  2. and how we decrease them?!

Maybe this initial forensics helps to find the error.

gedankenstuecke commented 8 years ago

Guess the label wasn't as appropriate as I thought 😂

philippbayer commented 8 years ago

Aah this is a Rails size vs. length vs. count thing!

irb(main):001:0> Snp.find_by_name("i6000211").user_snps.size
=> -10
irb(main):002:0> Snp.find_by_name("i6000211").user_snps.length
=> 620
irb(main):003:0> Snp.find_by_name("i6000211").user_snps.count 
=> 620

Now that's a fun problem. count and length recompute the value (count via SQL COUNT, length via checking the length of the Ruby array), size reads from the appropriate column. As a workaround you could replace it with count but it's a tiny bit more computationally intensive.

To fix it properly you should probably add a :counter_cache to have the column automatically updated once a new UserSNP is created: http://guides.rubyonrails.org/association_basics.html#counter-cache

Edit: Wait, the user_snp model already has a counter-cache! So that one isn't properly getting updated, possibly because we're not creating user_snps the way the counter-cache wants them to be added? I.e., similar to this comment

Edit edit: That would also explain why it's -10 - the user_snps deletion is handled the way the counter-cache expects it to be, but the user_snps creation is not since those are bulk-imported by activerecord-import - therefore, on every deletion it subtracts one, but it doesn't add one on creation, so we land at -10 with "rare" SNPs (not rare - but SNPs that already existed from before active_import have a correct-ish count because they were imported differently)

Edit edit edit: Maybe manually running the ActiveRecord callbacks during parsing fixes it: https://github.com/zdennis/activerecord-import/wiki/Callbacks

Edit edit edit edit: Running the callbacks is the reason why importing was slow in the first place, I'd rather have a cron-controlled monthly rake task or a worker who updates the column manually. All it would have to do is to replace the user_snps_count

gedankenstuecke commented 8 years ago

That sounds like a good fix to me. Have you started writing such a rake task, @philippbayer?

philippbayer commented 8 years ago

Nope, just writing papers and fixing stuff 24/7

gedankenstuecke commented 8 years ago

ok, as I'm kind of unsure how to implement the update in the easiest way, i.e. what exactly do I have to execute for each SNP in order to update the counts? 😂

philippbayer commented 8 years ago

I guess something like Snp.find_each do |snp| snp.update_attributes(user_snps_count, snp.user_snps.count) or something

gedankenstuecke commented 8 years ago

Mh, not really it seems :(

irb(main):028:0> @snp.update_attribute(:user_snps_count,@snp.user_snps.count) ActiveRecord::ActiveRecordError: user_snps_count is marked as readonly

tsujigiri commented 8 years ago

reset_counters seems to be the way to do it.

philippbayer commented 8 years ago

Oh man, for every Rails problem there seems to be a magic function somewhere :)

Could we even run this function at the end of Genotype parsing? Edit: naah, that'll take forever just to fix minor inconsistencies

gedankenstuecke commented 8 years ago

Yes, that's what I was looking for. We could just merge it into the update_frequencies rake task I think. this one should be run frequently too (it's not right now…) and we're touching each SNP there already?

tsujigiri commented 8 years ago

Parsing and deleting, I'd say. We can even update only the ones affected.

tsujigiri commented 8 years ago

Or we just update all of them. Just to be sure everything is consistent.

tsujigiri commented 8 years ago

update_frequencies also sounds good.

gedankenstuecke commented 8 years ago

Works, see e.g. rs13340445