Closed philippbayer closed 8 years ago
Okay, I spent an hour or so investigating this and now I'm thoroughly confused. As @philippbayer points out:
Snp.find_by_name("i6000211").users.length
=> 616
when ran from the Rails console. While the front-end displays a boring -10
. But if you look at the complete object in the console:
Snp.find_by_name("i6000211")
=> #<Snp id: 2515200, name: "i6000211", position: "4277", chromosome: "MT", genotype_frequency: {}, allele_frequency: {"A"=>0, "T"=>0, "G"=>0, "C"=>0}, ranking: 0, number_of_users: 0, mendeley_updated: "2011-08-24 03:44:32", plos_updated: "2016-05-28 10:27:43", snpedia_updated: "2011-08-24 03:44:32", created_at: "2014-09-11 06:21:55", updated_at: "2014-09-11 06:21:55", user_snps_count: -10>
Note that there's an attribute for user_snps_count: -10
, and in the view app/views/snps/show.html.erb
we're using @snp.user_snps.size
. If you run
Snp.find_by_name("i6000211").user_snps.size
=> -10
Which is weird. I couldn't exactly see how the two are connected so far. But I remembered that we introduced this behaviour as a way to cache the numbers to make the access to the pages faster. This was done with db/migrate/20140221060607_add_user_snps_count.rb
There we also do the initial calculations. Unfortunately I couldn't find out…
Maybe this initial forensics helps to find the error.
Guess the label wasn't as appropriate as I thought 😂
Aah this is a Rails size vs. length vs. count thing!
irb(main):001:0> Snp.find_by_name("i6000211").user_snps.size
=> -10
irb(main):002:0> Snp.find_by_name("i6000211").user_snps.length
=> 620
irb(main):003:0> Snp.find_by_name("i6000211").user_snps.count
=> 620
Now that's a fun problem. count
and length
recompute the value (count
via SQL COUNT, length
via checking the length of the Ruby array), size
reads from the appropriate column. As a workaround you could replace it with count
but it's a tiny bit more computationally intensive.
To fix it properly you should probably add a :counter_cache
to have the column automatically updated once a new UserSNP is created: http://guides.rubyonrails.org/association_basics.html#counter-cache
Edit: Wait, the user_snp model already has a counter-cache! So that one isn't properly getting updated, possibly because we're not creating user_snps the way the counter-cache wants them to be added? I.e., similar to this comment
Edit edit: That would also explain why it's -10 - the user_snps deletion is handled the way the counter-cache expects it to be, but the user_snps creation is not since those are bulk-imported by activerecord-import - therefore, on every deletion it subtracts one, but it doesn't add one on creation, so we land at -10 with "rare" SNPs (not rare - but SNPs that already existed from before active_import have a correct-ish count because they were imported differently)
Edit edit edit: Maybe manually running the ActiveRecord callbacks during parsing fixes it: https://github.com/zdennis/activerecord-import/wiki/Callbacks
Edit edit edit edit: Running the callbacks is the reason why importing was slow in the first place, I'd rather have a cron-controlled monthly rake task or a worker who updates the column manually. All it would have to do is to replace the user_snps_count
That sounds like a good fix to me. Have you started writing such a rake task, @philippbayer?
Nope, just writing papers and fixing stuff 24/7
ok, as I'm kind of unsure how to implement the update in the easiest way, i.e. what exactly do I have to execute for each SNP in order to update the counts? 😂
I guess something like Snp.find_each do |snp| snp.update_attributes(user_snps_count, snp.user_snps.count)
or something
Mh, not really it seems :(
irb(main):028:0> @snp.update_attribute(:user_snps_count,@snp.user_snps.count) ActiveRecord::ActiveRecordError: user_snps_count is marked as readonly
reset_counters
seems to be the way to do it.
Oh man, for every Rails problem there seems to be a magic function somewhere :)
Could we even run this function at the end of Genotype parsing? Edit: naah, that'll take forever just to fix minor inconsistencies
Yes, that's what I was looking for. We could just merge it into the update_frequencies rake task I think. this one should be run frequently too (it's not right now…) and we're touching each SNP there already?
Parsing and deleting, I'd say. We can even update only the ones affected.
Or we just update all of them. Just to be sure everything is consistent.
update_frequencies
also sounds good.
Works, see e.g. rs13340445
See for example https://opensnp.org/snps/i6000211#users where it says -10
In the database:
So it's not broken there.