larsga / whazzup

Automatically exported from code.google.com/p/whazzup

Speed up RecalculateSubscription #39

Closed by GoogleCodeExporter 8 years ago

GoogleCodeExporter commented 8 years ago
This is one of the biggest obstacles to scaling. We need to profile it to see where the time actually goes. If parsing the text turns out to be the time-waster, we should consider caching the word vector, for example by marshalling the hash into a database field.
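The proposal above can be sketched in Python (assuming that is the project's language). The function names and the idea of a BLOB column are illustrative, not taken from the whazzup code base; "marshalling the hash" is read here as using the standard `marshal` module on a word-frequency dict.

```python
# Hypothetical sketch: build a post's word vector once, marshal it to
# bytes, and store those bytes in a database BLOB field so the text
# never has to be re-parsed.
import marshal
import re

def compute_term_vector(text):
    """Map each lower-cased word to its occurrence count."""
    vector = {}
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vector[word] = vector.get(word, 0) + 1
    return vector

def serialize_vector(vector):
    """Turn the hash into bytes suitable for a BLOB column."""
    return marshal.dumps(vector)

def deserialize_vector(blob):
    """Restore the hash from the stored bytes."""
    return marshal.loads(blob)

vec = compute_term_vector("Scaling scaling is hard")
blob = serialize_vector(vec)
restored = deserialize_vector(blob)
```

The round trip through bytes is what makes the database-field idea work: the expensive parse happens once per post, and every later recalculation only deserializes.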

Original issue reported on code.google.com by lar...@gmail.com on 26 Jul 2011 at 7:34

GoogleCodeExporter commented 8 years ago
Profiling shows that producing the term vector takes up 87% of the time. It 
should be possible to cut that time dramatically by caching the vector.

Original comment by lar...@gmail.com on 31 Jul 2011 at 10:37

GoogleCodeExporter commented 8 years ago
Caching the term vector seems to have cut the time spent by a factor of 5-7. 
Now we just need to make the cache directory configurable, and ensure that the 
cache is emptied as posts are purged. Once that's done we can deploy.
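The two remaining requirements, a configurable cache directory and emptying the cache as posts are purged, could look roughly like this. The class and its methods are hypothetical, not the shipped implementation; it assumes one marshalled file per post, keyed by post id.

```python
# Hypothetical file-based term-vector cache: the directory is passed in
# (configurable), and purge() removes a post's entry so the cache does
# not accumulate vectors for deleted posts.
import marshal
import os

class VectorCache:
    def __init__(self, directory):
        self._dir = directory          # configurable cache directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, post_id):
        return os.path.join(self._dir, "%s.vec" % post_id)

    def get(self, post_id):
        """Return the cached vector, or None on a cache miss."""
        try:
            with open(self._path(post_id), "rb") as f:
                return marshal.load(f)
        except FileNotFoundError:
            return None

    def put(self, post_id, vector):
        with open(self._path(post_id), "wb") as f:
            marshal.dump(vector, f)

    def purge(self, post_id):
        """Drop the entry when the corresponding post is purged."""
        try:
            os.remove(self._path(post_id))
        except FileNotFoundError:
            pass
```

On a miss the caller parses the text and calls `put`; post deletion calls `purge`, which keeps the cache directory in step with the post store.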

Original comment by lar...@gmail.com on 31 Jul 2011 at 10:47

GoogleCodeExporter commented 8 years ago
Now done.

Original comment by lar...@gmail.com on 31 Jul 2011 at 1:39