Hi Arik,
You raise some interesting points here. The memcache implementation was,
admittedly, rather rushed.
Most users, I believe, are hitting memcache. I believe this for several reasons:
1. Most users who run the extension have actually sent at least one link, so memcache generally has something to store.
2. According to the documentation (http://code.google.com/appengine/docs/python/memcache/overview.html#How_Cached_Data_Expires), memcache won't evict data until it HAS to, for memory reasons.
Given what's being stored and how many copies of it, memcache should only
rarely be evicting things, and only the least recently used ones, at that.
3. CPU for the /get URL (which is what serves links to the extension) is
currently averaging 46 milliseconds per response. In comparison, /add (which is
how links are added) is averaging 162 milliseconds.
That said, I'm interested in the gains you're expecting. You argue for a key-based query, which is indeed cheaper, though I'm curious how you're assuming I'd get the key. Are you saying I should use user IDs as keynames? That's possible, and holds merit, and key-based queries are certainly cheaper in terms of CPU. The problem is that by adding that second model (Authors), I'm tacking on another datastore operation to the /add request, which will increase the time that request takes; and that extra write can't be memcached, either. Furthermore, I'd need to convert the links currently in the database to the new model dynamically, to make sure nobody lost their links, and that would take some additional CPU for the transitional period. While there are definite gains here, I think they're small enough that a rewrite does not make sense at this point, especially with the rewrite for the Channel API already well under way (which already implements a lot of this, including the Author model :)).
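For concreteness, here's a minimal sketch of the two access patterns being compared; the Link and Author models and their properties are hypothetical stand-ins, not the project's actual schema:

from google.appengine.ext import db

class Link(db.Model):
    # Hypothetical stand-in for a pushed link.
    user_id = db.StringProperty()
    url = db.StringProperty()

class Author(db.Model):
    # Hypothetical second model, keyed by the Google user ID.
    pending_urls = db.StringListProperty()

def get_links_by_filter(user_id):
    # The current approach as described: a property-filter query,
    # which runs an index scan on every /get that misses memcache.
    return Link.all().filter('user_id =', user_id).fetch(100)

def get_links_by_key(user_id):
    # The key-based alternative: a direct fetch by key name,
    # cheaper in CPU than running a query.
    author = Author.get_by_key_name(user_id)
    return author.pending_urls if author else []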
Your line of questioning has, however, surfaced another issue: how memcache is populated. Currently, memcache is only populated when a user adds a link. But if someone just leaves the extension running for days at a time (I know I've done this) and hasn't sent a link in a while, they could be running these costly datastore queries every fifteen seconds for days. What I *should* be doing is checking memcache first, falling back to the datastore, and then populating memcache no matter what; that way, the datastore can be skipped as much as possible. This is something that can be accomplished in a few lines of code, so it is probably worth the time it will take to implement.
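A sketch of that read-through pattern, assuming a hypothetical per-user cache key and a fetch_links_from_datastore() helper (neither name is from the real code):

from google.appengine.api import memcache

CACHE_SECONDS = 15 * 60  # hypothetical expiry, purely for illustration

def get_links(user_id):
    # fetch_links_from_datastore() is a placeholder for the real query.
    cache_key = 'links:%s' % user_id
    links = memcache.get(cache_key)
    if links is None:
        # Cache miss: fall back to the datastore...
        links = fetch_links_from_datastore(user_id)
        # ...and repopulate memcache either way, so the next poll
        # (every fifteen seconds) can skip the datastore entirely.
        memcache.set(cache_key, links, time=CACHE_SECONDS)
    return links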
I'd love to hear your ideas on this, and would really enjoy some feedback from
you on the nascent server being built that incorporates the Channel API-- not a
whole lot of optimisation work has been done on it yet, and it sounds like you
have some good ideas in that area.
Original comment by foran.pa...@gmail.com on 7 Sep 2010 at 6:09
OK, this is a bit out of order, but I will try to address all of your questions/comments:
1. Your assumption that everyone sends a link right after installing makes sense, but you need to take into account that many of the new users didn't manage to set up authentication correctly on the Android device, which results in many users who have never sent a link. All of them still have the extension, they probably didn't bother to remove it, and it keeps pinging your server.
Anyway, I think it is easy to verify. For starters, you can check whether the /get URL's CPU usage is high. If it is, it is probably querying the datastore. If not, ignore what I said and concentrate on the new version :) (are you profiling with AppStats? http://code.google.com/appengine/docs/python/tools/appstats.html)
And you're correct that a quick fix for this would be to update memcache after you do the query. Also, you can populate the memcache key when the user registers for the first time (see the rough sketch after this list). Both of these will reduce the number of queries without implementing a new model. While I don't think the migration would be that complicated, it is certainly better to implement the minimal fix needed and concentrate on the new version.
2. Yes, I think that using the user ID as a keyname is the right way to go. As for your concern about complicating the write process and making it more CPU intensive: you shouldn't worry about this, because you have far fewer writes than reads, so you are better off optimizing reads rather than writes.
3. Another note about memcache: remember that it might be purged even when there is enough memory (for maintenance reasons), so take into account what happens when memcache is empty. This is another reason to optimize for key-based queries.
4. I will be happy to give feedback on the new server code -- is it in the
repository already?
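To make points 1 and 2 concrete, here's a rough sketch of what a registration path could do, warming the cache and creating the keyed entity in one go; the model name, key layout, and cache key are hypothetical, not the project's actual code:

from google.appengine.api import memcache
from google.appengine.ext import db

class Author(db.Model):
    # Hypothetical model whose key name is the Google user ID.
    pending_urls = db.StringListProperty()

def register_user(user_id):
    # Point 2: key the entity by user ID so later reads can use
    # Author.get_by_key_name() instead of a filter query.
    author = Author.get_or_insert(user_id)
    # Point 1: warm the per-user memcache entry at registration time,
    # so the extension's first poll never needs to query the datastore.
    memcache.set('links:%s' % user_id, [], time=15 * 60)
    return author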
Original comment by arik...@gmail.com on 7 Sep 2010 at 6:56
1. I am not using AppStats yet (there's a minimal sketch of enabling it after this list); I wonder what its performance implications are... I'm simply profiling based on the CPU usage data App Engine provides for me.
2. I think you're right. And you're also right about there being more reads than writes (right now...). I'm just wary of shifting the bottleneck onto an area that's harder to optimise.
3. My treatment of memcache is that it's a fast database I can't always rely on to have the information. :) The app is definitely improved by speeding up every portion of it, because any given memcache lookup may or may not succeed. That's why I try to be careful and make sure I'm falling back to the datastore whenever memcache fails.
4. I'd love any insight you may have. If you'd like to hit me up on IM (my
GTalk username is my Google Code username followed by @gmail.com) I'll talk you
through the new structure of the server, the vision it was built with, and
point you towards the repository with the proof-of-concept. :)
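Regarding point 1 above, enabling AppStats on the Python runtime is just a small appengine_config.py hook, following the pattern described in the AppStats documentation linked earlier; a minimal sketch, assuming the webapp framework:

# appengine_config.py -- wrap the WSGI application with the AppStats recorder.
def webapp_add_wsgi_middleware(app):
    from google.appengine.ext.appstats import recording
    return recording.appstats_wsgi_middleware(app)

The recorded RPC timings then show where the time is actually going; the AppStats web UI also needs its handler mapped, per the same docs.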
Original comment by foran.pa...@gmail.com on 7 Sep 2010 at 7:19
Original comment by foran.pa...@gmail.com on 28 Dec 2010 at 7:27
Original issue reported on code.google.com by arik...@gmail.com on 7 Sep 2010 at 8:14