internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Cache get_avatar_url on /followers page to avoid fetching every user full object #9282

Open mekarpeles opened 4 months ago

mekarpeles commented 4 months ago

Currently, the My Followers & Following pages make an individual request for every patron account on the page in order to fetch the internetarchive identifier required to craft their avatar URL. Ideally, we'd be able to fetch these in bulk. At minimum, we should cache these values so they are more easily accessible.

https://openlibrary.org/people/mekBot/followers https://openlibrary.org/people/mekBot/following
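Roughly, the current shape of the problem is an N+1 pattern like this (a hypothetical illustration, not the actual template code; `follower_usernames` is an assumed name):

avatar_urls = {}
for username in follower_usernames:
    # One request per patron, just to recover the IA identifier
    user = web.ctx.site.get('/people/' + username)
    avatar_urls[username] = user.get_avatar_url()  # signature assumed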

[Screenshot of the followers page, 2024-05-16]

See: https://github.com/internetarchive/openlibrary/pull/8607/files#r1496013460

RayBB commented 4 months ago

@mekarpeles I don't want to scope-creep this, but I've also talked with @cdrini about a bulk API for the works merge page. We need bulk ratings/bookshelves/lists/editions so we can make fewer requests when doing a big merge. Making so many requests currently seems to cause rate-limiting problems for librarians and puts them in the "slow lane".

Perhaps a more general solution should be considered when approaching this.

mekarpeles commented 4 months ago

@RayBB We already have ways of batching, e.g. using ctx fetch_many; this is a specific case where we just need to change the code powering this API to make a single call for a batch of objects rather than a handful of individual calls.
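Sketching the shape of it (a hypothetical helper; assumes the standard web.ctx.site.get_many batch API, and the get_avatar_url signature is assumed):

def get_avatar_urls(usernames):
    # One batched fetch instead of one request per patron
    keys = [f'/people/{username}' for username in usernames]
    users = web.ctx.site.get_many(keys)
    return {user.key.split('/')[-1]: user.get_avatar_url() for user in users}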

Vidyap23 commented 3 months ago

@RayBB could you please assign this to me?

RayBB commented 3 months ago

@Vidyap23 you're assigned. If you have questions, be sure to tag Mek, as he has the context on this one.

Vidyap23 commented 3 months ago

@mekarpeles I am currently working on this. After going through a few files where values are cached, I am using memcache_memoize to cache the response. Test code below:

def GET(self, username):
    def fetch_cached_avatar(username):
        print(username, "inside function avatar", flush=True)  # debug only
        # Placeholder image while testing; the real call would be:
        # url = User.get_avatar_url(username)
        url = 'https://picsum.photos/id/237/200/300'
        return url

    url = cache.memcache_memoize(
        fetch_cached_avatar,
        'user.avatar',
        timeout=5 * dateutil.HOUR_SECS,
    )(username)
    raise web.seeother(url)

Is this the right approach? I am pretty new to the codebase and I am not sure how caching works with respect to individual requests in this backend. I was able to test this by adding followers, and it looked like it was working even with the browser cache disabled. Is there any other way to test this?
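For what it's worth, here's how I imagine verifying the memoization without the browser involved (a rough test sketch; assumes a memcache client is configured in the dev environment, and the key prefix 'user.avatar.test' is just for this test):

calls = []

def fetch_cached_avatar(username):
    calls.append(username)  # count how often the underlying fetch actually runs
    return 'https://picsum.photos/id/237/200/300'

memoized = cache.memcache_memoize(fetch_cached_avatar, 'user.avatar.test', timeout=60)
memoized('mekBot')
memoized('mekBot')  # second call should be served from the cache
assert len(calls) == 1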

mekarpeles commented 3 months ago

@Vidyap23 Yes! This is a reasonable approach.

mekarpeles commented 2 months ago

@Vidyap23 did you have any other questions about this one?