Data Caching - Githubissues

bloc97 commented 7 years ago

I couldn't find an easy way to cache the data received from this API, so I've made (and am still working on) a wrapper around it.
Cached data, even data from the STATIC V3 API, can be accessed repeatedly (thousands of times per second) without slowdowns. It will also be possible to implement native functions (such as searching and sorting) within the wrapper without relying on the speed of Riot's GET API.

https://github.com/bloc97/riot-api-java-cached/

It is still WIP but almost all the functions are available for caching.
Only the tournament methods are not cached.
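
To illustrate the idea of a caching wrapper, here is a minimal read-through sketch. It is not the code from riot-api-java-cached; `ReadThroughCache` and the `loader` function are hypothetical names, and the loader stands in for whatever call actually hits the Riot API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through cache: the first lookup for a key calls the
// loader (standing in for a real API request), later lookups are served
// from memory.
final class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent invokes the loader only when the key is missing.
        return cache.computeIfAbsent(key, loader);
    }

    int size() {
        return cache.size();
    }
}
```

Repeated `get` calls for the same key then cost a map lookup instead of a network round trip. Nothing is ever evicted in this sketch, so memory use grows with the number of distinct keys.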

Maybe someday it would be possible to merge the two projects?

Linnun commented 7 years ago

We had the topic of caching come up a few times already, but maybe you can add something new to the discussion.

If we wanted to cache API calls, there are a lot of open questions. Probably the biggest of all would be:

Where to cache data? You could cache data in memory, but that would quickly become an enormous memory eater. Take the matches endpoint, for example: I picked a random match from the API and got a result that is 32261 characters in size, which is roughly 32 KB, assuming you cache the JSON String. If you were to cache the actual DTOs, it would probably be much bigger. It's not unusual for projects to request hundreds of thousands or even millions of matches to calculate averages for certain things. That would quickly add up to many GB of cache, end in a java.lang.OutOfMemoryError, and crash the application on many low-tier servers. Of course you could provide options to limit the cache size or allow uncached calls to get around this.

Alternatively, you could store the cache on disk, maybe create a "cache" folder and store a file there for each cached call. The big advantage is that your memory stays clean and your cache survives a restart of the application. The downside is that this would slow down the server if it has to open and close files constantly. Also, depending on your system, you might eventually run into "Too many open files" errors.

You could also cache your calls in third-party databases or third-party caching tools like memcached. And if your project gets too big to run on a single dedicated server, you eventually have to use a dedicated caching option, since local caching gets kind of pointless when running multiple instances.

Each option has its advantages and disadvantages, so what would we choose to provide with this API wrapper? Basically there is no general right or wrong; it simply depends on the specific use case of the developers using this API wrapper.
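
The memory estimate above can be checked with a quick back-of-envelope calculation. This is only a sketch: it takes the single 32261-character sample as representative, and the bytes-per-character figure is an assumption (a Java String uses 1 byte per character for Latin-1 content on Java 9+ and 2 bytes otherwise). `CacheSizeEstimate` is a made-up helper, not part of any library.

```java
// Rough estimate of the memory needed to cache raw match JSON in a map.
// Ignores per-object and per-map-entry overhead, so real usage is higher.
final class CacheSizeEstimate {
    // Size of the sample match response quoted in the thread, in characters.
    static final long SAMPLE_MATCH_CHARS = 32_261L;

    /** Bytes needed to hold `matches` responses at `bytesPerChar` bytes per character. */
    static long estimateBytes(long matches, int bytesPerChar) {
        return matches * SAMPLE_MATCH_CHARS * bytesPerChar;
    }

    public static void main(String[] args) {
        long[] matchCounts = {100_000L, 1_000_000L};
        for (long matches : matchCounts) {
            for (int bytesPerChar = 1; bytesPerChar <= 2; bytesPerChar++) {
                double gigabytes = estimateBytes(matches, bytesPerChar) / 1e9;
                System.out.printf("%,d matches at %d byte(s)/char: %.1f GB%n",
                        matches, bytesPerChar, gigabytes);
            }
        }
    }
}
```

Under these assumptions a million cached matches need on the order of 32 GB at one byte per character, which is well beyond the default heap of a small server.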

I personally belong to the latter category: I have projects too big to run on a single dedicated server, so a local cache is not really going to help my use case. Instead I use a memcached server for short-term caching and a few MariaDB database servers for long-term caching.

There are many projects dedicated to caching data for tons of different use cases. I think every developer should decide on their own what caching method suits their needs. I personally don't believe there is the one caching method that's right for everyone, so I think it would be wrong to force one caching method upon everyone using this api wrapper. Also with the many caching options already being offered by other dedicated projects, it would kinda be like reinventing the wheel.

What are your thoughts on this?

bloc97 commented 7 years ago

Well, at first this is just a simple caching method for small users who frequently access the same data, and it makes the code much cleaner. Instead of storing all the objects you need after calling the Riot API once, you just call getSummonerByName repeatedly, for example. Your code then needs far fewer variables. Also, since my project is currently just a naive way to cache data, I can't really talk about memory efficiency. I could easily add a cache size limit: since we know the average size of each DTO, I just have to limit the map size for each DTO type.
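
One common way to cap a map's size in Java is a `LinkedHashMap` in access order with `removeEldestEntry` overridden, which gives least-recently-used eviction. The sketch below shows that technique; it is an assumption about how such a limit could be built, not how riot-api-java-cached does it, and `BoundedCache` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-limited cache with least-recently-used eviction.
// Not thread-safe: wrap it with Collections.synchronizedMap or guard it
// with a lock if several threads share it.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true: get() and put() move an entry to the "newest" end.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every insertion; returning true drops the
        // least recently used entry.
        return size() > maxEntries;
    }
}
```

One instance per DTO type, each with its own `maxEntries`, would match the "limit the map size for each DTO type" idea; the limit is in entries, so a byte budget has to be converted using the average DTO size.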

I'm more for offering easier options for developers than forcing them to use something. People can use both cached and uncached methods.

Caching on disk would be a bit overkill in my opinion, as the rate limits imposed by Riot are not that limiting, especially for the application APIs. Maybe it would be somewhat useful under the developer API limit, but I still don't really see a use for it. A limited memory cache would be the best way to go in my opinion, as those who request millions of matches can figure out a way to cache their own information (e.g. cache the win ratio and update it using only new matches instead of going through the whole list again).
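
The win-ratio example can be sketched as a running aggregate: keep only the two counters and fold each new match into them, so old matches never need to be fetched or stored again. `RunningWinRatio` is a hypothetical class, and representing a match as a single `won` flag is a simplifying assumption.

```java
// Hypothetical running aggregate: stores two counters instead of every match.
final class RunningWinRatio {
    private long wins;
    private long games;

    /** Folds one newly fetched match result into the aggregate. */
    void record(boolean won) {
        games++;
        if (won) {
            wins++;
        }
    }

    /** Win ratio over all recorded matches, or 0.0 if none were recorded. */
    double ratio() {
        return games == 0 ? 0.0 : (double) wins / games;
    }

    long games() {
        return games;
    }
}
```

The state is constant in size no matter how many matches have been processed, which is what makes this cheaper than caching the matches themselves.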

Also, do you have some average usage/memory figures for common applications? Then we can figure out a way to deal with the extreme use case scenarios.

bloc97 commented 7 years ago

After considering extreme use scenarios, I think a hybrid method would be best (storing common data in memory and all the rest on disk), but then, as you stated, there are database libraries that already do this. I don't think there is a need to overcomplicate things. Most people just need a simple API that does things in a simple way (so it doesn't always break), just like what this API provides. riot-api-java is a very lightweight wrapper, and that is what I want to respect too with riot-api-java-cached. Keeping things simple is usually the best way to go.

bloc97 commented 7 years ago

For now, I will add a "purge" method to my API so people can purge the cache on a timer or by hand. Later I will figure out an automatic way of clearing the cache or keeping it small.
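
A purge method plus a timer could look roughly like the sketch below. The class and method names (`PurgeableCache`, `purge`, `purgeEvery`) are hypothetical and are not taken from riot-api-java-cached; the timer uses the standard `ScheduledExecutorService`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical cache that can be emptied by hand or on a fixed schedule.
final class PurgeableCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();

    V get(K key) {
        return entries.get(key);
    }

    void put(K key, V value) {
        entries.put(key, value);
    }

    int size() {
        return entries.size();
    }

    /** Drops every cached entry; the next lookup has to go back to the API. */
    void purge() {
        entries.clear();
    }

    /**
     * Purges the cache at a fixed interval on a daemon thread.
     * The caller owns the returned scheduler and should shut it down.
     */
    ScheduledExecutorService purgeEvery(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    Thread thread = new Thread(runnable, "cache-purge");
                    thread.setDaemon(true); // do not keep the JVM alive
                    return thread;
                });
        scheduler.scheduleAtFixedRate(this::purge, period, period, unit);
        return scheduler;
    }
}
```

For example, `cache.purgeEvery(10, TimeUnit.MINUTES)` would empty the cache every ten minutes. A full purge is blunt: it also throws away entries that were cached a moment ago, which is why per-entry expiry or a size limit is the usual next step.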

Thanks for reminding me of those extreme use scenarios, I always seem to forget them...

sergeknystautas commented 7 years ago

I would suggest the library define an interface for caching, ship with one (or maybe a couple of) implementations, and then let others come up with their own. Memory, disk, memcache, redis, JDBC: there are so many ways people might want to store and expire the content.
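
As a sketch of what such a pluggable interface might look like: the names below (`ResponseCache`, `InMemoryResponseCache`) and the choice of caching raw JSON strings with a per-entry time-to-live are assumptions for illustration, not an API that riot-api-java defines.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract the library could call; disk, memcached, redis
// or JDBC backends would be alternative implementations.
interface ResponseCache {
    Optional<String> get(String key);

    void put(String key, String json, long ttlMillis);

    void invalidate(String key);
}

// One possible default implementation: in memory, with per-entry expiry.
final class InMemoryResponseCache implements ResponseCache {
    private static final class Entry {
        final String json;
        final long expiresAtMillis;

        Entry(String json, long expiresAtMillis) {
            this.json = json;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    @Override
    public Optional<String> get(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        if (System.currentTimeMillis() >= entry.expiresAtMillis) {
            // Expired: remove lazily, but only if it is still the same entry.
            entries.remove(key, entry);
            return Optional.empty();
        }
        return Optional.of(entry.json);
    }

    @Override
    public void put(String key, String json, long ttlMillis) {
        entries.put(key, new Entry(json, System.currentTimeMillis() + ttlMillis));
    }

    @Override
    public void invalidate(String key) {
        entries.remove(key);
    }
}
```

With this shape the wrapper depends only on `ResponseCache`, so users who already run a caching server can plug in their own backend, and users who want no caching can supply an implementation whose `get` always returns `Optional.empty()`.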


taycaldwell / riot-api-java

Data Caching #98