jimburton / BigArrow

MIT License
1 stars 10 forks source link

Caching responses from the Places API #16

Open jimburton opened 11 years ago

jimburton commented 11 years ago

We should construct a way of caching all responses from the Places API, i.e. place searches, full details for a single place, and photos.

Place details and photos aren't location dependent so these should be cached in a WeakHashMap (i.e. garbage collectable) where the key is the URL that was used to retrieve it and the value is the parsed response (e.g. a PlaceDetails object) . There should a size limit on the cache, a LRU strategy for expiring contents and a means to empty the cache directly.

Place searches are location dependent. We can probably still use the URL as the key but, if everything except the location matches we need some idea of when we can reuse the results form a search that was "close enough" to the current location. In order to do this, we probably need a version of PlacesAPISearch.search that takes the current bounds of the map and works out the difference in location relative to that. Place searches that come through the list activity will be trickier, since we have to guess what a significant change in location is.

AlmasB commented 11 years ago

I'm familiarising myself with the https://github.com/thest1/LazyList, I'll try to use the same for caching info

my goals now are:

  1. load images with no tricks
  2. make view similar to wireframe
  3. cache searches

will see how it goes

jimburton commented 11 years ago

Thanks Almas, those will be really valuable contributions. As it says on the issue, the easy targets for caching are photo and details searches.

Jim

Almas notifications@github.com wrote:

I'm familiarising myself with the https://github.com/thest1/LazyList, I'll try to use the same for caching info

my goals now are:

  1. load images with no tricks
  2. make view similar to wireframe
  3. cache searches

will see how it goes

— Reply to this email directly or view it on GitHub.


This email has been scanned by MessageLabs' Email Security System on behalf of the University of Brighton. For more information see http://www.brighton.ac.uk/is/spam/



This email has been scanned by MessageLabs' Email Security System on behalf of the University of Brighton. For more information see http://www.brighton.ac.uk/is/spam/


AlmasB commented 11 years ago

So far I've got 5 threads loading images(don't know how many of them work concurrently, in theory all 5 should), since max images from google == 10 should be allright Getting working hours to work properly. I think I'll pull those as requests and only then move to caching

AlmasB commented 11 years ago

Would we want cache to be a separate class which is based on WeakHashMap? perhaps we could make it a singleton then. What do you think?

jimburton commented 11 years ago

Yes, a singleton based on WeakHashMap would make sense for in-memory caching Places and PlaceDetails. You need to set a reasonable size on the map so it doesn't get too big and store a timestamp of the storage then last access of entries so that you can make space when it gets full by removing the least-recently used entries.

Re a filesystem cache for Bitmaps, I just came across this project today: https://github.com/chrisbanes/Android-BitmapCache . Looks like a complete and nicely wrapped solution to caching images, although it adds another dependency. Can you use this?

jimburton commented 11 years ago

The key for the in-memory cache (weakhashmap) should be the querystring of the places request (no point comparing the URL) with lat/long truncated somehow so that near-enough keys/searches map to the same values, maybe just by converting to ints.

AlmasB commented 11 years ago

So the key would have querystring + for ex 50,20 (lat lng) and if we need to search 50.5 / 20.5, instead of searching we refer to map then? What min difference between new latlng and existing map key should we consider as a new search? I don't know what the value of 1 deg in metres is, but on wiki it says "1 degree of latitude on the sphere is 111.2 km or 69 miles. " I may have misunderstood something but If it is what I think it is then int truncation might not suit us We could do a string format to 3 decimals concat with querystring and then do the comparison.

jimburton commented 11 years ago

That's the idea - lAt/lng is already part of querystring for places search and we need to edit qstring before caching so that "type=bar&location=123.112,456.112" hashes to same value as "type=bar&location=123.113,456.113". To do that, reduce the detail, so the key is "type=bar&location=123.11,456.11". As you say, we need to work out how many decimal places to store to get something "near enough" but not a massive area! (My wild guess of using ints sounds way too inaccurate.) You should also strip unnecessary details out, so "type=bar&location=123.11,456.11" becomes "bar123.11456.11" or even "b123.11456.11" if only one type of place starts with b.

For place detail searches the key can be the place id.

AlmasB commented 11 years ago

Ah I see, starting work on cache today, will probably keep it as "bar&123.112,456.112" with 3 dp for the moment "&" and "," might go away eventually

AlmasB commented 11 years ago

https://github.com/AlmasB/BigArrow/blob/cache_branch/src/uk/ac/brighton/ci360/bigarrow/Cache.java

that's what it looks like now, like I said I'm keeping 3dp so x.1231, x.1232, x.1233 etc will have same value as x.123 we can always reduce it to 2dp

I set cache size to 30 key-value pairs, will need to play around with that though Will do placedetails cache next, just a thought - instead of creating another map we could add the "Cacheable" interface and just keep one map i.e. <String, Cacheable>

jimburton commented 11 years ago

This looks great Almas, pretty impressive. The only thing I'd say is that format probably isn't the responsibility of Cache, so that method should live in the place where the cache is used (another user of the cache might format differently).

To generalise the cache, a Cachable interface would make sense, or you could make PlacesList and PlaceDetails inherit from a new class CachablePlace -- which you choose depends whether you want cachable things to have some code/functionality in common. Choose inheritance if so.

To make Cache fully reusable you would use Generics, changing the type to

public class Cache<K, V> {

where K and V are the types of the keys and values, but that is a bit more complicated and might be overkill here.

AlmasB commented 11 years ago

I was only considering the specific cache while was designing it, so I'm happy to use Generics, we might actually find it pretty useful later on. So if our generic types go directly into our map <K, V> we might not even need to worry about what we're passing as K and V. Although for the sake of the cache not just being a clone of a hashmap, I could add K extends CharSequence, V extends Cachable. I will give it a thought and let you know what happens I will move the format() towards finishing the class

AlmasB commented 11 years ago

https://github.com/AlmasB/BigArrow/blob/cache_branch/src/uk/ac/brighton/ci360/bigarrow/Cache2.java

cache using generics, no longer a singleton, K can theoretically be any type as long as it's equals() is overriden and implemented correctly, however I don't think we'll need anything else than String, up to you though, tell me what you'd like

V extends Cacheable (oracle spells it with an "e", so I'll probably keep it too), I can't think of any shared functionality at the moment so interface fits nicely

AlmasB commented 11 years ago

update

commit preview: https://github.com/AlmasB/BigArrow/commit/33270fc91248ebee927bc150e8693c64997097f4 Cache.java: https://github.com/AlmasB/BigArrow/blob/cache_branch/src/uk/ac/brighton/ci360/bigarrow/Cache.java Formatter.java: https://github.com/AlmasB/BigArrow/blob/cache_branch/src/uk/ac/brighton/ci360/bigarrow/Formatter.java

changed K, V to String, Cacheable V Like you advised I moved format() away from Cache, now everyone can create their own format() by overriding it in Formatter

Will do some tests next week when I'm back in the UK

jimburton commented 11 years ago

Hi, Formatter would be better as an interface, I think. As you know, a class can only extend one superclass, so this introduces a unnecessary restriction on classes that need to do formatting. On the other hand, a class can implement as many interfaces as necessary, so this is more flexible.

AlmasB commented 11 years ago

Sure, no problem

p.s. Aidan mentioned about distributed computing projects this year, will be interested in participating in one when I got spare time

jimburton commented 11 years ago

I should have mentioned this before, but you should always try to ensure that any hash-based data structure (HashMap, HashTree, etc) never gets more than about two thirds full. The ratio of the size of the data structure to its max capacity is called the load ratio -- we need to ensure the load ratio doesn't exceed, for instance, 0.6. Actually, the Java hash structures use a load ratio of 0.75. So, you should initialise the WeakHashMap with a size of (sizeLimit / 0.75) + 1. The reason for this is that the performance of hashtables degrades quite drastically as they become full. When less than two thirds full, looking something up is done in "constant time", super cheap, just like accessing something in an array. When completely full, this performance goes down to something like looking for a value in a list. Also, when the structure becomes full it is enlarged and this is an expensive process. The reasons behind all this are pretty interesting -- we'll go through them in CI284 this coming year.

AlmasB commented 11 years ago

I did see 0.75 in the WeakHashMap src, had no idea what it meant, thanks for explaining, looking forward to CI284, it's data structures isn't it?

By the way, if I was to make Cache extend WeakHashMap, would Cache also be garbage collectable?

jimburton commented 11 years ago

Yes, CI284 is data structures and algorithms, right up your street :-) Re your second question, I think the Cache itself wouldn't necessarily be GC'able, just its contents.

[Edited this because I think I misunderstood the question at first]

On 19/09/13 15:00, Almas wrote:

I did see 0.75 in the WeakHashMap src, had no idea what it meant, thanks for explaining, looking forward to CI284, it's data structures isn't it?

By the way, if I was to make Cache extend WeakHashMap, would Cache also be garbage collectable?

— Reply to this email directly or view it on GitHub https://github.com/jimburton/BigArrow/issues/16#issuecomment-24740585.


This email has been scanned by MessageLabs' Email Security System on behalf of the University of Brighton. For more information see http://www.brighton.ac.uk/is/spam/


Dr Jim Burton Senior Lecturer in Computing School of Computing, Engineering and Mathematics University of Brighton


This email has been scanned by MessageLabs' Email Security System on behalf of the University of Brighton. For more information see http://www.brighton.ac.uk/is/spam/


AlmasB commented 11 years ago

What should we use as a cache key for place details? the only thing that identifies the place details is its reference I think, we could use that

jimburton commented 11 years ago

Yes, use the reference. (Incidentally, they are very long -- if we knew how we could try to tidy refs up by removing redundant info before using them as keys, but AFAIK we've no way of knowing what, if anything, is redundant for our purposes so we'll go ahead and use them as they are.)