tangrams / tangram-es

2D and 3D map renderer using OpenGL ES
MIT License

Excessive Storage Utilisation - Tiles are cached at duplicate locations, never get pruned #2298

Closed: westnordost closed this issue 2 years ago

westnordost commented 2 years ago

Spawned from https://github.com/streetcomplete/StreetComplete/issues/3417 .

In a nutshell

It looks like the map tiles are cached both at the location the user of this library specified and at another location at the same time. The issues are:

My cache configuration

import com.mapzen.tangram.networking.DefaultHttpHandler
import java.io.File
import java.util.concurrent.TimeUnit
import okhttp3.Cache
import okhttp3.CacheControl
import okhttp3.HttpUrl
import okhttp3.OkHttpClient
import okhttp3.Request

// = sdcard/Android/data/de.westnordost.streetcomplete/cache/tile_cache
val cacheDir = File(context.externalCacheDir, "tile_cache")

val cache = Cache(cacheDir, 50 * 1000L * 1000L) // prune after reaching 50 MB

val cacheControl = CacheControl.Builder()
    .maxAge(12, TimeUnit.HOURS) // do not re-download tiles younger than 12 hours
    .maxStale(14, TimeUnit.DAYS) // also accept cached tiles up to 14 days stale (this does not delete anything)
    .build()

val httpHandler = object : DefaultHttpHandler(OkHttpClient.Builder().cache(cache)) {
    override fun configureRequest(url: HttpUrl, builder: Request.Builder) {
        builder.cacheControl(cacheControl)
    }
}

tangramMapView.initMap(httpHandler)

The OkHttp cache does prune tiles in sdcard/Android/data/de.westnordost.streetcomplete/cache/tile_cache correctly once this directory exceeds 50 MB. I tested this.

What is wrong? Observations

  1. Reported cache size: [screenshot]

  2. Actual size of external cache directory: [screenshot]

  3. There are no other cache files in either of

    • /sdcard/Android/data/de.westnordost.streetcomplete/cache/
    • /data/data/de.westnordost.streetcomplete/cache/
  4. When panning the map, and thus downloading new tiles, the reported cache size grows at roughly twice the rate of the size of the external cache directory.

  5. Comparing the total free storage space on the phone before and after clearing the app's cache shows that the cache size reported by Android is not a display error: in this case, 57 MB of storage space was indeed freed. However, the directories mentioned in point 3 shrank by only 22 MB.

Thus, I conclude that there must be a third directory, outside of those mentioned in point 3, where tangram-es stores duplicates of the downloaded tiles and which is never pruned. Android is able to attribute this directory to the app and to clear it, but I have not found where on the sdcard it is.

Used versions

matteblair commented 2 years ago

Interesting report! I did a brief test in the Tangram ES Android demo app to see if I could observe this. The demo app uses a similar caching configuration to yours, but limited to 16MB (https://github.com/tangrams/tangram-es/blob/main/platforms/android/demo/src/main/java/com/mapzen/tangram/android/MainActivity.java#L209-L225). What I observed is that the "Cache" amount in the Application manager corresponds almost exactly to the size of the /sdcard/Android/data/com.mapzen.tangram.android/cache/ folder on my device, as reported by du.

Just to make sure that we're measuring the same things, can you check the size of your tile cache directory by using du on your device via adb shell? The command should be something like:

adb shell du -sh /sdcard/Android/data/de.westnordost.streetcomplete/cache/

Check the output of that and tell me whether it still differs from the "Cache" amount reported by Android.

westnordost commented 2 years ago

Okay, this is curious.

adb shell du -sh /sdcard/Android/data/de.westnordost.streetcomplete/cache/ returns 56 MB.
adb shell ls /sdcard/Android/data/de.westnordost.streetcomplete/cache | wc -l returns 5213.

Copying all files in that directory off the phone yields a directory of 23 MB, containing the same 5213 files. Copying that directory back onto the phone yields a directory of 56 MB again.

So it looks like the 56 MB is just the size on disk, whereas the OkHttp cache counts the logical size of the files. This explains why the reported cache grows at twice the rate.
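To make that accounting concrete, here is a toy model of a size-bounded LRU cache. It is a hypothetical sketch (ToyLruCache is an invented name, and this is not OkHttp's DiskLruCache), assuming, as observed above, that the limit is compared against the sum of logical entry sizes, so the disk blocks the files actually occupy never enter the calculation.

```kotlin
// Toy model of a size-bounded LRU cache (hypothetical, not OkHttp's DiskLruCache).
// The limit is checked against the sum of logical entry sizes in bytes; the
// number of disk blocks each file occupies is never considered.
class ToyLruCache(private val maxBytes: Long) {
    // accessOrder = true: iteration starts at the least recently used entry
    private val entries = LinkedHashMap<String, Long>(16, 0.75f, true)

    var size: Long = 0
        private set

    fun put(key: String, bytes: Long) {
        entries.remove(key)?.let { size -= it }   // replacing an entry frees its old size
        entries[key] = bytes
        size += bytes
        trimToSize()
    }

    fun get(key: String): Long? = entries[key]   // a hit also marks the entry as recently used

    fun keys(): List<String> = entries.keys.toList()

    // Evict least recently used entries until the logical total fits the limit
    private fun trimToSize() {
        val iter = entries.entries.iterator()
        while (size > maxBytes && iter.hasNext()) {
            size -= iter.next().value
            iter.remove()
        }
    }
}

val cache = ToyLruCache(maxBytes = 10_000)
cache.put("tile/14/1/1", 4_000)
cache.put("tile/14/1/2", 4_000)
cache.get("tile/14/1/1")          // touch: now the most recently used entry
cache.put("tile/14/1/3", 4_000)   // 12_000 > 10_000, so tile/14/1/2 is evicted
println(cache.keys())             // [tile/14/1/1, tile/14/1/3]
println(cache.size)               // 8000
```

Whatever the files cost on disk, such a cache stays "within budget" as long as the logical total does, which is consistent with a 50 MB limit turning into roughly 100 MB of used storage.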

This also explains why another user, who set the cache size limit to 250 MB, reported in https://github.com/streetcomplete/StreetComplete/issues/3417 that Android showed a cache size of about 500 MB.


The output of adb shell mount confused me: the data is mounted either as ext4 or as sdcardfs. ext4 can use block sizes from 1 KB to 4 KB (4 KB is the usual default); for sdcardfs, I don't know.

Only one third of the cache files are smaller than 2 KB, so I find it really surprising that the size on disk seems to be twice (+100%) the logical size. On NTFS, it is +50%.
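A rough way to see why many small files inflate the size on disk: every file occupies a whole number of filesystem blocks, so the unused tail of its last block is wasted. OkHttp's cache also stores each response as two files (metadata and body), so every tile pays this rounding twice. The sketch below assumes made-up file sizes and block sizes of 1 KB and 4 KB, and ignores inodes, directory entries and journal overhead; the function names are hypothetical.

```kotlin
// Sketch: estimate on-disk usage when each file is rounded up to whole
// filesystem blocks. Ignores inodes, directory entries and journal overhead.
fun allocatedBytes(fileSize: Long, blockSize: Long): Long =
    if (fileSize == 0L) 0L
    else ((fileSize + blockSize - 1) / blockSize) * blockSize   // round up to a block multiple

fun onDiskTotal(fileSizes: List<Long>, blockSize: Long): Long =
    fileSizes.sumOf { allocatedBytes(it, blockSize) }

// Hypothetical sizes of four small cache files, in bytes
val sizes = listOf(500L, 1_500L, 3_000L, 5_000L)
val logical = sizes.sum()                 // actual data, in bytes

println(logical)                          // 10000
println(onDiskTotal(sizes, 1_024))        // 11264
println(onDiskTotal(sizes, 4_096))        // 20480
```

With 4 KB blocks, the 10,000 bytes of data in this example occupy 20,480 bytes, close to the 2x ratio seen above, while 1 KB blocks would add only about 13%. The same arithmetic favours fewer, larger files, since each file wastes at most one partial block.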

Anyway, I think this issue can then be closed, all has been answered. Thank you for your time!

westnordost commented 2 years ago

In any case, this finding may be useful for providers of map tiles: it may be better to serve fewer, larger map tiles than many very small ones.

mnalis commented 2 years ago

For anybody interested in comparing the results themselves: the first command gives how much space is actually used (number of blocks * block size), and the second command gives the sum of the file sizes. This example is on my Android 10 with a 512-byte sector size.


% adb shell '(find /sdcard/Android/data/de.westnordost.streetcomplete/cache  -execdir stat -c "%b*%B+\\" {} \; ; echo 0) | bc'
6671872

% adb shell '(find /sdcard/Android/data/de.westnordost.streetcomplete/cache -type f -execdir stat -c "%s+\\" {} \; ; echo 0) | bc'
4305549

Also, stat -f /mountpoint should reveal the block size of that filesystem.

smichel17 commented 2 years ago

FYI @joxit @ianthetechie (anyone else I missed who might appreciate a heads-up?)

matteblair commented 2 years ago

Good information to know! This makes me think that it would be possible to implement a more space-efficient cache by combining tile responses into a single file - like a sqlite database. I don't expect to do this in Tangram ES but it might be possible for a client application.

Joxit commented 2 years ago

Hi there, yes, a file system is not well suited to storing tiles; you should use SQLite based on the MBTiles specification.
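For reference, MBTiles stores every tile as a row in a single SQLite table, so many small tiles share database pages instead of each rounding up to its own filesystem blocks. One detail that trips up client code is that MBTiles rows use the TMS scheme (row 0 at the bottom), while most tile URLs use XYZ (y = 0 at the top). A minimal sketch of the key mapping, assuming the tile server uses XYZ addressing; TileKey, xyzToMbtilesKey and selectTileSql are hypothetical names:

```kotlin
// MBTiles keeps all tiles in one SQLite table, "tiles", with the columns
// zoom_level, tile_column, tile_row and tile_data. Rows follow the TMS scheme
// (row 0 at the bottom), whereas most tile URLs use XYZ (y = 0 at the top).
data class TileKey(val zoomLevel: Int, val tileColumn: Int, val tileRow: Int)

// Hypothetical helper: map an XYZ tile address to the (zoom_level, tile_column, tile_row) triple
fun xyzToMbtilesKey(z: Int, x: Int, y: Int): TileKey {
    require(z in 0..30) { "zoom level out of range: $z" }
    val tilesPerSide = 1 shl z                  // 2^z tiles along each axis
    require(x in 0 until tilesPerSide && y in 0 until tilesPerSide) { "tile out of range" }
    return TileKey(z, x, tilesPerSide - 1 - y)  // flip the row: TMS counts from the south
}

// A lookup could then bind the three key values to a query such as:
val selectTileSql =
    "SELECT tile_data FROM tiles WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?"

println(xyzToMbtilesKey(3, 5, 2))  // TileKey(zoomLevel=3, tileColumn=5, tileRow=5)
```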

matteblair commented 2 years ago

@Joxit indeed MBTiles is what I had in mind when I mentioned an sqlite database. It's a tempting idea, but I think it would be pretty complicated for a couple of reasons:

  1. In addition to storing tile data, we also need to respect cache control headers in the client request and in the server response. Currently this is handled by OkHttp. Implementing this again would be non-trivial work and would probably not be as good as OkHttp's implementation!
  2. Tangram ES doesn't just request tiles, it can also request scene files, scene bundles, or image resources over HTTP. These other resources don't fit in the MBTiles schema, so we would need multiple caches for different types of resources.
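To give a sense of the work in point 1, here is a minimal sketch of the header handling a replacement cache would need, assuming entries are keyed by URL so that scene files and images can live in the same table as tiles (point 2). The names parseMaxAge, CachedResponse and isFresh are hypothetical, and only the max-age directive is handled; a real cache would also have to honour no-store, no-cache, Expires, Age, validators and more, which is a large part of why reusing OkHttp is attractive.

```kotlin
// Hypothetical helper: read the max-age directive (seconds) from a Cache-Control
// response header. Only max-age is handled here.
fun parseMaxAge(cacheControl: String): Long? =
    cacheControl.split(',')
        .map { it.trim() }
        .firstOrNull { it.startsWith("max-age=", ignoreCase = true) }
        ?.substringAfter('=')
        ?.trim('"')
        ?.toLongOrNull()

// One cache row per URL, so tiles, scene files and images can share one table
class CachedResponse(
    val url: String,
    val body: ByteArray,
    val storedAtMillis: Long,
    val maxAgeSeconds: Long?,
)

// Fresh while younger than max-age; entries without max-age count as stale in this sketch
fun CachedResponse.isFresh(nowMillis: Long): Boolean {
    val maxAge = maxAgeSeconds ?: return false
    return nowMillis - storedAtMillis < maxAge * 1000
}

val entry = CachedResponse(
    url = "https://example.com/scene.yaml",      // hypothetical URL
    body = ByteArray(0),
    storedAtMillis = 0,
    maxAgeSeconds = parseMaxAge("public, max-age=3600"),
)
println(entry.isFresh(nowMillis = 3_599_000))   // true
println(entry.isFresh(nowMillis = 3_600_000))   // false
```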

This doesn't mean it can't be done! Tangram ES allows a client application to replace the entire HTTP implementation, so you could certainly create a tile cache based on MBTiles if you wanted to.

westnordost commented 2 years ago

Hmm...

  1. Creating a tile cache based on MBTiles just to store the cache more efficiently seems like something no user of a map rendering library would do. So it seems this is something that should be done by the rendering library.
  2. However, for web maps (tangram, maplibre-gl-js), the rendering library obviously does not manage the cache itself; the browser does. So why should the library manage the cache itself on native platforms but not on the web?

I am not sure if browsers put all the cached files in some kind of giant indexed table so they don't have this problem. I'd also be curious how MapLibre handles this. After all, MapLibre is pretty much the reference implementation that all the map tile providers cater for.

smichel17 commented 2 years ago

Seems like the ideal setup would be an independent library which handles this. That way it could be shared between renderers, and only included when needed.

tallytalwar commented 2 years ago

As far as I remember, we did add some code for offline rendering via MBTiles. If needed, we should be able to use that tile source class generically as a cache for all tile sources?

matteblair commented 2 years ago

Yep, there is code in Tangram ES currently for caching vector tile sources in a local mbtiles database. However in its current form it isn't a substitute for HTTP caching. The existing mbtiles cache will naively store all tiles forever, regardless of the cache-control headers that accompanied the tile response.

I looked into maplibre to see what approach is used there. It does seem to use a local sqlite database for tile caching, complete with parsing and logic for response headers. The maplibre caching code is in the shared native library, for reuse across platforms.

A similar approach would make sense for Tangram ES. Tile caching could be implemented in the shared core library using sqlite, with options like the maximum cache size exposed to the client SDKs. I would support this approach, I think it could lead to a better user experience for Tangram ES. But the scope of this work is beyond my current availability (for the time being, at least).

It would be nice if the caching code from maplibre could be shared in an independent library, but unfortunately it seems very tightly coupled with the internals of their renderer.