opengeospatial / geopackage

An asciidoc version of the GeoPackage specification for easier collaboration
https://www.geopackage.org

Indexing of large tile tables #464

Closed jyutzler closed 5 years ago

jyutzler commented 6 years ago

This ticket is in response to an email exchange, excerpted below.

I've been working on a large raster geopackage, roughly 550GB worth of tiles covering the entire planet down to zoom level 13. At zoom level 13 there are over 130 million tiles, and simply locating the tiles to render takes time (up to a couple of seconds in my tests)

do you have a UNIQUE (zoom_level, tile_column, tile_row) constraint in your raster table as suggested by http://www.geopackage.org/spec/#_sample_tile_pyramid_informative? If so that results in the creation of an automatic index on those columns.

I manually created a unique index and performance went through the roof: create unique index s2cloudless_tiles on s2cloudless(zoom_level, tile_column, tile_row); But I don't have the luxury of checking how it was before; a 550GB file takes a week to download here.

At a minimum, we should try to get some clarity on the observed behavior and if necessary, publish an article for geopackage.org on how to index a tiles table properly. While I don't believe this is the case at this time, it is possible that this report will lead to a new GeoPackage extension.

rouault commented 6 years ago

@aaime

I tried to reproduce your test case by generating a SQLite database that should be representative of a GeoPackage database with many tiles, except that the payload is a small one, with the following Python script:

import sqlite3
conn = sqlite3.connect("huge.db")
conn.execute("PRAGMA journal_mode = off")  # no rollback journal, to speed up the bulk load
conn.execute("CREATE TABLE test(id INTEGER PRIMARY KEY, blob, zoom_level INTEGER NOT NULL, tile_row INTEGER NOT NULL, tile_column INTEGER NOT NULL, UNIQUE(zoom_level, tile_row, tile_column))")
cursor = conn.cursor()
id = 0
for level in range(0, 14):  # zoom levels 0..13, matching the reported dataset
    print(level)
    for row in range(0, 2**level):
        for col in range(0, 2**level):
            id += 1
            cursor.execute("INSERT INTO test VALUES (?,'foo',?,?,?)", (id, level, row, col))
conn.commit()
conn.close()

Then I timed echo "select * from test where zoom_level = 13 and tile_row between 1000 and 1002 and tile_column between 1000 and 1002;" | sqlite3 huge.db. It runs in a few tens of milliseconds. The size of the file is 3.7 GB.

If I drop the UNIQUE constraint in the generating script and re-run it, the request takes 6.8 seconds, and the database size is 1.9 GB. So this shows both that an index is created due to the UNIQUE constraint and that it is actually used.
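For anyone wanting to verify which access path SQLite picks, here is a minimal sketch (reusing the test schema from the script above, on an empty in-memory database) that asks the query planner directly via EXPLAIN QUERY PLAN:

```python
import sqlite3

# Recreate the test schema from the script above in memory; the UNIQUE
# constraint causes SQLite to build an automatic index on those columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test(id INTEGER PRIMARY KEY, blob, "
    "zoom_level INTEGER NOT NULL, tile_row INTEGER NOT NULL, "
    "tile_column INTEGER NOT NULL, "
    "UNIQUE(zoom_level, tile_row, tile_column))"
)
# Ask the planner how it would execute the tile lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM test "
    "WHERE zoom_level = 13 AND tile_row BETWEEN 1000 AND 1002 "
    "AND tile_column BETWEEN 1000 AND 1002"
).fetchall()
for row in plan:
    print(row[-1])  # a SEARCH using the automatic index on (zoom_level, tile_row, tile_column)
```

If the automatic index is in place and usable, the plan shows a SEARCH over sqlite_autoindex_test_1 rather than a full table SCAN.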

So I don't have an explanation for the behaviour you are seeing. I guess you were experimenting with GeoServer. Do you know how it reads SQLite databases? Does it use the native libsqlite3 underneath, or is it a Java implementation that would perhaps lack some features of the native library?

aaime commented 6 years ago

GeoServer uses libsqlite3, as part of the https://github.com/xerial/sqlite-jdbc Java wrapper. I'm wondering about a hypothesis... how good is SQLite at rebalancing its indexes?

I inquired about how it was produced, and I've been told the structure was created by gdal, but then the contents were filled via mapcache.

What I'm wondering about is the filling process... maybe the index was there but badly balanced or spread out, and rebuilding it from scratch with the full dataset available produced a better layout for it?

I'm trying to download again the file from a remote server that should have good bandwidth (hoping the download won't be cut or that a disk quota won't kill it) and if I succeed I'll post back with some extra info.

aaime commented 6 years ago

Hmm nope, there is not enough free disk space for that behemoth, I'll have to look for another server...

rouault commented 6 years ago

maybe the index was there, but badly balanced or spread out, and rebuilding it from scratch with the full dataset available, produced a better layout for it?

Ah, then perhaps a VACUUM would fix that. But a VACUUM on such a large database might be much slower than adding a new index.
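For reference, a small self-contained sketch of what VACUUM does, run on a throwaway database (running it on a 550 GB file would rewrite the whole thing and needs that much free space again):

```python
import os
import sqlite3
import tempfile

# Build a throwaway database, delete half the rows, and VACUUM it.
# VACUUM rewrites the whole file, compacting tables and indexes and
# reclaiming the free pages left behind by the deletes.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path, isolation_level=None)  # autocommit: VACUUM cannot run inside a transaction
conn.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, payload BLOB)")
conn.executemany("INSERT INTO t VALUES (?, zeroblob(1000))",
                 [(i,) for i in range(1000)])
conn.execute("DELETE FROM t WHERE id % 2 = 0")  # leaves free pages in the file
before = os.path.getsize(path)
conn.execute("VACUUM")
after = os.path.getsize(path)
print(before, after)  # the file shrinks after VACUUM
```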

aaime commented 6 years ago

I've downloaded the sentinel2 package again on a server with high bandwidth and a large drive, composed of four 2TB disks in RAID 0 (living in the fast lane!). Here are the timings when first hitting a random tile (they get better if you re-run, of course, due to caching):

time sqlite3 s2cloudless_4326_v1.0.0.gpkg "select * from s2cloudless where zoom_level = 13 and tile_column between 4000 and 4002 and tile_row between 4000 and 4002;"
37311930|13|4000|4000|����
37311929|13|4000|4001|����
37311928|13|4000|4002|����
37311946|13|4001|4000|����
37311945|13|4001|4001|����
37311944|13|4001|4002|����
37311962|13|4002|4000|����
37311961|13|4002|4001|����
37311960|13|4002|4002|����

real    0m1.043s
user    0m0.014s
sys 0m0.005s

Slightly over one second to get just 9 tiles. I then forced re-creation of the index and repeated the request, along with similar ones (I poked around a bit to avoid caching effects; times remain stable):

 time sqlite3 s2cloudless_4326_v1.0.0.gpkg "select * from s2cloudless where zoom_level = 13 and tile_column between 6000 and 6002 and tile_row between 6000 and 6002;"
54069434|13|6000|6000|����
54069433|13|6000|6001|����
54069432|13|6000|6002|����
54069450|13|6001|6000|����
54069449|13|6001|6001|����
54069448|13|6001|6002|����
54069466|13|6002|6000|����
54069465|13|6002|6001|����
54069464|13|6002|6002|����

real    0m0.055s
user    0m0.008s
sys 0m0.000s

That's roughly 20 times faster, and matches the speedup I originally saw on my machine. I also tried running VACUUM after that, but did not see further speedups.

Maybe an old relational database suggestion applies here: insert the data first, and create the indexes only after that. It helps speed up the inserts, and also helps keep the index nice and compact. For something like GeoPackage, meant to be built once and then used often in read-only mode, it would seem like a good suggestion to add to the docs.
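A minimal sketch of that load-first, index-later pattern (the table and index names here are illustrative, not taken from the spec):

```python
import sqlite3

# Create the tiles table without the UNIQUE constraint, bulk-load it,
# and only then build the unique index, so it is written once over the
# complete dataset and ends up compact and well laid out.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tiles(id INTEGER PRIMARY KEY, tile_data BLOB, "
    "zoom_level INTEGER NOT NULL, tile_row INTEGER NOT NULL, "
    "tile_column INTEGER NOT NULL)"
)
# Tiny stand-in pyramid: zoom levels 0..2, 2**z tiles per axis.
rows = [(None, b"tile bytes...", z, r, c)
        for z in range(3) for r in range(2 ** z) for c in range(2 ** z)]
conn.executemany("INSERT INTO tiles VALUES (?,?,?,?,?)", rows)
# Index creation happens only after the bulk load is complete.
conn.execute(
    "CREATE UNIQUE INDEX tiles_zrc ON tiles(zoom_level, tile_row, tile_column)"
)
```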

rouault commented 6 years ago

I also tried running VACUUM after that, but did not see further speedups.

And what if you VACUUM the original database (without the explicit index you added) ?

Maybe an old relational db suggestion applies here: first insert the data, and create the indexes only after that. It helps speed up the inserts, and also helps keep the index nice and compact.

Ideally, to avoid duplicate indices, we wouldn't want to build an explicit index, since the UNIQUE(zoom_level, tile_row, tile_column) constraint (mandated by http://www.geopackage.org/spec/#r54) already creates one underneath. Unfortunately, it is not possible to add this unique constraint after filling the table, because SQLite doesn't support ALTER TABLE ... ADD CONSTRAINT ... UNIQUE (...). The solution would be to relax the DDL of the tiles table so that it does not incorporate the UNIQUE constraint, and then require the creation of an explicit index on those columns.
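A quick sketch of both halves of that statement (illustrative table name, not the spec DDL): SQLite rejects the ALTER TABLE form, while the explicit index can be created at any time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tiles(zoom_level INT, tile_row INT, tile_column INT)")

# SQLite's ALTER TABLE grammar has no ADD CONSTRAINT, so this fails...
alter_failed = False
try:
    conn.execute("ALTER TABLE tiles ADD CONSTRAINT u "
                 "UNIQUE(zoom_level, tile_row, tile_column)")
except sqlite3.OperationalError as e:
    alter_failed = True
    print("ALTER TABLE rejected:", e)

# ...while an explicit unique index on the same columns works after the fact.
conn.execute("CREATE UNIQUE INDEX tiles_zrc "
             "ON tiles(zoom_level, tile_row, tile_column)")
```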

jyutzler commented 6 years ago

I think there would be resistance to changing the rules for establishing the tiles table because of backwards compatibility concerns. Since this situation doesn't become a problem until there are an extreme number of tiles, could we consider an extension to support the index?

rouault commented 6 years ago

could we consider an extension to support the index?

Does that need to be an extension (in the sense of being declared in gpkg_extensions)? I don't foresee any compatibility problem that would require an application to be aware of the presence of the index. If you rename or delete the tile table, SQLite automatically updates or deletes the index.
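That last point is easy to check with a small sketch (illustrative names): after ALTER TABLE ... RENAME TO the index follows the table, and DROP TABLE removes it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tiles(zoom_level INT, tile_row INT, tile_column INT)")
conn.execute("CREATE UNIQUE INDEX tiles_zrc ON tiles(zoom_level, tile_row, tile_column)")

# Renaming the table: SQLite repoints the index at the new table name.
conn.execute("ALTER TABLE tiles RENAME TO tiles2")
owner = conn.execute(
    "SELECT tbl_name FROM sqlite_master WHERE name = 'tiles_zrc'"
).fetchone()
print(owner)  # the index's tbl_name now refers to the renamed table

# Dropping the table removes its indexes as well.
conn.execute("DROP TABLE tiles2")
leftover = conn.execute(
    "SELECT name FROM sqlite_master WHERE name = 'tiles_zrc'"
).fetchone()
print(leftover)
```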

jyutzler commented 6 years ago

@rouault We have the notion of read-write extensions and write-only extensions. Currently the RTree index is the only write-only one that I know of. Given that this is just one row in a metadata table, I hope this is not that big of a deal.

rouault commented 6 years ago

When people add indices or triggers on columns of a features table, we don't require an extension to be declared in gpkg_extensions, do we? To me, this is a similar situation; it is just normal database administration, IMHO. This could be a Recommendation / Suggestion / Note.

jyutzler commented 6 years ago

Perhaps. If someone can make a blog post, we'll go from there. Maybe it could be incorporated into http://www.geopackage.org/guidance.html.

gramnation commented 6 years ago

SQLite.org says that a unique constraint and a unique index are logically equivalent. It seems like the informative schema could be updated with a CREATE UNIQUE INDEX statement that is logically equivalent but much easier to interpret and apply in a program.
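A small sketch of that equivalence (illustrative table names): the inline UNIQUE constraint and the explicit CREATE UNIQUE INDEX reject duplicate rows the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same columns, two ways of declaring uniqueness: an inline UNIQUE
# constraint versus an explicit CREATE UNIQUE INDEX after the fact.
conn.execute("CREATE TABLE a(z INT, r INT, c INT, UNIQUE(z, r, c))")
conn.execute("CREATE TABLE b(z INT, r INT, c INT)")
conn.execute("CREATE UNIQUE INDEX b_zrc ON b(z, r, c)")

rejected = []
for table in ("a", "b"):
    conn.execute(f"INSERT INTO {table} VALUES (13, 1000, 1000)")
    try:
        conn.execute(f"INSERT INTO {table} VALUES (13, 1000, 1000)")
    except sqlite3.IntegrityError:
        rejected.append(table)
print(rejected)  # both tables reject the duplicate row
```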

rouault commented 6 years ago

could be updated with a CREATE UNIQUE INDEX statement that is logically equivalent but much easier to interpret and apply in a program.

Oh, I overlooked this! This would definitely be a better solution.

jyutzler commented 6 years ago

@tabinfl is going to take the lead on this issue.

jyutzler commented 5 years ago

Closed by #481