t-rex-tileserver / t-rex

t-rex is a vector tile server specialized on publishing MVT tiles from your own data
https://t-rex.tileserver.ch/
MIT License

Panicked psql connection - joinError #243

Closed chris-aeviator closed 3 years ago

chris-aeviator commented 3 years ago

I'm using t-rex successfully with a gpkg'ed source. After moving the gpkg'ed data to a PostgreSQL database via ogr2ogr and re-running the same t-rex config with a dbConn configured for PostgreSQL, I get an error when running t-rex generate:


t-rex    | ' panicked at 'called `Result::unwrap()` on an `Err` value: JoinError::Panic(..[0/1201]
ex-service/src/mvt_service.rs:355:22
Level 13: 767 / 11178 [===>--------------------------------------------------]  thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Error(None)', t-rex-core/src/datasource/postgis_ds.rs:134:20
t-rex    | thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Error(None)', t-rex-core/src/datasource/postgis_ds.rs:134:20
t-rex    | thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: JoinError::Panic(...)', t-rex-service/src/mvt_service.rs:355:22
Level 13: 768 / 11178 [===>--------------------------------------------------]  thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: JoinError::Panic(...)', t-rex-service/src/mvt_service.rs:355:22
Level 13: 769 / 11178 [===>--------------------------------------------------]  thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Error(None)', t-rex-core/src/datasource/postgis_ds.rs:134:20
t-rex    | thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: JoinError::Panic(...)', t-rex-service/src/mvt_service.rs:355:22
Level 13: 770 / 11178 [===>--------------------------------------------------]  thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Error(None)', t-rex-core/src/datasource/postgis_ds.rs:134:20
t-rex    | thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: JoinError::Panic(...)', t-rex-service/src/mvt_service.

chris-aeviator commented 3 years ago

I can also see a "checkerboard"-style issue on multiple zoom levels:

[screenshot]

[screenshot]

pka commented 3 years ago

This error happens in futures_util::future::join_all(tasks).await, which seems to be a conflict between the rust-postgres Tokio runtime and the runtime used for seeding. It might also be related to #240, but I have to investigate this more deeply.
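The two panic locations in the log look like a cascade: a worker panics on an unwrap (postgis_ds.rs:134), and the code that joins the workers then unwraps the resulting join error (mvt_service.rs:355). Below is a minimal sketch of that pattern. It uses std::thread rather than Tokio so that it needs no external crates, and `fetch_tile` and `seed` are hypothetical names, not t-rex functions. It matches on the join result instead of unwrapping it, so one failed tile does not bring down the caller.

```rust
use std::thread;

/// Hypothetical stand-in for a tile query; fails for odd tile numbers.
fn fetch_tile(tile: u32) -> Result<u32, String> {
    if tile % 2 == 0 {
        Ok(tile * 10)
    } else {
        Err(format!("no connection for tile {}", tile))
    }
}

/// Spawns one worker per tile and collects the outcomes
/// without unwrapping the join result.
fn seed(tiles: &[u32]) -> Vec<Result<u32, String>> {
    let handles: Vec<thread::JoinHandle<u32>> = tiles
        .iter()
        .map(|&tile| {
            // This unwrap() is the first panic: it happens inside the worker.
            thread::spawn(move || fetch_tile(tile).unwrap())
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| {
            // join() returns Err if the worker panicked. Unwrapping it here
            // would panic a second time, this time in the caller.
            handle.join().map_err(|_| "worker panicked".to_string())
        })
        .collect()
}
```

Calling `seed(&[2, 3])` yields one `Ok` value and one `Err` entry; the worker's panic message still goes to stderr, but the caller keeps running.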

chris-aeviator commented 3 years ago

This might be due to invalid geometries. I'm ignoring these since I have millions of hexagons and my rendering pipeline just discards invalid geometries. Maybe this helps to narrow down the issue.

chris-aeviator commented 3 years ago

Some more info: with some files I don't get the Result::unwrap() JoinError, but with others I do (the file I emailed to you was a merge of all of these files).

The files that don't throw an error are running on only 8 cores, whereas the files that don't throw errors are using all 24 available cores

image

joto commented 3 years ago

I am seeing the same error messages. Looks to me like this has something to do with timeouts happening on long queries. When I add a connection timeout to the pool setup the error messages go away:

let pool = r2d2::Pool::builder()
    .max_size(pool_size as u32)
    .connection_timeout(std::time::Duration::new(1000000, 0))

chris-aeviator commented 3 years ago

Will adding the timeout result in the tiles being generated correctly? All of the files that produced those errors also showed inconsistencies at different zoom levels (see my screenshots above).

joto commented 3 years ago

My "fix" is definitely not something you should use in production. I have not verified that it actually fixes the problem (and you don't want such a large timeout anyway). It is simply meant to help find the underlying issue here.

chris-aeviator commented 3 years ago

Short update:

I realized that when I reproject the source file (the one that does not throw an error) from EPSG:4326 to EPSG:3857, t-rex throws that error.

I'm doing the conversion with QGIS (which creates the files in the first place).

It seems I need to do the reprojection since my frontend (deck.gl) seems to dislike the ´properties.geometry.coordinates´ value in 3857, though all the features show fine from the t-rex tiles in 4326.

If my info here is unrelated please just ignore it :+1:
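For context on the two reference systems mentioned above: EPSG:4326 coordinates are longitude/latitude in degrees, while EPSG:3857 coordinates are meters on a spherical Mercator projection, so the same point has very different numeric values in each. A minimal sketch of the standard forward formula follows; the function name is illustrative and not taken from t-rex, QGIS, or deck.gl.

```rust
use std::f64::consts::PI;

/// Radius of the sphere used by EPSG:3857, in meters.
const EARTH_RADIUS_M: f64 = 6_378_137.0;

/// Converts lon/lat in degrees (EPSG:4326) to x/y in meters (EPSG:3857).
/// Only meaningful for latitudes strictly between -90 and 90 degrees.
fn lonlat_to_web_mercator(lon_deg: f64, lat_deg: f64) -> (f64, f64) {
    let x = EARTH_RADIUS_M * lon_deg.to_radians();
    let y = EARTH_RADIUS_M * (PI / 4.0 + lat_deg.to_radians() / 2.0).tan().ln();
    (x, y)
}
```

A longitude of 180 degrees maps to an x of roughly 20,037,508 m, which gives a sense of how far the values diverge from degree-based coordinates.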

chris-aeviator commented 3 years ago

@joto can I do something to assist you? Unfortunately this issue has brought my project to a halt, and I'd like to help solve it.

pka commented 3 years ago

To solve this error, I've added a connection_timeout configuration option, which defaults to 30s. I'm investigating whether a mismatch between connection pool size and the number of parallel tasks is causing this timeout with slow tiles.

Thanks to @joto for the useful hint!
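The suspected mismatch between pool size and parallel tasks can be illustrated with a toy pool: when every connection is checked out by a slow query, the next checkout waits and gives up once the timeout elapses. This is only a sketch under that assumption. `ToyPool` is a hypothetical type built on the standard library, not r2d2's implementation, and it merely counts free connections.

```rust
use std::sync::{Condvar, Mutex};
use std::time::{Duration, Instant};

/// Toy pool that only counts free connections (hypothetical, not r2d2).
struct ToyPool {
    free: Mutex<usize>,
    released: Condvar,
}

impl ToyPool {
    fn new(size: usize) -> Self {
        ToyPool { free: Mutex::new(size), released: Condvar::new() }
    }

    /// Waits up to `timeout` for a free connection; returns false on timeout.
    fn checkout(&self, timeout: Duration) -> bool {
        let deadline = Instant::now() + timeout;
        let mut free = self.free.lock().unwrap();
        while *free == 0 {
            let now = Instant::now();
            if now >= deadline {
                return false; // every connection stayed busy for the whole wait
            }
            let (guard, _) = self
                .released
                .wait_timeout(free, deadline - now)
                .unwrap();
            free = guard;
        }
        *free -= 1;
        true
    }

    /// Returns a connection to the pool and wakes one waiting task.
    fn checkin(&self) {
        *self.free.lock().unwrap() += 1;
        self.released.notify_one();
    }
}
```

With a pool of size 1, a second `checkout` fails for as long as the first connection is held and succeeds again after `checkin`. In this model, either raising the timeout or keeping the number of parallel tasks at or below the pool size avoids the failure.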