Closed lorenzznerol closed 3 years ago
The PostgreSQL database is set up. This is how to do it:
CREATE TABLE trajectories_nostokes_subset_10000
(obs int,
traj real,
MPA real,
distance real,
land real,
lat real,
lon real,
temp real,
time timestamp,
z real
);
and
CREATE TABLE trajectories_stokes_subset_10000
(obs int,
traj real,
MPA real,
distance real,
land real,
lat real,
lon real,
temp real,
time timestamp,
z real
);
Put trajectories_nostokes_subset_10000.csv and trajectories_stokes_subset_10000.csv in a folder of your choice. For the "nostokes" dataset:
It should take around 80 seconds.
For the "stokes" dataset the same, just with the other path:
It should take around 60 seconds.
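The import commands themselves are not shown in this excerpt. Assuming the CSVs have a header row and psql is run from the folder containing them, the import would look like this (it mirrors the \COPY syntax used for the hypertables further below; paths are placeholders):

```sql
\copy trajectories_nostokes_subset_10000(obs, traj, MPA, distance, land, lat, lon, temp, time, z) FROM 'trajectories_nostokes_subset_10000.csv' DELIMITER ',' CSV HEADER
\copy trajectories_stokes_subset_10000(obs, traj, MPA, distance, land, lat, lon, temp, time, z) FROM 'trajectories_stokes_subset_10000.csv' DELIMITER ',' CSV HEADER
```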
SELECT obs, traj, mpa, distance, land, lat, lon, temp, "time", z
FROM public.trajectories_nostokes_subset_10000 where traj = 0 order by obs limit 100;
The ORDER BY is needed because the observations are not ordered in the source.
Running:
SELECT obs, traj, mpa, distance, land, lat, lon, temp, "time", z
FROM public.trajectories_stokes_subset_10000 where traj = 0 order by obs limit 100;
We see that the "stokes" trajectories are much faster: the distance is almost double that of the "nostokes" dataset.
obs 99, no Stokes drift:
99 0 1 15.530908 0 43.29053 5.2286572 13.36425 "2017-03-05 03:00:00" 1.0182366
obs 99, with Stokes drift:
99 0 1 27.268465 0.99102813 43.353794 5.2833734 0.120739646 "2017-03-05 03:00:00" 1.0182366
With Stokes drift, obs 54 has already reached the "nostokes" obs 99 distance:
54 0 1 15.538239 0.40967715 43.26185 5.3504033 7.9671803 "2017-03-03 06:00:00" 1.0182366
When it is carried away by the Stokes drift, the larva's trajectory also seems to take a slightly different path, since at the same distance, lon and lat differ quite a bit, and the land value is different.
It is unclear how interesting this comparison is, but it might lead us to connect the two datasets not on location and time, but on distance.
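A hedged sketch of such a distance-based connection (table and column names as in the CREATE TABLE statements above; joining per traj and rounding distance to make rows comparable are my assumptions, not something decided here):

```sql
-- Compare the two simulations at (approximately) equal distance travelled,
-- instead of at equal location/time. The rounding granularity is a free
-- parameter.
SELECT s.traj,
       round(s.distance::numeric, 0) AS dist,
       s.lat AS lat_stokes,  n.lat AS lat_nostokes,
       s.lon AS lon_stokes,  n.lon AS lon_nostokes
FROM trajectories_stokes_subset_10000 s
JOIN trajectories_nostokes_subset_10000 n
  ON n.traj = s.traj
 AND round(n.distance::numeric, 0) = round(s.distance::numeric, 0)
WHERE s.traj = 0
LIMIT 10;
```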
Relevant columns: lat, lon, time. My suggestion is to kmeans cluster the dataset just on lat, lon, time, and then to use the exact simulation or real-life location and time to find the nearest location/time cluster. We then show the additional data for the dataset filtered on that cluster only. The analysis could cover the speed of the larvae (calculated from the distance and time data), the mpa, and the temperature. In addition, trajectories could be clustered separately according to their speed and mpa, so that an analysis of different clusters of trajectories can be presented for the zoomed area. It could also mean that we do not need to build any relations in the backend, since we sometimes might want to join on the distance and not on location or time. Even a join on traj is not a good idea, since we also might want to join on other columns independently.
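A minimal sketch of the clustering idea, assuming lat/lon/time have been rescaled to comparable ranges first (plain NumPy k-means for self-containedness; scikit-learn's KMeans would work the same way, and the toy data below is a stand-in, not our dataset):

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on an (n, d) array."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

# Toy stand-in for rescaled (lat, lon, time) triples: two well-separated blobs.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.05, (50, 3)),
                  rng.normal(1.0, 0.05, (50, 3))])
centroids, labels = kmeans(data, k=2)

# Given an exact simulation or real-life (lat, lon, time), find the nearest
# cluster, then filter the dataset on that cluster only.
query = np.array([1.0, 1.0, 1.0])
nearest = int(np.linalg.norm(centroids - query, axis=1).argmin())
subset = data[labels == nearest]
```

The additional columns (speed, mpa, temperature) would then be analyzed only on `subset`.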
Wrapping up:
My expectation is that we need both datasets, either as features or as labels in the later ML / DL model (either by somehow calculating one single value from the two, or better, by taking them as a double label), since both indicate some possibility of truth.
To do:
From a development guide:
Think big (scalability)
Keep in mind that, for now, your tool might process just small amounts of data (e.g. 1 GB), but in the long run, the amount of data might become much larger (e.g. 10 TB). Hence, when reading data, consider what to do if the data is too big for memory (e.g. read files not all at once but in chunks of 10 MB).
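A sketch of this practice in Python (the tiny in-memory CSV is a hypothetical stand-in for the trajectory files; batching by row count instead of by 10 MB is an assumption, the idea is the same):

```python
import csv
import io

def process_in_chunks(fileobj, chunk_rows=100_000):
    """Stream a CSV row by row and hand the rows onward in fixed-size
    batches, so memory use stays bounded no matter how large the file is."""
    reader = csv.DictReader(fileobj)
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= chunk_rows:
            yield batch
            batch = []
    if batch:
        yield batch

# Tiny in-memory stand-in for a trajectories CSV:
sample = io.StringIO("obs,traj,distance\n0,0,0.0\n1,0,1.5\n2,0,3.1\n")
chunks = list(process_in_chunks(sample, chunk_rows=2))
# chunks[0] holds the first 2 rows, chunks[1] the remaining 1 row
```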
This recommended coding practice suggests that we will probably have to move away from a Python-only analysis and use a lot of the SQL database's power (or big data tools later, in #17). Complex clustering algorithms like DTW, which take a lot of time and which we can only run in Python, will need an aggregated level, that is, aggregated days/weeks/months, kmeans clusters, perhaps real-valued columns rounded to 1 or 2 decimals, mean values, quantiles, and so on.
The prototype of the clustering can still be done in Python, see #10.
Just documenting some PostgreSQL / Python guides that were helpful up to now:
Folium connected to backend, see updated notebook #28
As reported in #38, TimescaleDB (PostgreSQL for time series) seems to be a good choice; that is why this issue was reopened.
The post "Add plpython3 Extension to Postgres/timescaledb Alpine Docker Image" drew my attention to TimescaleDB.
Taking the code from the Dockerfile of that post:
docker-compose.yml:
# Use postgres/example user/password credentials
version: '3.1'
services:
db:
container_name: postgres1
build: .
restart: always
environment:
POSTGRES_PASSWORD: postgres
adminer:
image: adminer
container_name: postgres_adminer1
restart: always
ports:
- 8080:8080
Dockerfile:
FROM timescale/timescaledb:0.9.0-pg10
RUN set -ex \
&& apk add --no-cache --virtual .plpython3-deps --repository http://nl.alpinelinux.org/alpine/edge/testing \
postgresql-plpython3 \
&& ln -s /usr/lib/postgresql/plpython3.so /usr/local/lib/postgresql/plpython3.so \
&& ln -s /usr/share/postgresql/extension/plpython3u.control /usr/local/share/postgresql/extension/plpython3u.control \
&& ln -s /usr/share/postgresql/extension/plpython3u--1.0.sql /usr/local/share/postgresql/extension/plpython3u--1.0.sql \
&& ln -s /usr/share/postgresql/extension/plpython3u--unpackaged--1.0.sql /usr/local/share/postgresql/extension/plpython3u--unpackaged--1.0.sql
Start by going into the folder containing the two files, then run
docker-compose up
which builds the image and starts both containers.
Create the language "plpython3":
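The exact statement is not preserved in this excerpt; assuming the standard PostgreSQL extension mechanism, creating the language plus a minimal sanity check would look like this (py_version is a hypothetical test function):

```sql
CREATE EXTENSION IF NOT EXISTS plpython3u;

-- Hypothetical test function: returns the Python version used by the server.
CREATE OR REPLACE FUNCTION py_version() RETURNS text AS $$
import sys
return sys.version
$$ LANGUAGE plpython3u;

SELECT py_version();
```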
WORKING! I could start the plpython script!!
You can start the created containers again later; there is no need to re-run docker-compose up and rebuild. Just start the two needed containers again:
docker container start postgres1
docker container start postgres_adminer1
When the postgres1 container is running (either by leaving docker-compose up running, or by starting postgres1 directly), you can also connect to the postgres1 container using psql. This is needed when importing CSVs with headers: the HEADER option is not supported by Adminer, but we need it.
docker exec -it postgres1 psql -U postgres -W postgres db
Somehow the db database name that I gave was ignored, but no harm, it works. (Reason, added later: -W takes no value; the first non-option argument is taken as the dbname, so postgres was used as the database and the extra db was ignored. You only need docker exec -it postgres1 psql -U postgres -W db here):
psql: warning: extra command-line argument "db" ignored
Password for user postgres:
psql (10.3)
Type "help" for help.
After this, you can run as a first check:
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------------------+-----------
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
Here is a small excerpt of what happens in the logs of the two containers (started with docker-compose up) when I manually create an arbitrary table test_adminer_1 from the Adminer menu:
adminer_1 | [Wed Aug 25 10:51:53 2021] [::ffff:172.25.0.1]:35910 [302]: POST /?pgsql=db&username=postgres&db=postgres&ns=public&create=
adminer_1 | [Wed Aug 25 10:51:53 2021] [::ffff:172.25.0.1]:35910 Closing
adminer_1 | [Wed Aug 25 10:51:53 2021] [::ffff:172.25.0.1]:35916 Accepted
adminer_1 | [Wed Aug 25 10:51:53 2021] [::ffff:172.25.0.1]:35916 [200]: GET /?pgsql=db&username=postgres&db=postgres&ns=public&table=test_adminer_1
adminer_1 | [Wed Aug 25 10:51:53 2021] [::ffff:172.25.0.1]:35916 Closing
adminer_1 | [Wed Aug 25 10:51:53 2021] [::ffff:172.25.0.1]:35922 Accepted
...
adminer_1 | [Wed Aug 25 10:53:12 2021] [::ffff:172.25.0.1]:35970 Closed without sending a request; it was probably just an unused speculative preconnection
adminer_1 | [Wed Aug 25 10:53:12 2021] [::ffff:172.25.0.1]:35970 Closing
adminer_1 | [Wed Aug 25 10:53:13 2021] [::ffff:172.25.0.1]:35974 Accepted
adminer_1 | [Wed Aug 25 10:53:18 2021] [::ffff:172.25.0.1]:35974 Closed without sending a request; it was probably just an unused speculative preconnection
adminer_1 | [Wed Aug 25 10:53:18 2021] [::ffff:172.25.0.1]:35974 Closing
postgres1 | 2021-08-25 10:54:36.697 UTC [63] STATEMENT: SELECT * FROM test_adminer_1
postgres1 | SELECT * FROM pg_catalog.pg_tables;
You can see this table test_adminer_1, which was created in the Adminer browser window on Windows (localhost:8080), directly in the Linux psql, proving that they are all connected to the same db:
postgres=# \dt
List of relations
Schema | Name | Type | Owner
--------+----------------+-------+----------
public | test_adminer_1 | table | postgres
(1 row)
Small TODO (DONE):
Up to now, I have not found out how to connect from Windows psql to the Docker postgres1 db. I know I should use the IP of the WSL instance that I am using, but that did not work. Instead, I probably need the IP of the container itself, see https://stackoverflow.com/a/54058629/11154841, or I need to check whether the ports are published, see https://stackoverflow.com/questions/56993263/connect-to-dockerized-postgres-from-windows-docker-host#56993487.
Getting the connection from Windows to the Linux container is just a side step; it is perfectly fine to use psql on Linux instead. I only try this because Adminer works as well, so it must be possible, and such a solution shows how this would work in a local network, if we had one.
UPDATE: this is DONE, I could connect from Windows psql to the Docker container. The trick was to map the container to a host port other than the standard 5432, so that there is no port clash with the local PostgreSQL installation that already uses 5432.
Windows psql:
Server [localhost]:
Database [postgres]:
Port [5432]: 5437
Username [postgres]:
Password for user postgres:
psql (13.3, server 10.3)
WARNING: Console code page (850) differs from Windows code page (1252).
8-bit characters might not work correctly. See the psql reference
page "Notes for Windows users" for details.
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+------------+-----------+--------------+------------+-----------------------
db | postgres | UTF8 | en_US.utf8 | en_US.utf8 |
postgres | postgres | UTF8 | en_US.utf8 | en_US.utf8 |
template0 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
Thus it shows the db database of the container.
Create a test table in the postgres database:
CREATE TABLE test ();
Then checking this on the Docker terminal:
docker exec -it postgres1 psql -U postgres -W -d db
Password for user postgres:
psql (10.3)
Type "help" for help.
db=# \c postgres
Password for user postgres:
You are now connected to database "postgres" as user "postgres".
postgres=# \dt
List of relations
Schema | Name | Type | Owner
--------+----------------+-------+----------
public | test | table | postgres
Thus, the test table created in the Windows psql shell is also available in the docker container.
For postgres, we now have the following tools:
Dumping a database with all of its functions and content in the home folder of Linux:
docker exec -i postgres1 pg_dump -U postgres geomar > geomar_20210829.sql
From: https://www.postgresql.org/docs/13/app-pgrestore.html https://www.postgresql.org/docs/13/app-pgdump.html https://www.postgresql.org/docs/13/app-psql.html
In order to load the backup file from within the container, it needs to be in a volume mounted into the container, else we get the error from https://stackoverflow.com/questions/45164267/pg-restore-archiver-could-not-open-input-file-no-such-file-or-director. This is for security reasons: if you do not mount a volume with a given directory at the start of the container, it cannot be found later, since the container is not allowed to use more than what is inside of it. We therefore put the backup file into the folder of the docker-compose file (the "." folder) and add volumes: - ./:/home
to the docker-compose file:
version: '3.1'
services:
db:
container_name: postgres1
build: .
restart: always
ports:
- "5437:5432"
environment:
POSTGRES_DB: db
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
volumes:
- ./:/home
adminer:
image: adminer
restart: always
ports:
- 8080:8080
After this, you can check in bash whether the volume is mounted:
docker exec -it postgres1 bash
bash-4.4# ls
backup_init.sh etc proc sys
bin home root tmp
dev lib run usr
docker-entrypoint-initdb.d media sbin var
docker-entrypoint.sh mnt srv
bash-4.4# cd home
bash-4.4# ls
Dockerfile docker-compose.yml geomar_20210829.sql
bash-4.4#
We see that the file geomar_20210829.sql can be found. Only now can we load the database from the backup into a new, empty database:
docker exec -i postgres1 createdb -U postgres geomar3
docker exec -i postgres1 psql -U postgres -d geomar3 < ~/geomar_20210829.sql
Check that the data has been loaded:
geomar3=# \dt
List of relations
Schema | Name | Type | Owner
--------+------------------------------------+---------+------------
public | nostokes | table | postgres
public | stokes | table | postgres
public | trajectories_nostokes_subset_10000 | table | postgres
public | trajectories_stokes_subset_10000 | table | postgres
(4 rows)
geomar3=# select * from stokes;
obs | traj | mpa | distance | land | lat | lon | temp | time | z
-----+------+-----+----------+------+---------+---------+---------+---------------------+---------
0 | 0 | 1 | 0 | 0 | 43.2885 | 5.17132 | 13.4218 | 2017-03-01 00:00:00 | 1.01824
0 | 1 | 1 | 0 | 0 | 43.2966 | 4.9829 | 12.9844 | 2017-03-05 00:00:00 | 1.01824
0 | 2 | 1 | 0 | 0 | 43.2947 | 4.96284 | 13.4682 | 2017-03-10 00:00:00 | 1.01824
0 | 3 | 1 | 0 | 0 | 43.2804 | 5.10096 | 13.3176 | 2017-03-14 00:00:00 | 1.01824
0 | 4 | 1 | 0 | 0 | 43.2949 | 5.13431 | 13.7183 | 2017-03-19 00:00:00 | 1.01824
0 | 5 | 1 | 0 | 0 | 43.2747 | 5.27987 | 13.8343 | 2017-03-23 00:00:00 | 1.01824
0 | 6 | 1 | 0 | 0 | 43.3027 | 5.09533 | 14.2349 | 2017-03-28 00:00:00 | 1.01824
0 | 7 | 1 | 0 | 0 | 43.2912 | 5.30152 | 14.5567 | 2017-04-01 00:00:00 | 1.01824
0 | 8 | 1 | 0 | 0 | 43.3001 | 5.20514 | 14.4992 | 2017-04-06 00:00:00 | 1.01824
The next step is to install Python packages; this is covered in a new comment below.
Prepare TimescaleDB so that we can use Grafana to visualize geospatial data stored in TimescaleDB, see #12.
Check the Grafana __timeFilter function
Check whether other TimescaleDB tutorials might be useful:
https://docs.timescale.com/timescaledb/latest/tutorials/
Grafana is still likely the most relevant, since it targets geospatial analysis of time series.
Getting the plpython3u extension to work might be possible by changing the config file; perhaps that is also possible for the Docker container. This is how the config file could be changed locally:
https://stackoverflow.com/questions/60746912/postgrsql-create-extension-timescale
Also check
https://github.com/timescale/timescaledb
and
https://github.com/timescale/timescaledb-tune
to tune the settings.
https://docs.timescale.com/timescaledb/latest/getting-started/create-hypertable/
Chunks and hypertables are TimescaleDB's secret weapon that makes storing and querying time-series data blazing fast at petabyte scale.
TimescaleDB automatically partitions time-series data into chunks (or sub-tables), based on time and space (hash key, device ID, location or some distinct key). You can configure chunk size such that recent chunks fit memory for faster queries.
A hypertable is an abstraction over all chunks which hold time-series data. Hypertables enable you to query and access data from all the chunks as if it's in a single table. This is because commands made to the hypertable automatically propagate changes down to all of the chunks belonging to that hypertable.
Hypertables and chunks enable superior performance for shallow and wide queries common in real-time monitoring, as well as deep and narrow queries, common in time-series analysis.
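The chunk size mentioned above can be chosen when creating a hypertable. A sketch (my assumption, not something done in this issue; the default interval is 7 days, and the stokes table is created further below):

```sql
-- Choose the chunk interval at hypertable creation time so that recent
-- chunks fit into memory:
SELECT create_hypertable('stokes', 'time', chunk_time_interval => INTERVAL '1 week');
```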
TODO: add hypertable on top of the trajectories tables.
----------------------------------------
-- Hypertable to store weather metrics
----------------------------------------
-- Step 1: Define regular table
CREATE TABLE IF NOT EXISTS weather_metrics (
time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
timezone_shift int NULL,
city_name text NULL,
temp_c double PRECISION NULL,
feels_like_c double PRECISION NULL,
temp_min_c double PRECISION NULL,
temp_max_c double PRECISION NULL,
pressure_hpa double PRECISION NULL,
humidity_percent double PRECISION NULL,
wind_speed_ms double PRECISION NULL,
wind_deg int NULL,
rain_1h_mm double PRECISION NULL,
rain_3h_mm double PRECISION NULL,
snow_1h_mm double PRECISION NULL,
snow_3h_mm double PRECISION NULL,
clouds_percent int NULL,
weather_type_id int NULL
);
-- Step 2: Turn into hypertable
SELECT create_hypertable('weather_metrics','time');
Applying this to our settings:
tsdb=> SELECT create_hypertable('trajectories_stokes_subset_10000','time');
ERROR: table "trajectories_stokes_subset_10000" is not empty
TIP: You can migrate data by specifying 'migrate_data => true' when calling this function.
Thus, I need to have empty tables first.
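Alternatively (not tested here), the hint suggests an in-place migration of the already-filled table:

```sql
SELECT create_hypertable('trajectories_stokes_subset_10000', 'time', migrate_data => true);
```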
CREATE TABLE nostokes
(obs int,
traj real,
MPA real,
distance real,
land real,
lat real,
lon real,
temp real,
time timestamp,
z real
);
CREATE TABLE stokes
(obs int,
traj real,
MPA real,
distance real,
land real,
lat real,
lon real,
temp real,
time timestamp,
z real
);
Then I can create the hypertable for stokes:
SELECT create_hypertable('stokes','time');
Output:
tsdb=> SELECT create_hypertable('stokes','time');
NOTICE: adding not-null constraint to column "time"
DETAIL: Time dimensions cannot have NULL values.
create_hypertable
---------------------
(1,public,stokes,t)
(1 row)
And for nostokes:
SELECT create_hypertable('nostokes','time');
Output:
tsdb=> SELECT create_hypertable('nostokes','time');
NOTICE: adding not-null constraint to column "time"
DETAIL: Time dimensions cannot have NULL values.
create_hypertable
-----------------------
(2,public,nostokes,t)
(1 row)
Now fill the hypertables; each takes about 10 minutes:
\COPY nostokes(obs, traj, MPA, distance, land, lat, lon, temp, time, z) FROM 'C:\Users\MYUSER\Documents\Masterproject GEOMAR\Data\trajectories_nostokes_subset_10000.csv\trajectories_nostokes_subset_10000.csv' DELIMITER ',' CSV HEADER
COPY 9626734
\COPY stokes(obs, traj, MPA, distance, land, lat, lon, temp, time, z) FROM 'C:\Users\MYUSER\Documents\Masterproject GEOMAR\Data\trajectories_stokes_subset_10000.csv\trajectories_stokes_subset_10000.csv' DELIMITER ',' CSV HEADER
COPY 9640202
Example queries on the hypertables of TimescaleDB taken from https://docs.timescale.com/timescaledb/latest/getting-started/query-data/#time-bucket:
Advanced SQL functions for time-series data
Timescale has many custom-built SQL functions to help you perform time-series analysis in fewer lines of code.
Examples of these functions include:
time_bucket() - used for analyzing data over arbitrary time intervals
first() - used for finding the earliest value based on time within an aggregate group
last() - used for finding the latest value based on time within an aggregate group
time_bucket_gapfill() - used to analyze data over arbitrary time intervals and fill any gaps in the data
locf() - used to fill gaps in data by carrying the last observed value forward
interpolate() - used to fill gaps by linearly interpolating the missing values between known data points
Query:
SELECT MPA, avg(temp)
FROM stokes
WHERE time > '2017-06-17 00:00:00'
GROUP BY MPA;
Output:
mpa | avg
-----+--------------------
0 | 24.83129397791901
1 | 5.2805780800273165
2 | 20.734297430803384
3 | 4.548956152704943
4 | 26.13433986831728
5 | 8.542661807034637
6 | 9.950746731604623
7 | 24.77539022001049
8 | 24.490924899923055
9 | 3.7701473225911712
10 | 2.972708292554311
(11 rows)
SELECT
time_bucket('2 weeks', time) as bucket,
MPA,
avg(temp) as avg_temp,
max(temp) as max_temp,
min(temp) as min_temp
FROM
stokes
WHERE land = 0
GROUP BY bucket, MPA
ORDER BY MPA, bucket DESC;
Output:
bucket | mpa | avg_temp | max_temp | min_temp
---------------------+-----+--------------------+-----------+------------
2017-09-04 00:00:00 | 0 | 24.82649793118004 | 26.950962 | 17.2496
2017-08-21 00:00:00 | 0 | 25.3658009038291 | 27.407627 | 17.27619
2017-08-07 00:00:00 | 0 | 25.31347652477798 | 28.731215 | 18.1381
2017-07-24 00:00:00 | 0 | 25.236428776536 | 28.721586 | 16.514639
2017-07-10 00:00:00 | 0 | 24.706983859973544 | 27.479097 | 16.878466
2017-06-26 00:00:00 | 0 | 24.18940546878281 | 26.900537 | 16.135618
2017-06-12 00:00:00 | 0 | 24.055359087743206 | 26.86533 | 18.35835
2017-05-29 00:00:00 | 0 | 21.23546009986285 | 23.76144 | 16.513632
2017-05-15 00:00:00 | 0 | 19.243815820999608 | 22.427155 | 16.069181
2017-05-01 00:00:00 | 0 | 16.984227219106213 | 20.351355 | 14.3071
2017-04-17 00:00:00 | 0 | 16.34690732351258 | 18.189545 | 14.204702
2017-04-03 00:00:00 | 0 | 16.028830836404616 | 18.077717 | 13.549699
2017-03-20 00:00:00 | 0 | 15.14634034914818 | 16.569317 | 13.163838
2017-03-06 00:00:00 | 0 | 14.822324462913304 | 16.258774 | 12.355981
2017-02-20 00:00:00 | 0 | 14.377901202577533 | 15.625333 | 12.292271
2017-08-21 00:00:00 | 1 | 17.70642288525899 | 19.324305 | 17.11796
2017-08-07 00:00:00 | 1 | 21.456999400082758 | 23.28298 | 18.169024
2017-07-24 00:00:00 | 1 | 20.353629535969205 | 23.632862 | 16.568699
2017-07-10 00:00:00 | 1 | 21.14121550465235 | 24.00419 | 16.379389
2017-06-26 00:00:00 | 1 | 20.337803440199373 | 23.091074 | 15.823037
2017-06-12 00:00:00 | 1 | 21.56045567917004 | 23.919235 | 17.799965
2017-05-29 00:00:00 | 1 | 20.00516291401717 | 21.999432 | 16.284046
2017-05-15 00:00:00 | 1 | 17.608608576294476 | 20.10287 | 15.896013
2017-05-01 00:00:00 | 1 | 14.876005716960432 | 16.221592 | 14.261587
2017-04-17 00:00:00 | 1 | 14.525842366450622 | 14.856123 | 14.162732
2017-04-03 00:00:00 | 1 | 14.904146327769853 | 15.589169 | 14.249762
2017-03-20 00:00:00 | 1 | 14.202115321830554 | 14.913743 | 13.596922
2017-03-06 00:00:00 | 1 | 13.646966625679811 | 14.209642 | 12.868435
2017-02-20 00:00:00 | 1 | 13.390813040386366 | 13.537243 | 12.958364
2017-08-07 00:00:00 | 2 | 22.122080294291177 | 22.488985 | 21.506363
2017-07-24 00:00:00 | 2 | 22.911628919609626 | 23.207022 | 21.78704
2017-07-10 00:00:00 | 2 | 22.70578627367323 | 23.425163 | 22.047289
2017-06-26 00:00:00 | 2 | 22.282030973107645 | 22.985039 | 21.018116
2017-06-12 00:00:00 | 2 | 21.945727977655867 | 23.218987 | 20.525665
2017-05-29 00:00:00 | 2 | 18.79565565417911 | 20.46295 | 16.969364
2017-05-15 00:00:00 | 2 | 17.18793816288398 | 18.486628 | 16.164696
2017-05-01 00:00:00 | 2 | 15.074951974704522 | 16.244446 | 14.661446
2017-04-17 00:00:00 | 2 | 15.048093477884928 | 15.25091 | 14.678291
2017-04-03 00:00:00 | 2 | 14.416856862129048 | 15.210911 | 13.5999
2017-03-20 00:00:00 | 2 | 13.381607317155407 | 13.846324 | 13.037714
2017-03-06 00:00:00 | 2 | 12.865583325715344 | 13.460182 | 12.035577
2017-02-20 00:00:00 | 2 | 12.175282533545243 | 12.352346 | 12.026701
2017-08-21 00:00:00 | 3 | 22.778899033864338 | 23.903532 | 21.915447
2017-08-07 00:00:00 | 3 | 22.932897410054846 | 24.037523 | 21.745775
2017-07-24 00:00:00 | 3 | 23.13750962516416 | 24.128115 | 21.62837
2017-07-10 00:00:00 | 3 | 22.94398772121698 | 23.896088 | 21.676807
2017-06-26 00:00:00 | 3 | 22.32229376801627 | 23.598776 | 20.707273
2017-06-12 00:00:00 | 3 | 21.789856423988468 | 23.482073 | 20.237402
2017-05-29 00:00:00 | 3 | 19.099566307100975 | 20.573812 | 16.915327
2017-05-15 00:00:00 | 3 | 17.252427572437636 | 19.128645 | 16.0432
2017-05-01 00:00:00 | 3 | 15.353004521591194 | 16.662237 | 14.584954
2017-04-17 00:00:00 | 3 | 14.953890301460444 | 15.370744 | 14.571802
2017-04-03 00:00:00 | 3 | 14.388060151207611 | 15.268351 | 13.439951
2017-03-20 00:00:00 | 3 | 13.648541889431641 | 14.293538 | 13.103977
2017-03-06 00:00:00 | 3 | 13.515276336402412 | 14.08204 | 12.418081
2017-02-20 00:00:00 | 3 | 13.029267660196151 | 13.553823 | 12.263241
2017-07-24 00:00:00 | 4 | 26.59736288622551 | 27.611195 | 25.937021
2017-07-10 00:00:00 | 4 | 26.642848325655585 | 27.153797 | 25.156116
2017-06-26 00:00:00 | 4 | 25.230230484330704 | 26.513733 | 24.294064
2017-06-12 00:00:00 | 4 | 25.614871326573077 | 26.632156 | 23.358932
2017-05-29 00:00:00 | 4 | 22.479892963102493 | 23.451082 | 21.779825
2017-05-15 00:00:00 | 4 | 20.43452684461275 | 21.878067 | 19.106453
2017-05-01 00:00:00 | 4 | 17.44851740750666 | 19.24197 | 16.240253
2017-04-17 00:00:00 | 4 | 16.99341330488736 | 17.477165 | 16.20011
2017-04-03 00:00:00 | 4 | 16.18180535211693 | 17.197947 | 15.207927
2017-03-20 00:00:00 | 4 | 14.909845989286614 | 15.347641 | 14.485728
2017-03-06 00:00:00 | 4 | 14.611053061700513 | 14.870977 | 14.116902
2017-02-20 00:00:00 | 4 | 14.21967608149689 | 14.314707 | 14.102304
2017-09-04 00:00:00 | 5 | 26.286715322925197 | 26.44224 | 26.123873
2017-08-21 00:00:00 | 5 | 26.546544293562572 | 26.708786 | 26.429827
2017-08-07 00:00:00 | 5 | 26.510715435962286 | 27.72983 | 25.024591
2017-07-24 00:00:00 | 5 | 26.891172107215066 | 28.092472 | 25.998983
2017-07-10 00:00:00 | 5 | 26.389373792010606 | 26.916443 | 25.854677
2017-06-26 00:00:00 | 5 | 25.715911850832498 | 26.182154 | 25.15031
2017-06-12 00:00:00 | 5 | 24.734423199309926 | 26.258776 | 23.003166
2017-05-29 00:00:00 | 5 | 22.075411036126518 | 23.20628 | 20.859718
2017-05-15 00:00:00 | 5 | 20.281381089496342 | 21.741549 | 18.588322
2017-05-01 00:00:00 | 5 | 17.591673274224135 | 19.005657 | 16.346645
2017-04-17 00:00:00 | 5 | 16.71526576260958 | 17.320423 | 16.20231
2017-04-03 00:00:00 | 5 | 16.70224673525511 | 17.419226 | 15.458587
2017-03-20 00:00:00 | 5 | 15.524549825954896 | 15.88382 | 15.093782
2017-03-06 00:00:00 | 5 | 15.063820172041945 | 15.358897 | 14.556158
2017-02-20 00:00:00 | 5 | 14.645551872934007 | 14.772932 | 14.463137
2017-08-21 00:00:00 | 6 | 27.03604162656344 | 27.158514 | 26.913284
2017-08-07 00:00:00 | 6 | 25.965313083522922 | 26.40548 | 25.563896
2017-07-24 00:00:00 | 6 | 26.736780636263504 | 28.05643 | 25.128933
2017-07-10 00:00:00 | 6 | 26.18702440973105 | 26.923248 | 25.066355
2017-06-26 00:00:00 | 6 | 25.528028823959996 | 26.464766 | 24.728224
2017-06-12 00:00:00 | 6 | 24.228188776410306 | 26.955032 | 22.982832
2017-05-29 00:00:00 | 6 | 21.807786956762733 | 23.25 | 20.571451
2017-05-15 00:00:00 | 6 | 20.282256336408317 | 21.46365 | 18.99056
2017-05-01 00:00:00 | 6 | 17.227587089876387 | 19.169998 | 16.132702
2017-04-17 00:00:00 | 6 | 16.430665033765848 | 17.027117 | 15.8864565
2017-04-03 00:00:00 | 6 | 16.259949612653237 | 17.338223 | 15.315677
2017-03-20 00:00:00 | 6 | 15.403516700334638 | 15.875272 | 15.202633
2017-03-06 00:00:00 | 6 | 14.947396766693197 | 15.209571 | 14.535177
2017-02-20 00:00:00 | 6 | 14.423605318796836 | 14.514571 | 14.343642
2017-07-24 00:00:00 | 7 | 27.44804191088739 | 28.275007 | 26.379814
2017-07-10 00:00:00 | 7 | 26.14251313096895 | 27.446873 | 24.768017
2017-06-26 00:00:00 | 7 | 24.382099903651646 | 26.0925 | 23.22902
2017-06-12 00:00:00 | 7 | 24.574668076143688 | 26.041327 | 22.445032
2017-05-29 00:00:00 | 7 | 21.70993964935587 | 23.358862 | 20.291487
2017-05-15 00:00:00 | 7 | 19.877360928530074 | 21.258327 | 18.76085
2017-05-01 00:00:00 | 7 | 18.153447741553897 | 19.0056 | 16.881422
2017-04-17 00:00:00 | 7 | 17.18622031496532 | 18.005177 | 16.712332
2017-04-03 00:00:00 | 7 | 16.391833282437364 | 17.588385 | 15.827705
2017-03-20 00:00:00 | 7 | 15.547529755096576 | 16.102781 | 15.016023
2017-03-06 00:00:00 | 7 | 15.14862707963377 | 15.515274 | 14.400544
2017-02-20 00:00:00 | 7 | 14.652459615819595 | 14.780584 | 14.421801
2017-08-07 00:00:00 | 8 | 28.35088086128235 | 28.458033 | 27.65335
2017-07-24 00:00:00 | 8 | 26.982449991719704 | 28.403843 | 25.559795
2017-07-10 00:00:00 | 8 | 25.163873099178247 | 26.364727 | 24.125029
2017-06-26 00:00:00 | 8 | 23.840556455805977 | 25.09964 | 23.055895
2017-06-12 00:00:00 | 8 | 23.52384153660053 | 25.259384 | 21.96889
2017-05-29 00:00:00 | 8 | 21.311403647163846 | 22.247269 | 20.09866
2017-05-15 00:00:00 | 8 | 19.149573199245907 | 20.318583 | 18.356333
2017-05-01 00:00:00 | 8 | 17.86146883415965 | 18.757013 | 16.700336
2017-04-17 00:00:00 | 8 | 17.00029951749074 | 17.72664 | 16.639894
2017-04-03 00:00:00 | 8 | 16.386563993204494 | 17.35961 | 15.892275
2017-03-20 00:00:00 | 8 | 15.48317863950398 | 15.93635 | 15.0099125
2017-03-06 00:00:00 | 8 | 15.334526640124025 | 15.713153 | 14.912541
2017-02-20 00:00:00 | 8 | 14.820478399594625 | 15.025774 | 14.65675
2017-08-07 00:00:00 | 9 | 26.104159379699542 | 27.735556 | 25.249928
2017-07-24 00:00:00 | 9 | 26.635963996507343 | 27.78495 | 25.181967
2017-07-10 00:00:00 | 9 | 24.9525658785456 | 26.29912 | 23.682384
2017-06-26 00:00:00 | 9 | 22.686746013953712 | 24.912384 | 19.114254
2017-06-12 00:00:00 | 9 | 23.307249599679615 | 25.148794 | 21.350853
2017-05-29 00:00:00 | 9 | 20.882533541490766 | 22.015558 | 19.098852
2017-05-15 00:00:00 | 9 | 18.28821899775348 | 19.644644 | 16.654434
2017-05-01 00:00:00 | 9 | 17.333890269954203 | 18.475035 | 15.575313
2017-04-17 00:00:00 | 9 | 17.017341694837476 | 17.717674 | 16.46179
2017-04-03 00:00:00 | 9 | 16.568411441368752 | 17.2555 | 15.872281
2017-03-20 00:00:00 | 9 | 15.524350895146666 | 16.249016 | 14.999238
2017-03-06 00:00:00 | 9 | 15.464346737382858 | 16.186192 | 14.768786
2017-02-20 00:00:00 | 9 | 14.949266238037504 | 15.358558 | 14.625947
2017-09-04 00:00:00 | 10 | 22.872619473934172 | 26.634878 | 16.920845
2017-08-21 00:00:00 | 10 | 25.779601116328276 | 27.363544 | 17.356138
2017-08-07 00:00:00 | 10 | 25.50809705335015 | 28.599775 | 18.97854
2017-07-24 00:00:00 | 10 | 26.043864750089476 | 28.700052 | 16.232222
2017-07-10 00:00:00 | 10 | 24.996329376259673 | 27.566383 | 16.340588
2017-06-26 00:00:00 | 10 | 23.844787870081397 | 26.749493 | 16.112259
2017-06-12 00:00:00 | 10 | 23.867726366557942 | 26.963821 | 18.644419
2017-05-29 00:00:00 | 10 | 21.25132758940279 | 23.808897 | 15.805724
2017-05-15 00:00:00 | 10 | 19.06877684681747 | 22.091215 | 15.99553
2017-05-01 00:00:00 | 10 | 17.37440058009478 | 20.505127 | 14.425107
2017-04-17 00:00:00 | 10 | 16.447088982954845 | 18.280336 | 14.239057
2017-04-03 00:00:00 | 10 | 15.83289019071197 | 18.00387 | 13.495498
2017-03-20 00:00:00 | 10 | 15.390403565608105 | 16.877867 | 13.107029
2017-03-06 00:00:00 | 10 | 14.94302555956803 | 16.011692 | 12.218872
2017-02-20 00:00:00 | 10 | 14.394304068192191 | 15.040423 | 12.17905
(150 rows)
Aggregate on the mpa level:
SELECT time_bucket('30 days', time) as bucket,
MPA, count(MPA) as count
FROM stokes
GROUP BY bucket, MPA
ORDER BY MPA, bucket DESC;
Output:
bucket | mpa | count
---------------------+-----+--------
2017-08-31 00:00:00 | 0 | 27920
2017-08-01 00:00:00 | 0 | 429496
2017-07-02 00:00:00 | 0 | 638021
2017-06-02 00:00:00 | 0 | 626148
2017-05-03 00:00:00 | 0 | 768311
2017-04-03 00:00:00 | 0 | 806947
2017-03-04 00:00:00 | 0 | 397412
2017-02-02 00:00:00 | 0 | 3013
2017-08-31 00:00:00 | 1 | 2003
2017-08-01 00:00:00 | 1 | 13378
2017-07-02 00:00:00 | 1 | 16075
2017-06-02 00:00:00 | 1 | 32255
2017-05-03 00:00:00 | 1 | 49163
2017-04-03 00:00:00 | 1 | 20491
2017-03-04 00:00:00 | 1 | 27451
2017-02-02 00:00:00 | 1 | 969
2017-08-01 00:00:00 | 2 | 65
2017-07-02 00:00:00 | 2 | 2096
2017-06-02 00:00:00 | 2 | 1572
2017-05-03 00:00:00 | 2 | 1706
2017-04-03 00:00:00 | 2 | 953
2017-03-04 00:00:00 | 2 | 731
2017-02-02 00:00:00 | 2 | 69
2017-08-31 00:00:00 | 3 | 1596
2017-08-01 00:00:00 | 3 | 23894
2017-07-02 00:00:00 | 3 | 51036
2017-06-02 00:00:00 | 3 | 68122
2017-05-03 00:00:00 | 3 | 73686
2017-04-03 00:00:00 | 3 | 57416
2017-03-04 00:00:00 | 3 | 45431
2017-02-02 00:00:00 | 3 | 870
2017-08-01 00:00:00 | 4 | 31
2017-07-02 00:00:00 | 4 | 4007
2017-06-02 00:00:00 | 4 | 5931
2017-05-03 00:00:00 | 4 | 4824
2017-04-03 00:00:00 | 4 | 7105
2017-03-04 00:00:00 | 4 | 5406
2017-02-02 00:00:00 | 4 | 336
2017-08-31 00:00:00 | 5 | 761
2017-08-01 00:00:00 | 5 | 18930
2017-07-02 00:00:00 | 5 | 59276
2017-06-02 00:00:00 | 5 | 54138
2017-05-03 00:00:00 | 5 | 74212
2017-04-03 00:00:00 | 5 | 106657
2017-03-04 00:00:00 | 5 | 33696
2017-02-02 00:00:00 | 5 | 970
2017-08-31 00:00:00 | 6 | 200
2017-08-01 00:00:00 | 6 | 7956
2017-07-02 00:00:00 | 6 | 14249
2017-06-02 00:00:00 | 6 | 16333
2017-05-03 00:00:00 | 6 | 19912
2017-04-03 00:00:00 | 6 | 18782
2017-03-04 00:00:00 | 6 | 9596
2017-02-02 00:00:00 | 6 | 697
2017-08-01 00:00:00 | 7 | 157
2017-07-02 00:00:00 | 7 | 1964
2017-06-02 00:00:00 | 7 | 1729
2017-05-03 00:00:00 | 7 | 1703
2017-04-03 00:00:00 | 7 | 2482
2017-03-04 00:00:00 | 7 | 2222
2017-02-02 00:00:00 | 7 | 145
2017-08-01 00:00:00 | 8 | 24
2017-07-02 00:00:00 | 8 | 462
2017-06-02 00:00:00 | 8 | 672
2017-05-03 00:00:00 | 8 | 1168
2017-04-03 00:00:00 | 8 | 780
2017-03-04 00:00:00 | 8 | 838
2017-02-02 00:00:00 | 8 | 15
2017-08-31 00:00:00 | 9 | 572
2017-08-01 00:00:00 | 9 | 69097
2017-07-02 00:00:00 | 9 | 170611
2017-06-02 00:00:00 | 9 | 197504
2017-05-03 00:00:00 | 9 | 131471
2017-04-03 00:00:00 | 9 | 155801
2017-03-04 00:00:00 | 9 | 63782
2017-02-02 00:00:00 | 9 | 1550
2017-08-31 00:00:00 | 10 | 38744
2017-08-01 00:00:00 | 10 | 596312
2017-07-02 00:00:00 | 10 | 932533
2017-06-02 00:00:00 | 10 | 885804
2017-05-03 00:00:00 | 10 | 764152
2017-04-03 00:00:00 | 10 | 679142
2017-03-04 00:00:00 | 10 | 285763
2017-02-02 00:00:00 | 10 | 702
(84 rows)
Or, with more detail, again on the MPA level as an example:
SELECT
time_bucket('2 weeks', time) as bucket,
MPA,
round( CAST(avg(temp) as numeric), 1) as avg_temp,
round( CAST(max(temp) as numeric), 1) as max_temp,
round( CAST(min(temp) as numeric), 1) as min_temp,
round( CAST(avg(lon) as numeric), 1) as avg_lon,
round( CAST(max(lon) as numeric), 1) as max_lon,
round( CAST(min(lon) as numeric), 1) as min_lon,
round( CAST(avg(lat) as numeric), 1) as avg_lat,
round( CAST(max(lat) as numeric), 1) as max_lat,
round( CAST(min(lat) as numeric), 1) as min_lat
FROM
stokes
WHERE land = 0
GROUP BY bucket, MPA
ORDER BY MPA, bucket DESC
LIMIT 10;
Output:
bucket | mpa | avg_temp | max_temp | min_temp | avg_lon | max_lon | min_lon | avg_lat | max_lat | min_lat
---------------------+-----+----------+----------+----------+---------+---------+---------+---------+---------+---------
2017-09-04 00:00:00 | 0 | 24.8 | 27.0 | 17.2 | 1.9 | 8.2 | -4.5 | 38.7 | 43.1 | 35.3
2017-08-21 00:00:00 | 0 | 25.4 | 27.4 | 17.3 | 1.9 | 8.4 | -5.1 | 38.9 | 43.2 | 35.2
2017-08-07 00:00:00 | 0 | 25.3 | 28.7 | 18.1 | 2.1 | 8.1 | -5.2 | 39.0 | 43.4 | 35.2
2017-07-24 00:00:00 | 0 | 25.2 | 28.7 | 16.5 | 2.6 | 8.9 | -4.1 | 39.7 | 43.4 | 35.4
2017-07-10 00:00:00 | 0 | 24.7 | 27.5 | 16.9 | 3.0 | 8.2 | -3.7 | 40.1 | 43.4 | 35.2
2017-06-26 00:00:00 | 0 | 24.2 | 26.9 | 16.1 | 3.1 | 8.3 | -3.8 | 40.0 | 43.3 | 35.2
2017-06-12 00:00:00 | 0 | 24.1 | 26.9 | 18.4 | 3.4 | 7.9 | -4.9 | 40.5 | 43.3 | 35.3
2017-05-29 00:00:00 | 0 | 21.2 | 23.8 | 16.5 | 2.9 | 7.7 | -4.9 | 40.1 | 43.3 | 35.2
2017-05-15 00:00:00 | 0 | 19.2 | 22.4 | 16.1 | 2.5 | 8.0 | -5.2 | 39.7 | 43.4 | 35.2
2017-05-01 00:00:00 | 0 | 17.0 | 20.4 | 14.3 | 2.4 | 8.2 | -5.2 | 39.6 | 43.3 | 35.2
(10 rows)
Aggregate on the traj level:
SELECT *,
round( CAST((x.max_distance - x.min_distance) / count_obs as numeric), 1) as avg_speed,
max_distance - min_distance as diff_distance,
max_lat - min_lat as diff_lat,
max_lon - min_lon as diff_lon,
max_temp - min_temp as diff_temp,
max_z - min_z as diff_z
FROM (
SELECT
time_bucket('1 day', time) as bucket,
traj,
count(obs) as count_obs,
avg(MPA) as avg_mpa,
max(MPA) as max_mpa,
min(MPA) as min_mpa,
round( CAST(avg(distance) as numeric), 1) as avg_distance,
round( CAST(max(distance) as numeric), 1) as max_distance,
round( CAST(min(distance) as numeric), 1) as min_distance,
round( CAST(avg(lat) as numeric), 1) as avg_lat,
round( CAST(max(lat) as numeric), 1) as max_lat,
round( CAST(min(lat) as numeric), 1) as min_lat,
round( CAST(avg(lon) as numeric), 1) as avg_lon,
round( CAST(max(lon) as numeric), 1) as max_lon,
round( CAST(min(lon) as numeric), 1) as min_lon,
round( CAST(avg(temp) as numeric), 1) as avg_temp,
round( CAST(max(temp) as numeric), 1) as max_temp,
round( CAST(min(temp) as numeric), 1) as min_temp,
round( CAST(avg(z) as numeric), 1) as avg_z,
round( CAST(max(z) as numeric), 1) as max_z,
round( CAST(min(z) as numeric), 1) as min_z
FROM
stokes
WHERE land = 0
and traj < 10
GROUP BY bucket, traj
ORDER BY traj, bucket DESC
LIMIT 20) x;
Output:
bucket | traj | count_obs | avg_mpa | max_mpa | min_mpa | avg_distance | max_distance | min_distance | avg_lat | max_lat | min_lat | avg_lon | max_lon | min_lon | avg_temp | max_temp | min_temp | avg_z | max_z | min_z | avg_speed | diff_distance | diff_lat | diff_lon | diff_temp | diff_z
---------------------+------+-----------+---------+---------+---------+--------------+--------------+--------------+---------+---------+---------+---------+---------+---------+----------+----------+----------+-------+-------+-------+-----------+---------------+----------+----------+-----------+--------
2017-03-08 00:00:00 | 0 | 1 | 1 | 1 | 1 | 33.3 | 33.3 | 33.3 | 43.3 | 43.3 | 43.3 | 5.3 | 5.3 | 5.3 | 13.4 | 13.4 | 13.4 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
2017-03-03 00:00:00 | 0 | 3 | 1 | 1 | 1 | 20.2 | 21.0 | 19.5 | 43.3 | 43.3 | 43.3 | 5.3 | 5.3 | 5.3 | 13.5 | 13.5 | 13.5 | 1.0 | 1.0 | 1.0 | 0.5 | 1.5 | 0.0 | 0.0 | 0.0 | 0.0
2017-03-02 00:00:00 | 0 | 13 | 1 | 1 | 1 | 12.7 | 13.5 | 11.8 | 43.3 | 43.3 | 43.3 | 5.3 | 5.3 | 5.3 | 13.5 | 13.5 | 13.5 | 1.0 | 1.0 | 1.0 | 0.1 | 1.7 | 0.0 | 0.0 | 0.0 | 0.0
2017-03-01 00:00:00 | 0 | 24 | 1 | 1 | 1 | 6.9 | 11.6 | 0.0 | 43.3 | 43.3 | 43.3 | 5.3 | 5.3 | 5.2 | 13.5 | 13.5 | 13.4 | 1.0 | 1.0 | 1.0 | 0.5 | 11.6 | 0.0 | 0.1 | 0.1 | 0.0
2017-04-14 00:00:00 | 1 | 2 | 0 | 0 | 0 | 516.9 | 517.0 | 516.9 | 40.9 | 40.9 | 40.9 | 5.8 | 5.8 | 5.8 | 16.2 | 16.2 | 16.2 | 1.0 | 1.0 | 1.0 | 0.1 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0
2017-04-13 00:00:00 | 1 | 24 | 0 | 0 | 0 | 510.5 | 516.3 | 505.7 | 40.9 | 40.9 | 40.9 | 5.8 | 5.8 | 5.7 | 16.0 | 16.2 | 15.8 | 1.0 | 1.0 | 1.0 | 0.4 | 10.6 | 0.0 | 0.1 | 0.4 | 0.0
2017-04-12 00:00:00 | 1 | 24 | 0 | 0 | 0 | 501.6 | 505.4 | 496.7 | 40.9 | 41.0 | 40.9 | 5.7 | 5.7 | 5.7 | 15.8 | 15.8 | 15.7 | 1.0 | 1.0 | 1.0 | 0.4 | 8.7 | 0.1 | 0.0 | 0.1 | 0.0
2017-04-11 00:00:00 | 1 | 24 | 0 | 0 | 0 | 489.6 | 496.2 | 483.2 | 41.1 | 41.1 | 41.0 | 5.7 | 5.7 | 5.6 | 15.8 | 15.8 | 15.7 | 1.0 | 1.0 | 1.0 | 0.5 | 13.0 | 0.1 | 0.1 | 0.1 | 0.0
2017-04-10 00:00:00 | 1 | 24 | 0 | 0 | 0 | 479.2 | 482.7 | 477.4 | 41.1 | 41.1 | 41.1 | 5.6 | 5.6 | 5.6 | 15.8 | 15.9 | 15.8 | 1.0 | 1.0 | 1.0 | 0.2 | 5.3 | 0.0 | 0.0 | 0.1 | 0.0
2017-04-09 00:00:00 | 1 | 24 | 0 | 0 | 0 | 476.2 | 477.3 | 474.8 | 41.1 | 41.1 | 41.1 | 5.6 | 5.6 | 5.6 | 15.8 | 15.9 | 15.7 | 1.0 | 1.0 | 1.0 | 0.1 | 2.5 | 0.0 | 0.0 | 0.2 | 0.0
2017-04-08 00:00:00 | 1 | 24 | 0 | 0 | 0 | 473.0 | 474.7 | 471.1 | 41.1 | 41.1 | 41.1 | 5.6 | 5.6 | 5.6 | 15.5 | 15.6 | 15.3 | 1.0 | 1.0 | 1.0 | 0.2 | 3.6 | 0.0 | 0.0 | 0.3 | 0.0
2017-04-07 00:00:00 | 1 | 24 | 0 | 0 | 0 | 468.3 | 470.9 | 465.0 | 41.1 | 41.1 | 41.1 | 5.6 | 5.7 | 5.6 | 15.1 | 15.3 | 14.9 | 1.0 | 1.0 | 1.0 | 0.2 | 5.9 | 0.0 | 0.1 | 0.4 | 0.0
2017-04-06 00:00:00 | 1 | 24 | 0 | 0 | 0 | 459.4 | 464.6 | 453.5 | 41.1 | 41.2 | 41.1 | 5.7 | 5.8 | 5.7 | 14.7 | 14.9 | 14.6 | 1.0 | 1.0 | 1.0 | 0.5 | 11.1 | 0.1 | 0.1 | 0.3 | 0.0
2017-04-05 00:00:00 | 1 | 24 | 0 | 0 | 0 | 444.8 | 452.9 | 434.8 | 41.2 | 41.3 | 41.2 | 5.7 | 5.8 | 5.7 | 14.4 | 14.6 | 14.3 | 1.0 | 1.0 | 1.0 | 0.8 | 18.1 | 0.1 | 0.1 | 0.3 | 0.0
2017-04-04 00:00:00 | 1 | 24 | 0 | 0 | 0 | 425.0 | 433.9 | 417.9 | 41.4 | 41.5 | 41.3 | 5.6 | 5.6 | 5.6 | 14.2 | 14.3 | 14.2 | 1.0 | 1.0 | 1.0 | 0.7 | 16.0 | 0.2 | 0.0 | 0.1 | 0.0
2017-04-03 00:00:00 | 1 | 24 | 0 | 0 | 0 | 410.7 | 417.5 | 402.9 | 41.5 | 41.6 | 41.5 | 5.6 | 5.6 | 5.6 | 14.2 | 14.2 | 14.2 | 1.0 | 1.0 | 1.0 | 0.6 | 14.6 | 0.1 | 0.0 | 0.0 | 0.0
2017-04-02 00:00:00 | 1 | 24 | 0 | 0 | 0 | 391.5 | 402.2 | 378.5 | 41.7 | 41.8 | 41.6 | 5.6 | 5.6 | 5.6 | 14.2 | 14.3 | 14.2 | 1.0 | 1.0 | 1.0 | 1.0 | 23.7 | 0.2 | 0.0 | 0.1 | 0.0
2017-04-01 00:00:00 | 1 | 24 | 0 | 0 | 0 | 363.9 | 377.2 | 355.9 | 41.9 | 42.0 | 41.8 | 5.6 | 5.7 | 5.6 | 14.5 | 14.7 | 14.3 | 1.0 | 1.0 | 1.0 | 0.9 | 21.3 | 0.2 | 0.1 | 0.4 | 0.0
2017-03-31 00:00:00 | 1 | 24 | 0 | 0 | 0 | 353.3 | 355.6 | 351.2 | 42.0 | 42.0 | 42.0 | 5.7 | 5.7 | 5.7 | 14.8 | 14.9 | 14.7 | 1.0 | 1.0 | 1.0 | 0.2 | 4.4 | 0.0 | 0.0 | 0.2 | 0.0
2017-03-30 00:00:00 | 1 | 24 | 0 | 0 | 0 | 348.0 | 351.0 | 345.0 | 42.1 | 42.1 | 42.0 | 5.7 | 5.7 | 5.7 | 14.8 | 14.9 | 14.7 | 1.0 | 1.0 | 1.0 | 0.3 | 6.0 | 0.1 | 0.0 | 0.2 | 0.0
(20 rows)
https://docs.timescale.com/timescaledb/latest/getting-started/create-cagg/
Continuous aggregates are automatically refreshed materialized views – so they massively speed up workloads that need to process large amounts of data. Unlike in other databases, your views are automatically refreshed in the background as new data is added, or as old data is modified according to a schedule.
That means:
The previous queries can be stored in pre-calculated tables so that everyone can access the aggregates directly. This is especially useful on the traj level when we aggregate on a daily basis or every 5 hours, for example.
This gives us the shortest possible query time (a query just reads the content of a pre-calculated materialized view), and the view can also be refreshed continuously without affecting query time. That might become more important in the long run, not now.
Example:
It is not possible to create a nested view in one go. To get around that, we either have to repeat calculations (or use window functions) inside a single flat query, or we have to create views on top of each other.
The original (and working) query was like this:
SELECT *,
round( CAST((x.max_distance - x.min_distance) / count_obs as numeric), 1) as avg_speed,
max_distance - min_distance as diff_distance,
max_lat - min_lat as diff_lat,
max_lon - min_lon as diff_lon,
max_temp - min_temp as diff_temp,
max_z - min_z as diff_z
FROM (
SELECT
time_bucket('1 day', time) as bucket,
traj,
count(obs) as count_obs,
avg(MPA) as avg_mpa,
max(MPA) as max_mpa,
min(MPA) as min_mpa,
round( CAST(avg(distance) as numeric), 1) as avg_distance,
round( CAST(max(distance) as numeric), 1) as max_distance,
round( CAST(min(distance) as numeric), 1) as min_distance,
round( CAST(avg(lat) as numeric), 1) as avg_lat,
round( CAST(max(lat) as numeric), 1) as max_lat,
round( CAST(min(lat) as numeric), 1) as min_lat,
round( CAST(avg(lon) as numeric), 1) as avg_lon,
round( CAST(max(lon) as numeric), 1) as max_lon,
round( CAST(min(lon) as numeric), 1) as min_lon,
round( CAST(avg(temp) as numeric), 1) as avg_temp,
round( CAST(max(temp) as numeric), 1) as max_temp,
round( CAST(min(temp) as numeric), 1) as min_temp,
round( CAST(avg(z) as numeric), 1) as avg_z,
round( CAST(max(z) as numeric), 1) as max_z,
round( CAST(min(z) as numeric), 1) as min_z
FROM
stokes
WHERE land = 0
GROUP BY bucket, traj
) x
But that throws an error:
ERROR: invalid continuous aggregate view TIP: Include at least one aggregate function and a GROUP BY clause with time bucket.
tsdb=> CREATE MATERIALIZED VIEW traj_daily
tsdb-> WITH (timescaledb.continuous)
tsdb-> AS
tsdb-> SELECT *,
tsdb-> round( CAST((x.max_distance - x.min_distance) / count_obs as numeric), 1) as avg_speed,
tsdb-> max_distance - min_distance as diff_distance,
tsdb-> max_lat - min_lat as diff_lat,
tsdb-> max_lon - min_lon as diff_lon,
tsdb-> max_temp - min_temp as diff_temp,
tsdb-> max_z - min_z as diff_z
tsdb->
tsdb-> FROM (
tsdb(>
tsdb(> SELECT
tsdb(> time_bucket('1 day', time) as bucket,
tsdb(> traj,
tsdb(> count(obs) as count_obs,
tsdb(> avg(MPA) as avg_mpa,
tsdb(> max(MPA) as max_mpa,
tsdb(> min(MPA) as min_mpa,
tsdb(> round( CAST(avg(distance) as numeric), 1) as avg_distance,
tsdb(> round( CAST(max(distance) as numeric), 1) as max_distance,
tsdb(> round( CAST(min(distance) as numeric), 1) as min_distance,
tsdb(> round( CAST(avg(lat) as numeric), 1) as avg_lat,
tsdb(> round( CAST(max(lat) as numeric), 1) as max_lat,
tsdb(> round( CAST(min(lat) as numeric), 1) as min_lat,
tsdb(> round( CAST(avg(lon) as numeric), 1) as avg_lon,
tsdb(> round( CAST(max(lon) as numeric), 1) as max_lon,
tsdb(> round( CAST(min(lon) as numeric), 1) as min_lon,
tsdb(> round( CAST(avg(temp) as numeric), 1) as avg_temp,
tsdb(> round( CAST(max(temp) as numeric), 1) as max_temp,
tsdb(> round( CAST(min(temp) as numeric), 1) as min_temp,
tsdb(> round( CAST(avg(z) as numeric), 1) as avg_z,
tsdb(> round( CAST(max(z) as numeric), 1) as max_z,
tsdb(> round( CAST(min(z) as numeric), 1) as min_z
tsdb(> FROM
tsdb(> stokes
tsdb(> WHERE land = 0
tsdb(> GROUP BY bucket, traj
tsdb(> ) x
tsdb-> WITH NO DATA;
ERROR: invalid continuous aggregate view
TIP: Include at least one aggregate function and a GROUP BY clause with time bucket.
That is why we need to unnest the query.
One option is to create two views, so that one view sits on top of the other.
View 1:
CREATE MATERIALIZED VIEW traj_daily_prep
WITH (timescaledb.continuous)
AS
SELECT
time_bucket('1 day', time) as bucket,
traj,
count(obs) as count_obs,
avg(MPA) as avg_mpa,
max(MPA) as max_mpa,
min(MPA) as min_mpa,
round( CAST(avg(distance) as numeric), 1) as avg_distance,
round( CAST(max(distance) as numeric), 1) as max_distance,
round( CAST(min(distance) as numeric), 1) as min_distance,
round( CAST(avg(lat) as numeric), 1) as avg_lat,
round( CAST(max(lat) as numeric), 1) as max_lat,
round( CAST(min(lat) as numeric), 1) as min_lat,
round( CAST(avg(lon) as numeric), 1) as avg_lon,
round( CAST(max(lon) as numeric), 1) as max_lon,
round( CAST(min(lon) as numeric), 1) as min_lon,
round( CAST(avg(temp) as numeric), 1) as avg_temp,
round( CAST(max(temp) as numeric), 1) as max_temp,
round( CAST(min(temp) as numeric), 1) as min_temp,
round( CAST(avg(z) as numeric), 1) as avg_z,
round( CAST(max(z) as numeric), 1) as max_z,
round( CAST(min(z) as numeric), 1) as min_z
FROM
stokes
WHERE land = 0
and traj < 10
GROUP BY bucket, traj
WITH NO DATA;
Out:
CREATE MATERIALIZED VIEW
View 2:
We do not need WITH (timescaledb.continuous)
here since that was only needed in the traj_daily_prep view.
CREATE MATERIALIZED VIEW traj_daily_final
AS
SELECT *,
round( CAST((x.max_distance - x.min_distance) / count_obs as numeric), 1) as avg_speed,
max_distance - min_distance as diff_distance,
max_lat - min_lat as diff_lat,
max_lon - min_lon as diff_lon,
max_temp - min_temp as diff_temp,
max_z - min_z as diff_z
FROM traj_daily_prep x
WITH NO DATA;
Out:
CREATE MATERIALIZED VIEW
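Note that traj_daily_final is a plain (non-continuous) materialized view created WITH NO DATA, so it has to be populated manually before the first query, roughly like this:

```sql
-- Populate the plain materialized view once; rerun whenever
-- traj_daily_prep has been refreshed with new data:
REFRESH MATERIALIZED VIEW traj_daily_final;
```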
Or put it all together in one view, like here:
CREATE MATERIALIZED VIEW traj_daily
WITH (timescaledb.continuous)
AS
SELECT
time_bucket('1 day', time) as bucket,
traj,
count(obs) as count_obs,
avg(MPA) as avg_mpa,
max(MPA) as max_mpa,
min(MPA) as min_mpa,
round( CAST(avg(distance) as numeric), 1) as avg_distance,
round( CAST(max(distance) as numeric), 1) as max_distance,
round( CAST(min(distance) as numeric), 1) as min_distance,
round( CAST(max(distance) - min(distance) as numeric), 1) as diff_distance,
round( CAST((max(distance) - min(distance)) / count(obs) as numeric), 1) as avg_speed,
round( CAST(avg(lat) as numeric), 1) as avg_lat,
round( CAST(max(lat) as numeric), 1) as max_lat,
round( CAST(min(lat) as numeric), 1) as min_lat,
round( CAST(max(lat) - min(lat) as numeric), 1) as diff_lat,
round( CAST(avg(lon) as numeric), 1) as avg_lon,
round( CAST(max(lon) as numeric), 1) as max_lon,
round( CAST(min(lon) as numeric), 1) as min_lon,
round( CAST(max(lon) - min(lon) as numeric), 1) as diff_lon,
round( CAST(avg(temp) as numeric), 1) as avg_temp,
round( CAST(max(temp) as numeric), 1) as max_temp,
round( CAST(min(temp) as numeric), 1) as min_temp,
round( CAST(max(temp) - min(temp) as numeric), 1) as diff_temp,
round( CAST(avg(z) as numeric), 1) as avg_z,
round( CAST(max(z) as numeric), 1) as max_z,
round( CAST(min(z) as numeric), 1) as min_z,
round( CAST(max(z) - min(z) as numeric), 1) as diff_z
FROM
stokes
WHERE land = 0
and traj < 10
GROUP BY bucket, traj
WITH NO DATA;
Out:
CREATE MATERIALIZED VIEW
If you want to drop a view, use
drop materialized view traj_daily;
Out:
DROP MATERIALIZED VIEW
CALL refresh_continuous_aggregate('traj_daily','2010-01-01', '2021-01-01');
Out:
CALL
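Instead of refreshing manually with CALL, a continuous aggregate can also be refreshed on a schedule. A sketch using TimescaleDB's add_continuous_aggregate_policy; the offsets and the schedule interval below are just example values:

```sql
-- Refresh traj_daily every hour, covering data between
-- 30 days and 1 hour old (example values, adjust as needed):
SELECT add_continuous_aggregate_policy('traj_daily',
    start_offset      => INTERVAL '30 days',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
```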
SELECT * from traj_daily
WHERE bucket > '2009-01-01'
ORDER BY bucket ASC;
Out:
bucket | traj | count_obs | avg_mpa | max_mpa | min_mpa | avg_distance | max_distance | min_distance | diff_distance | avg_speed | avg_lat | max_lat | min_lat | diff_lat | avg_lon | max_lon | min_lon | diff_lon | avg_temp | max_temp | min_temp | diff_temp | avg_z | max_z | min_z | diff_z
---------------------+------+-----------+---------------------+---------+---------+--------------+--------------+--------------+---------------+-----------+---------+---------+---------+----------+---------+---------+---------+----------+----------+----------+----------+-----------+-------+-------+-------+--------
2017-03-01 00:00:00 | 0 | 24 | 1 | 1 | 1 | 6.9 | 11.6 | 0.0 | 11.6 | 0.5 | 43.3 | 43.3 | 43.3 | 0.0 | 5.3 | 5.3 | 5.2 | 0.1 | 13.5 | 13.5 | 13.4 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-02 00:00:00 | 0 | 13 | 1 | 1 | 1 | 12.7 | 13.5 | 11.8 | 1.7 | 0.1 | 43.3 | 43.3 | 43.3 | 0.0 | 5.3 | 5.3 | 5.3 | 0.0 | 13.5 | 13.5 | 13.5 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-03 00:00:00 | 0 | 3 | 1 | 1 | 1 | 20.2 | 21.0 | 19.5 | 1.5 | 0.5 | 43.3 | 43.3 | 43.3 | 0.0 | 5.3 | 5.3 | 5.3 | 0.0 | 13.5 | 13.5 | 13.5 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-05 00:00:00 | 1 | 24 | 0.2916666666666667 | 1 | 0 | 12.9 | 22.8 | 0.0 | 22.8 | 0.9 | 43.2 | 43.3 | 43.2 | 0.1 | 5.1 | 5.2 | 5.0 | 0.2 | 13.1 | 13.3 | 13.0 | 0.3 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-06 00:00:00 | 1 | 24 | 2.9166666666666665 | 10 | 0 | 32.4 | 46.0 | 23.5 | 22.5 | 0.9 | 43.1 | 43.1 | 43.1 | 0.1 | 5.3 | 5.4 | 5.2 | 0.2 | 13.6 | 13.7 | 13.4 | 0.4 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-07 00:00:00 | 1 | 24 | 0 | 0 | 0 | 62.7 | 80.1 | 47.4 | 32.7 | 1.4 | 42.9 | 43.1 | 42.8 | 0.3 | 5.5 | 5.6 | 5.5 | 0.1 | 13.7 | 13.8 | 13.5 | 0.3 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-08 00:00:00 | 0 | 1 | 1 | 1 | 1 | 33.3 | 33.3 | 33.3 | 0.0 | 0.0 | 43.3 | 43.3 | 43.3 | 0.0 | 5.3 | 5.3 | 5.3 | 0.0 | 13.4 | 13.4 | 13.4 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-08 00:00:00 | 1 | 24 | 0 | 0 | 0 | 94.3 | 104.7 | 81.7 | 23.0 | 1.0 | 42.7 | 42.8 | 42.6 | 0.2 | 5.5 | 5.5 | 5.5 | 0.0 | 13.3 | 13.5 | 13.3 | 0.2 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-09 00:00:00 | 1 | 24 | 0 | 0 | 0 | 117.0 | 129.0 | 105.6 | 23.4 | 1.0 | 42.5 | 42.6 | 42.4 | 0.2 | 5.5 | 5.6 | 5.5 | 0.1 | 13.3 | 13.3 | 13.3 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-10 00:00:00 | 1 | 24 | 0 | 0 | 0 | 141.9 | 148.5 | 130.3 | 18.2 | 0.8 | 42.3 | 42.4 | 42.2 | 0.2 | 5.6 | 5.6 | 5.6 | 0.0 | 13.3 | 13.3 | 13.3 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-10 00:00:00 | 2 | 24 | 0.16666666666666666 | 1 | 0 | 13.0 | 21.1 | 0.0 | 21.1 | 0.9 | 43.2 | 43.3 | 43.1 | 0.2 | 5.0 | 5.0 | 5.0 | 0.0 | 13.5 | 13.5 | 13.4 | 0.1 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-11 00:00:00 | 1 | 24 | 0 | 0 | 0 | 150.6 | 152.7 | 148.7 | 4.0 | 0.2 | 42.2 | 42.2 | 42.2 | 0.0 | 5.6 | 5.6 | 5.6 | 0.0 | 13.4 | 13.4 | 13.3 | 0.1 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-11 00:00:00 | 2 | 24 | 0 | 0 | 0 | 24.5 | 28.2 | 21.4 | 6.8 | 0.3 | 43.1 | 43.1 | 43.1 | 0.0 | 5.0 | 5.0 | 4.9 | 0.1 | 13.5 | 13.5 | 13.4 | 0.1 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-12 00:00:00 | 1 | 24 | 0 | 0 | 0 | 154.7 | 156.7 | 152.9 | 3.8 | 0.2 | 42.2 | 42.2 | 42.2 | 0.0 | 5.5 | 5.6 | 5.5 | 0.0 | 13.4 | 13.5 | 13.4 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-12 00:00:00 | 2 | 24 | 0 | 0 | 0 | 29.3 | 30.7 | 28.3 | 2.4 | 0.1 | 43.1 | 43.1 | 43.1 | 0.0 | 4.9 | 4.9 | 4.9 | 0.0 | 13.5 | 13.5 | 13.4 | 0.1 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-13 00:00:00 | 1 | 24 | 0 | 0 | 0 | 161.2 | 166.1 | 156.9 | 9.2 | 0.4 | 42.2 | 42.2 | 42.2 | 0.0 | 5.5 | 5.5 | 5.4 | 0.1 | 13.4 | 13.5 | 13.4 | 0.1 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-13 00:00:00 | 2 | 24 | 0 | 0 | 0 | 33.3 | 35.7 | 30.9 | 4.7 | 0.2 | 43.1 | 43.1 | 43.1 | 0.0 | 5.0 | 5.0 | 4.9 | 0.0 | 13.6 | 13.6 | 13.5 | 0.1 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-14 00:00:00 | 1 | 24 | 0 | 0 | 0 | 168.7 | 170.7 | 166.4 | 4.4 | 0.2 | 42.2 | 42.2 | 42.2 | 0.0 | 5.4 | 5.4 | 5.4 | 0.0 | 13.5 | 13.7 | 13.4 | 0.3 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-14 00:00:00 | 2 | 24 | 0 | 0 | 0 | 39.5 | 42.7 | 35.9 | 6.8 | 0.3 | 43.0 | 43.1 | 43.0 | 0.1 | 5.0 | 5.0 | 5.0 | 0.0 | 13.7 | 13.8 | 13.6 | 0.2 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-14 00:00:00 | 3 | 24 | 0.3333333333333333 | 1 | 0 | 5.1 | 9.8 | 0.0 | 9.8 | 0.4 | 43.2 | 43.3 | 43.2 | 0.1 | 5.1 | 5.1 | 5.1 | 0.0 | 13.5 | 13.6 | 13.3 | 0.3 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-15 00:00:00 | 1 | 24 | 0 | 0 | 0 | 172.4 | 173.8 | 170.9 | 3.0 | 0.1 | 42.1 | 42.2 | 42.1 | 0.0 | 5.3 | 5.3 | 5.3 | 0.0 | 13.8 | 13.8 | 13.7 | 0.2 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-15 00:00:00 | 2 | 24 | 0 | 0 | 0 | 47.8 | 52.1 | 43.0 | 9.1 | 0.4 | 43.0 | 43.0 | 42.9 | 0.1 | 5.0 | 5.0 | 4.9 | 0.1 | 13.9 | 13.9 | 13.8 | 0.1 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-15 00:00:00 | 3 | 24 | 0 | 0 | 0 | 14.7 | 17.7 | 10.3 | 7.4 | 0.3 | 43.2 | 43.2 | 43.1 | 0.1 | 5.0 | 5.1 | 5.0 | 0.1 | 13.7 | 13.8 | 13.6 | 0.2 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-16 00:00:00 | 1 | 24 | 0 | 0 | 0 | 174.8 | 176.0 | 174.0 | 2.0 | 0.1 | 42.1 | 42.1 | 42.1 | 0.0 | 5.3 | 5.3 | 5.3 | 0.0 | 13.9 | 14.0 | 13.8 | 0.1 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-16 00:00:00 | 2 | 24 | 0 | 0 | 0 | 57.0 | 61.1 | 52.5 | 8.6 | 0.4 | 42.9 | 42.9 | 42.9 | 0.1 | 4.9 | 4.9 | 4.9 | 0.1 | 14.0 | 14.0 | 13.9 | 0.1 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-16 00:00:00 | 3 | 24 | 0 | 0 | 0 | 19.7 | 21.2 | 17.9 | 3.3 | 0.1 | 43.1 | 43.1 | 43.1 | 0.0 | 5.0 | 5.0 | 5.0 | 0.0 | 13.9 | 13.9 | 13.8 | 0.1 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-17 00:00:00 | 1 | 24 | 0 | 0 | 0 | 180.6 | 187.1 | 176.2 | 10.9 | 0.5 | 42.1 | 42.1 | 42.1 | 0.0 | 5.4 | 5.5 | 5.3 | 0.1 | 13.9 | 14.0 | 13.7 | 0.2 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-17 00:00:00 | 2 | 24 | 0 | 0 | 0 | 67.0 | 75.4 | 61.4 | 14.0 | 0.6 | 42.8 | 42.9 | 42.7 | 0.1 | 4.8 | 4.9 | 4.8 | 0.0 | 14.1 | 14.1 | 14.1 | 0.1 | 1.0 | 1.0 | 1.0 | 0.0
2017-03-17 00:00:00 | 3 | 24 | 0 | 0 | 0 | 26.0 | 32.7 | 21.4 | 11.3 | 0.5 | 43.1 | 43.1 | 43.0 | 0.1 | 5.0 | 5.1 | 5.0 | 0.0 | 14.0 | 14.1 | 13.9 | 0.2 | 1.0 | 1.0 | 1.0 | 0.0
...
@siradam @ShafiqShams @md-Junaid
Whoever needs pre-calculated aggregations, please request them here. You can ask on any level, be it MPA, traj, and so on. You need to say whether the time period should be part of the aggregation or not. You can also do this on your own and add such materialized views to the database yourself, if you like. There is an example in the comment above.
I will also try to add clusters, but up to now, I was not able to get the plpython3u extension to run on TimescaleDB. We can expect that this will become possible soon.
There are workarounds to cluster the data in plain PostgreSQL as well, at least for k-means. I might be able to add that to the materialized view. The aim is to add all the needed columns to the materialized view called "traj_daily" so that everyone can use it for visualization and analysis. Query speed will be the fastest possible since the results are pre-calculated in a materialized view.
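As a sketch of such a plain-SQL workaround: one k-means iteration can be written as an assign-then-update step. The table and column names below (a hypothetical centroids table with id, lat, lon) are assumptions; a full k-means would repeat this statement until the centroids stop moving:

```sql
-- One k-means iteration: assign each point to its nearest centroid,
-- then move every centroid to the mean of its assigned points.
WITH assignment AS (
    SELECT s.lat, s.lon,
           (SELECT c.id
            FROM centroids c
            ORDER BY (s.lat - c.lat)^2 + (s.lon - c.lon)^2
            LIMIT 1) AS cluster_id
    FROM stokes s
    WHERE s.land = 0
)
UPDATE centroids c
SET lat = a.new_lat,
    lon = a.new_lon
FROM (SELECT cluster_id,
             avg(lat) AS new_lat,
             avg(lon) AS new_lon
      FROM assignment
      GROUP BY cluster_id) a
WHERE c.id = a.cluster_id;
```

Squared lon/lat differences are only a rough distance measure; for real distances the PostGIS geography type would be more appropriate.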
If you want to connect to the tables and views of the database, follow the guide of the TimescaleDB webserver at https://github.com/siradam/DataMining_Project/issues/38#issuecomment-888197130
Search for: "How to really connect (with PostgreSQL + psql installed)" There you get the login data to get access to the database. You need to have PostgreSQL installed (which automatically installs the needed psql).
A support ticket is now open at TimescaleDB Forge that asks for the plpython3u extension:
Question:
I wanted to try the plpython3u extension to do some DBSCAN or DTW clustering for a university seminar project on a time series of geospatial data (lon / lat). I do not find plpython3u in the list. I would probably go for the PostGIS extensions instead, then.
Answer:
You understood it correctly. "Untrusted" language extensions such as plpythonu cannot be supported as they would compromise our ability to guarantee the highest possible service level.
Thus, there is no plpython3u extension on the Web Server.
At least we know now which extensions are available on TimescaleDB. Anyone who is interested can have a closer look at the extensions and how we might use them.
tsdb=> select pae.* from current_setting('extwlist.extensions') AS cs(e) cross join regexp_split_to_table(e, ',') as ext(allowed) join pg_available_extensions as pae on (allowed=name) order by 1;
Out:
name | default_version | installed_version | comment
------------------------+-----------------+-------------------+----------------------------------------------------------------------
bloom | 1.0 | | bloom access method - signature file based index
btree_gin | 1.3 | | support for indexing common datatypes in GIN
btree_gist | 1.5 | | support for indexing common datatypes in GiST
citext | 1.6 | | data type for case-insensitive character strings
cube | 1.4 | | data type for multidimensional cubes
dict_int | 1.0 | | text search dictionary template for integers
dict_xsyn | 1.0 | | text search dictionary template for extended synonym processing
fuzzystrmatch | 1.1 | | determine similarities and distance between strings
hstore | 1.6 | | data type for storing sets of (key, value) pairs
intarray | 1.2 | | functions, operators, and index support for 1-D arrays of integers
isn | 1.2 | | data types for international product numbering standards
lo | 1.1 | | Large Object maintenance
ltree | 1.1 | | data type for hierarchical tree-like structures
pg_stat_statements | 1.7 | 1.7 | track execution statistics of all SQL statements executed
pg_trgm | 1.4 | | text similarity measurement and index searching based on trigrams
pgcrypto | 1.3 | | cryptographic functions
pgrouting | 3.1.3 | | pgRouting Extension
postgis | 3.1.2 | | PostGIS geometry and geography spatial types and functions
postgis_raster | 3.1.2 | | PostGIS raster types and functions
postgis_sfcgal | 3.1.2 | | PostGIS SFCGAL functions
postgis_tiger_geocoder | 3.1.2 | | PostGIS tiger geocoder and reverse geocoder
postgis_topology | 3.1.2 | | PostGIS topology spatial types and functions
postgres_fdw | 1.0 | | foreign-data wrapper for remote PostgreSQL servers
promscale | 0.2.0 | | Promscale support functions
seg | 1.3 | | data type for representing line segments or floating-point intervals
tablefunc | 1.0 | | functions that manipulate whole tables, including crosstab
tcn | 1.0 | | Triggered change notifications
timescale_analytics | 0.3 | | timescale_analytics
timescaledb | 2.3.1 | 2.3.1 | Enables scalable inserts and complex queries for time-series data
timescaledb_toolkit | 1.0 | 1.0 | timescaledb_toolkit
tsm_system_rows | 1.0 | | TABLESAMPLE method which accepts number of rows as a limit
tsm_system_time | 1.0 | | TABLESAMPLE method which accepts time in milliseconds as a limit
unaccent | 1.1 | | text search dictionary that removes accents
uuid-ossp | 1.1 | | generate universally unique identifiers (UUIDs)
(34 rows)
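Since postgis is on the whitelist, geodesic distances between the lon/lat positions could be computed directly in the database. A small sketch; the coordinates are just illustrative values near the trajectories:

```sql
CREATE EXTENSION IF NOT EXISTS postgis;

-- Great-circle distance in meters between two lon/lat points;
-- ST_MakePoint takes (lon, lat), and casting to geography
-- makes ST_Distance return meters instead of degrees.
SELECT ST_Distance(
    ST_SetSRID(ST_MakePoint(5.23, 43.29), 4326)::geography,
    ST_SetSRID(ST_MakePoint(5.28, 43.35), 4326)::geography
) AS meters;
```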
I tried to get a more recent version of the timescaleDB-PostgreSQL Docker container mentioned above, but it does not work with timescaledb:latest-pg11 through timescaledb:latest-pg13, since the Dockerfile throws the error:
postgresql-plpython3 (no such package): required by: .plpython3-deps-20210819.182405 [postgresql-plpython3]
It seems as if only PostgreSQL 10 gives you the chance to use plpython with that timescaleDB container.
To check whether timescale on Docker is the only way to go, I installed PostgreSQL 10 on Windows with the official EDB installer and, on top of that, used Stack Builder to add PL/Python to the EDB installation, as recommended. When this still did not work, I copied python37.dll into the Windows\System32 directory, a known trick, but it still failed. So this timescaleDB-PostgreSQL container is perhaps the only configuration on the net with a working PL/Python extension. It was pure luck to find out about the timescaleDB-PostgreSQL container, which is why this is a disappointing PostgreSQL service. Funnily enough, timescaleDB does not offer PL/Python on the web server either, which could mean that PL/Python is simply too insecure to be allowed on production systems and was therefore ignored by recent developments.
That is why I have to fall back to the 2016-era PostgreSQL 10 on timescaleDB just to test PL/Python. Very strange, and likely not the best way to go. It all rather hints at Spark replacing PostgreSQL, as planned in issue #17.
After a full day invested in installing additional basic packages like pandas in the Alpine Docker container of PostgreSQL 10 and timescale 0.9.0 (FROM timescale/timescaledb:0.9.0-pg10), I had to find out that apk (the Alpine counterpart of apt) does not support such basic packages well, or at all, for exactly this old Python 3.6 version; see "Installing pandas in docker Alpine". I installed Poetry to get the dependencies right, but it did not work.
After having tried to get the Python extension and its packages to run on Windows (SEE UPDATE BELOW: WINDOWS IS INDEED POSSIBLE AS WELL) and in an outdated Alpine Docker container, I have now finally succeeded in installing PostgreSQL and plpython3u on Linux (WSL2).
!!! UPDATE, 5 months after the problem had arisen: Windows does support plpython3u !!! You just need to install Python v3.7.0 and not v3.7.9 as I did.
There is an official guide for Linux installations at PostgreSQL Downloads and PostgreSQL Wiki.
This is how to install it:
# Create the file repository configuration:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
# Import the repository signing key:
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
# Update the package lists:
sudo apt-get update
# Install the latest version of PostgreSQL.
# If you want a specific version, use 'postgresql-12' or similar instead of 'postgresql':
sudo apt-get -y install postgresql
After this, install the extension. But first check whether you have two PostgreSQL versions installed by running
service postgresql start
If you see two or more, as I had with versions 12 and 13, consider either deleting the unneeded one or changing the config and settings by following this link.
Install the plpython3u extension following PostgreSQL: how to install plpythonu extension:
sudo apt-cache search ".*plpython3.*"
sudo apt-get install postgresql-contrib postgresql-plpython3-13
Now change to the postgres user:
sudo su postgres
If you have two PostgreSQL versions installed, you need to run psql with the right port. If your version 13 is at port 5433, run
psql -p 5433
If you only have one version installed, run
psql
After this, follow the typical testing of plpython3u by creating the return_version() function of above and checking its results.
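For reference, the usual smoke test looks roughly like this (a sketch; the exact return_version() body may differ slightly from the one used earlier in this issue):

```sql
CREATE EXTENSION IF NOT EXISTS plpython3u;

-- Returns the Python version that PostgreSQL actually embeds:
CREATE OR REPLACE FUNCTION return_version()
RETURNS text AS $$
    import sys
    return sys.version
$$ LANGUAGE plpython3u;

SELECT return_version();
```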
Half a day invested in getting an import of Python packages to work inside a plpython3u stored procedure, with no success up to now. Installing pandas into Python 3.8.2 has no effect on the Python 3.8.10 that PostgreSQL reports; the k-means test function still asks for pandas. I did not understand how to apply the solution of "Module not found" when importing a Python package within a plpython3u procedure.
This is done now. The simple mistake I made was to install the packages without sudo in front. We are now able to use the full range of Python in stored procedures at the database level.
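A quick way to verify that site packages are now visible from inside PostgreSQL (pandas_check is a hypothetical helper name):

```sql
-- Fails with a "Module not found" error if pandas is not installed
-- for the Python interpreter that PostgreSQL embeds:
CREATE OR REPLACE FUNCTION pandas_check()
RETURNS text AS $$
    import pandas as pd
    return pd.__version__
$$ LANGUAGE plpython3u;

SELECT pandas_check();
```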
Next steps are at #29.
As the other group has shown, you can use a relational database as a backend. The backend can for example be:
The backend is needed as the base for the visualisation team to use with Folium or d3.