
Create relational database as backend #13

Closed lorenzznerol closed 3 years ago

lorenzznerol commented 3 years ago

As the other group has shown, you can use a relational database as a backend. The backend can, for example, be:

The backend is needed as the basis for the visualisation team to build on with Folium or d3.

lorenzznerol commented 3 years ago

The PostgreSQL database is set up. This is how to do it:

  1. Download and install PostgreSQL from https://www.postgresql.org/download/
  2. Follow the guide at https://www.postgresqltutorial.com/install-postgresql/. You can choose either "SQL Shell (psql)" together with the command prompt, or the more comfortable GUI pgAdmin 4.
  3. Add a new database (under the postgres user) called "marine" (or whatever you like).
  4. In pgAdmin 4, go to the "marine" database, click on "Query Tool" and run, one after the other:
CREATE TABLE trajectories_nostokes_subset_10000 
(obs int, 
 traj real,
 MPA real, 
 distance real,
 land real,
 lat real,
 lon real,
 temp real,
 time timestamp,
 z real
);

and

CREATE TABLE trajectories_stokes_subset_10000 
(obs int, 
 traj real,
 MPA real, 
 distance real,
 land real,
 lat real,
 lon real,
 temp real,
 time timestamp,
 z real
);
  5. To import the given CSV files, first unzip the csv.gz files so that you have trajectories_nostokes_subset_10000.csv and trajectories_stokes_subset_10000.csv in a folder of your choice.
  6. Go to "Schemas" --> "public" --> "Tables" --> Right-click --> choose "Import/Export..." and fill in the menu as follows.

For the "nostokes" dataset:

(screenshot)

It should take around 80 seconds.

For the "stokes" dataset the same, just with the other path:

(screenshot)

It should take around 60 seconds.

  7. Now you can run a test query:
SELECT obs, traj, mpa, distance, land, lat, lon, temp, "time", z
    FROM public.trajectories_nostokes_subset_10000 where traj = 0 order by obs limit 100;

(screenshot)

The "order by" is needed, the observations are not ordered in the source.

Running:

SELECT obs, traj, mpa, distance, land, lat, lon, temp, "time", z
    FROM public.trajectories_stokes_subset_10000 where traj = 0 order by obs limit 100;

(screenshot)

We see that the trajectories are much faster: the distance is almost double that of the "nostokes" dataset.

obs 99, no Stokes drift:
99 0 1 15.530908 0 43.29053 5.2286572 13.36425 "2017-03-05 03:00:00" 1.0182366

obs 99, with Stokes drift:
99 0 1 27.268465 0.99102813 43.353794 5.2833734 0.120739646 "2017-03-05 03:00:00" 1.0182366

With Stokes drift, obs 54 has already reached the "nostokes" obs 99 distance:
54 0 1 15.538239 0.40967715 43.26185 5.3504033 7.9671803 "2017-03-03 06:00:00" 1.0182366

When carried along by the Stokes drift, the larva's trajectory also seems to take a slightly different path, since at the same distance, lon and lat differ quite a bit, and the land value is different.

It is unclear how interesting this comparison is, but it might lead us to connect the two datasets not on location and time, but on distance.
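A minimal sketch of such a distance-based join (rounding to whole distance units is a placeholder granularity):

-- Pair "nostokes" and "stokes" observations of the same trajectory at roughly
-- the same travelled distance; one row per matching distance bucket, so a single
-- obs can match several rows -- this is only a sketch of the idea.
SELECT n.traj,
       n.obs AS obs_nostokes,
       s.obs AS obs_stokes,
       n.distance AS dist_nostokes,
       s.distance AS dist_stokes
FROM trajectories_nostokes_subset_10000 n
JOIN trajectories_stokes_subset_10000 s
  ON s.traj = n.traj
 AND round(s.distance::numeric, 0) = round(n.distance::numeric, 0)
WHERE n.traj = 0
ORDER BY n.obs
LIMIT 20;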

  8. For a fast zoom into an area, we need indices on rounded or otherwise categorized lat, lon and time. My suggestion is to k-means cluster the dataset on just lat, lon and time, and then use the exact simulated or real-life location and time to find the nearest location/time cluster. Then we show the additional data for the dataset filtered on that cluster only. The analysis could cover the speed of the larvae (calculated from the distance and time data), the MPA, and the temperature. In addition, trajectories could be clustered separately according to their speed and MPA, so that an analysis of different clusters of trajectories can be presented for the zoomed area. A sketch of such indices follows below.
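A minimal sketch of such indices on rounded coordinates and day-truncated time (index names and granularities are placeholders; the k-means step itself would still happen in Python):

-- Expression indexes so that zoom queries on a coarse lat/lon/day grid can filter fast
CREATE INDEX idx_nostokes_zoom
    ON trajectories_nostokes_subset_10000 (round(lat::numeric, 1), round(lon::numeric, 1), date_trunc('day', "time"));
CREATE INDEX idx_stokes_zoom
    ON trajectories_stokes_subset_10000 (round(lat::numeric, 1), round(lon::numeric, 1), date_trunc('day', "time"));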

It could also mean that we do not need to build any fixed relations in the backend, since we sometimes might want to join on distance rather than on location or time. Even a join on traj alone is not a good idea, since we might also want to join on other columns independently.

Wrapping up:

My expectation is that we need both datasets, either as features or as labels in the later ML / DL model (either by somehow calculating one single value from the two or, better, by keeping them as a double label), since both indicate some possibility of truth.

To do:

lorenzznerol commented 3 years ago

From a development guide:

Think big (scalability)

Keep in mind that, for now, your tool might process only small amounts of data (e.g. 1 GB), but in the long run the amount of data might become much larger (e.g. 10 TB). Hence, when reading data, consider what to do if the data is too big for memory (e.g. read files not all at once but in chunks of 10 MB).
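Translated to the database side, the same idea is to fetch results in batches via a server-side cursor rather than all at once; a minimal sketch (cursor name and batch size are placeholders):

BEGIN;
-- Declare a server-side cursor over the full result set...
DECLARE traj_cur CURSOR FOR
    SELECT * FROM trajectories_stokes_subset_10000 ORDER BY traj, obs;
-- ...and fetch one batch at a time; repeat FETCH until no rows are returned
FETCH FORWARD 100000 FROM traj_cur;
CLOSE traj_cur;
COMMIT;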

This recommended coding practice shows that we will probably have to move beyond a Python-only analysis and use much of the SQL database's power (or big data tools later in #17). Complex clustering algorithms like DTW, which take a lot of time and which we can only run in Python, will need an aggregated level, that is, aggregated days/weeks/months, k-means clusters, perhaps real values rounded to 1 or 2 decimals, mean values, quantiles and so on. A sketch of such a pre-aggregation follows below.
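A minimal sketch of such a pre-aggregation in plain SQL (daily means per trajectory, coordinates rounded to 2 decimals, a median as an example quantile):

-- Collapse the raw observations to one row per trajectory and day,
-- a level that DTW-style clustering in Python could actually handle
SELECT traj,
       date_trunc('day', "time") AS day,
       round(avg(lat)::numeric, 2)  AS avg_lat,
       round(avg(lon)::numeric, 2)  AS avg_lon,
       round(avg(temp)::numeric, 2) AS avg_temp,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY temp) AS median_temp
FROM trajectories_stokes_subset_10000
GROUP BY traj, day
ORDER BY traj, day;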

The prototype of the clustering can still be done in Python, see #10.

lorenzznerol commented 3 years ago

Just documenting some PostgreSQL / Python guides that have been helpful so far:

lorenzznerol commented 3 years ago

Folium connected to backend, see updated notebook #28

(screenshot)

lorenzznerol commented 3 years ago

As reported in #38, TimescaleDB (PostgreSQL for time series) seems to be a good choice. That is why this issue was reopened.


The guide "Add plpython3 Extension to Postgres/timescaledb Alpine Docker Image" drew my attention to timescale.

Taking the code from the Dockerfile described there:

docker-compose.yml:

# Use postgres/example user/password credentials
version: '3.1'

services:

  db:
    container_name: postgres1
    build: .
    restart: always
    environment:
      POSTGRES_PASSWORD: postgres

  adminer:
    image: adminer
    container_name: postgres_adminer1
    restart: always
    ports:
      - 8080:8080

Dockerfile:

FROM timescale/timescaledb:0.9.0-pg10

RUN set -ex \
    && apk add --no-cache --virtual .plpython3-deps --repository http://nl.alpinelinux.org/alpine/edge/testing \
    postgresql-plpython3 \
    && ln -s /usr/lib/postgresql/plpython3.so /usr/local/lib/postgresql/plpython3.so \
    && ln -s /usr/share/postgresql/extension/plpython3u.control /usr/local/share/postgresql/extension/plpython3u.control \
    && ln -s /usr/share/postgresql/extension/plpython3u--1.0.sql /usr/local/share/postgresql/extension/plpython3u--1.0.sql \
    && ln -s /usr/share/postgresql/extension/plpython3u--unpackaged--1.0.sql /usr/local/share/postgresql/extension/plpython3u--unpackaged--1.0.sql

Start by going into the folder containing the two files, then run:

docker-compose up

Which gives:

(screenshot)

Create the language "plpython3":

(screenshots)

WORKING! I could start the plpython script!!

(screenshots)

You can start the created containers again; there is no need to rebuild with docker-compose. Just start the two needed containers:

docker container start postgres1
docker container start postgres_adminer1

psql is needed

When the postgres1 container is running (either by leaving docker-compose up running, or by starting postgres1 as above), you can also connect to the postgres1 container using psql. This is needed when importing CSV files with headers: the header parameter is not supported by Adminer, but we need it.

docker exec -it postgres1 psql -U postgres -W postgres db

From: https://stackoverflow.com/questions/37099564/docker-how-can-run-the-psql-command-in-the-postgres-container/

Somehow the db database name that I passed was ignored, but no harm, it works (REASON, added later: -W takes no value, so the first non-option argument is automatically taken as the dbname; thus you only need docker exec -it postgres1 psql -U postgres -W db here):

psql: warning: extra command-line argument "db" ignored
Password for user postgres:
psql (10.3)
Type "help" for help.

After this, you can run as a first check:

postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

Here is a small excerpt of what happens in the logs of the two containers when they run under docker-compose up and I manually create an arbitrary table test_adminer_1 from the Adminer menu:

adminer_1  | [Wed Aug 25 10:51:53 2021] [::ffff:172.25.0.1]:35910 [302]: POST /?pgsql=db&username=postgres&db=postgres&ns=public&create=
adminer_1  | [Wed Aug 25 10:51:53 2021] [::ffff:172.25.0.1]:35910 Closing
adminer_1  | [Wed Aug 25 10:51:53 2021] [::ffff:172.25.0.1]:35916 Accepted
adminer_1  | [Wed Aug 25 10:51:53 2021] [::ffff:172.25.0.1]:35916 [200]: GET /?pgsql=db&username=postgres&db=postgres&ns=public&table=test_adminer_1
adminer_1  | [Wed Aug 25 10:51:53 2021] [::ffff:172.25.0.1]:35916 Closing
adminer_1  | [Wed Aug 25 10:51:53 2021] [::ffff:172.25.0.1]:35922 Accepted

...

adminer_1  | [Wed Aug 25 10:53:12 2021] [::ffff:172.25.0.1]:35970 Closed without sending a request; it was probably just an unused speculative preconnection
adminer_1  | [Wed Aug 25 10:53:12 2021] [::ffff:172.25.0.1]:35970 Closing
adminer_1  | [Wed Aug 25 10:53:13 2021] [::ffff:172.25.0.1]:35974 Accepted
adminer_1  | [Wed Aug 25 10:53:18 2021] [::ffff:172.25.0.1]:35974 Closed without sending a request; it was probably just an unused speculative preconnection
adminer_1  | [Wed Aug 25 10:53:18 2021] [::ffff:172.25.0.1]:35974 Closing
postgres1  | 2021-08-25 10:54:36.697 UTC [63] STATEMENT:  SELECT * FROM test_adminer_1
postgres1  |    SELECT * FROM pg_catalog.pg_tables;

You can see the table test_adminer_1 that was created in the Adminer browser window on Windows (localhost:8080) directly in the Linux psql, proving that they are all connected to the same db:

postgres=# \dt
             List of relations
 Schema |      Name      | Type  |  Owner
--------+----------------+-------+----------
 public | test_adminer_1 | table | postgres
(1 row)

Small TODO (DONE):

Up to now, I have not found out how to connect from Windows psql to the docker postgres1 db. I know I should use the IP of the WSL that I am using, but that did not work. Instead, I probably need the IP of the container itself, see: https://stackoverflow.com/a/54058629/11154841. Or I need to check whether the ports are published, see https://stackoverflow.com/questions/56993263/connect-to-dockerized-postgres-from-windows-docker-host#56993487.

Getting the connection from Windows to the Linux container is just a side step; it is perfectly fine to use psql on Linux instead. I am only trying this because Adminer works as well, so it must be possible, and such a solution shows how this would work in a local network, if we had one.

UPDATE: this is DONE, I could connect from Windows psql to the Docker container. The trick was to map the host to a port other than the standard 5432, so that there is no clash with the local PostgreSQL installation that already uses 5432.

Windows psql:

Server [localhost]:
Database [postgres]:
Port [5432]: 5437
Username [postgres]:
Password for user postgres:
psql (13.3, server 10.3)
WARNING: Console code page (850) differs from Windows code page (1252).
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
Type "help" for help.
postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |  Access privileges
-----------+----------+----------+------------+------------+-----------------------
 db        | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
(4 rows)

Thus it shows the db database of the container.

Create a test table in the postgres database:

CREATE TABLE test ();

Then check this in the Docker terminal:

 docker exec -it postgres1 psql -U postgres -W -d db
Password for user postgres:
psql (10.3)
Type "help" for help.

db=# \c postgres
Password for user postgres:
You are now connected to database "postgres" as user "postgres".
postgres=# \dt
             List of relations
 Schema |      Name      | Type  |  Owner
--------+----------------+-------+----------
 public | test           | table | postgres

Thus, the test table created in the Windows psql shell is also available in the docker container.

For postgres, we now have the tools:



Dumping a database with all of its functions and content in the home folder of Linux:

docker exec -i postgres1 pg_dump -U postgres geomar > geomar_20210829.sql

From: https://www.postgresql.org/docs/13/app-pgrestore.html https://www.postgresql.org/docs/13/app-pgdump.html https://www.postgresql.org/docs/13/app-psql.html

To load the backup file with the container, it needs to be mounted into a volume on the container, or else we get https://stackoverflow.com/questions/45164267/pg-restore-archiver-could-not-open-input-file-no-such-file-or-director. This is for security reasons: if you do not mount a volume with a given directory at the start of the container, it cannot be found later, since the container is not allowed to use more than what is inside of it. We therefore put the backup file into the folder of the docker-compose file, which is the "." folder. Then we add volumes: - ./:/home to the docker-compose file:

version: '3.1'

services:

  db:
    container_name: postgres1
    build: .
    restart: always
    ports:
        - "5437:5432"        
    environment:
        POSTGRES_DB: db
        POSTGRES_USER: postgres
        POSTGRES_PASSWORD: postgres
    volumes:
      - ./:/home

  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080

After this, you can check in bash whether the volume is mounted:

docker exec -it postgres1 bash
bash-4.4# ls
backup_init.sh              etc                         proc                        sys
bin                         home                        root                        tmp
dev                         lib                         run                         usr
docker-entrypoint-initdb.d  media                       sbin                        var
docker-entrypoint.sh        mnt                         srv
bash-4.4# cd home
bash-4.4# ls
Dockerfile            docker-compose.yml    geomar_20210829.sql
bash-4.4#

We see that the file geomar_20210829.sql can be found. Only now can we load the database from the backup into a new, empty database:

docker exec -i postgres1 createdb -U postgres geomar3
docker exec -i postgres1 psql -U postgres -d geomar3 < ~/geomar_20210829.sql

Check that the data has been loaded:

geomar3=# \dt
                        List of relations
 Schema |                Name                |  Type  |  Owner
--------+------------------------------------+--------+----------
 public | nostokes                           | table  | postgres
 public | stokes                             | table  | postgres
 public | trajectories_nostokes_subset_10000 | table  | postgres
 public | trajectories_stokes_subset_10000   | table  | postgres
(4 rows)

geomar3=# select * from stokes;
 obs | traj | mpa | distance | land |   lat   |   lon   |  temp   |        time         |    z
-----+------+-----+----------+------+---------+---------+---------+---------------------+---------
   0 |    0 |   1 |        0 |    0 | 43.2885 | 5.17132 | 13.4218 | 2017-03-01 00:00:00 | 1.01824
   0 |    1 |   1 |        0 |    0 | 43.2966 |  4.9829 | 12.9844 | 2017-03-05 00:00:00 | 1.01824
   0 |    2 |   1 |        0 |    0 | 43.2947 | 4.96284 | 13.4682 | 2017-03-10 00:00:00 | 1.01824
   0 |    3 |   1 |        0 |    0 | 43.2804 | 5.10096 | 13.3176 | 2017-03-14 00:00:00 | 1.01824
   0 |    4 |   1 |        0 |    0 | 43.2949 | 5.13431 | 13.7183 | 2017-03-19 00:00:00 | 1.01824
   0 |    5 |   1 |        0 |    0 | 43.2747 | 5.27987 | 13.8343 | 2017-03-23 00:00:00 | 1.01824
   0 |    6 |   1 |        0 |    0 | 43.3027 | 5.09533 | 14.2349 | 2017-03-28 00:00:00 | 1.01824
   0 |    7 |   1 |        0 |    0 | 43.2912 | 5.30152 | 14.5567 | 2017-04-01 00:00:00 | 1.01824
   0 |    8 |   1 |        0 |    0 | 43.3001 | 5.20514 | 14.4992 | 2017-04-06 00:00:00 | 1.01824

The next step is to install Python packages; this is in a new comment below.

lorenzznerol commented 3 years ago

Prepare TimescaleDB so that we can use Grafana to visualize geospatial data stored in TimescaleDB, see #12.

https://docs.timescale.com/timescaledb/latest/tutorials/grafana/installation/#options-for-installing-grafana

(screenshot)

https://docs.timescale.com/timescaledb/latest/tutorials/grafana/create-dashboard-and-panel/#build-a-new-dashboard

Check the Grafana __timeFilter function.
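A sketch of what a Grafana panel query against one of the trajectory tables could look like; $__timeFilter("time") is the Grafana macro that expands to the dashboard's current time-range predicate:

-- Hourly mean temperature within the time range selected in the dashboard
SELECT time_bucket('1 hour', "time") AS "time",
       avg(temp) AS avg_temp
FROM trajectories_stokes_subset_10000
WHERE $__timeFilter("time")
GROUP BY 1
ORDER BY 1;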

lorenzznerol commented 3 years ago

Check whether other TimescaleDB tutorials might be useful.

https://docs.timescale.com/timescaledb/latest/tutorials/

Grafana is still likely the most relevant one, since it covers geospatial analysis of a time series.

lorenzznerol commented 3 years ago

TimescaleDB

Settings

Getting the plpython3u extension might be possible by changing the config file. Perhaps that is also possible for the Docker container.

This is how the config file could be changed locally:

https://stackoverflow.com/questions/60746912/postgrsql-create-extension-timescale

Also check

https://github.com/timescale/timescaledb

and

https://github.com/timescale/timescaledb-tune

to tune the settings.

Hypertable

https://docs.timescale.com/timescaledb/latest/getting-started/create-hypertable/

Chunks and hypertables are TimescaleDB's secret weapon that make storing and querying time-series data blazing fast at petabyte scale.

TimescaleDB automatically partitions time-series data into chunks (or sub-tables), based on time and space (hash key, device ID, location or some distinct key). You can configure chunk size such that recent chunks fit memory for faster queries.

A hypertable is an abstraction over all chunks which hold time-series data. Hypertables enable you to query and access data from all the chunks as if it's in a single table. This is because commands made to the hypertable automatically propagate changes down to all of the chunks belonging to that hypertable.

Hypertables and chunks enable superior performance for shallow and wide queries common in real-time monitoring, as well as deep and narrow queries, common in time-series analysis.
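Applied to our case, the chunk interval can presumably be set directly when creating the hypertable; a sketch for the stokes table created below, with '7 days' as a placeholder:

-- Choose the interval so that recent chunks fit into memory
SELECT create_hypertable('stokes', 'time', chunk_time_interval => INTERVAL '7 days');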

TODO: add hypertable on top of the trajectories tables.

----------------------------------------
-- Hypertable to store weather metrics
----------------------------------------
-- Step 1: Define regular table
CREATE TABLE IF NOT EXISTS weather_metrics (

   time TIMESTAMP WITHOUT TIME ZONE NOT NULL,
   timezone_shift int NULL,
   city_name text NULL,
   temp_c double PRECISION NULL,
   feels_like_c double PRECISION NULL,
   temp_min_c double PRECISION NULL,
   temp_max_c double PRECISION NULL,
   pressure_hpa double PRECISION NULL,
   humidity_percent double PRECISION NULL,
   wind_speed_ms double PRECISION NULL,
   wind_deg int NULL,
   rain_1h_mm double PRECISION NULL,
   rain_3h_mm double PRECISION NULL,
   snow_1h_mm double PRECISION NULL,
   snow_3h_mm double PRECISION NULL,
   clouds_percent int NULL,
   weather_type_id int NULL
);

-- Step 2: Turn into hypertable
SELECT create_hypertable('weather_metrics','time');

Applying this to our settings:

tsdb=> SELECT create_hypertable('trajectories_stokes_subset_10000','time');
ERROR:  table "trajectories_stokes_subset_10000" is not empty
TIP:  You can migrate data by specifying 'migrate_data => true' when calling this function.

Thus, I need to have empty tables first.
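(Following the tip, the already-filled tables could presumably also be converted in place, at the cost of a long-running migration; a sketch, not what I did below:)

-- Alternative to recreating empty tables: migrate the existing rows into chunks in place
SELECT create_hypertable('trajectories_stokes_subset_10000', 'time', migrate_data => true);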

CREATE TABLE nostokes 
(obs int, 
 traj real,
 MPA real, 
 distance real,
 land real,
 lat real,
 lon real,
 temp real,
 time timestamp,
 z real
);

CREATE TABLE stokes 
(obs int, 
 traj real,
 MPA real, 
 distance real,
 land real,
 lat real,
 lon real,
 temp real,
 time timestamp,
 z real
);

Then I can create the hypertable for stokes:

SELECT create_hypertable('stokes','time');

Output:

tsdb=> SELECT create_hypertable('stokes','time');
NOTICE:  adding not-null constraint to column "time"
DETAIL:  Time dimensions cannot have NULL values.
  create_hypertable
---------------------
 (1,public,stokes,t)
(1 row)

And for nostokes:

SELECT create_hypertable('nostokes','time');

Output:

tsdb=> SELECT create_hypertable('nostokes','time');
NOTICE:  adding not-null constraint to column "time"
DETAIL:  Time dimensions cannot have NULL values.
   create_hypertable
-----------------------
 (2,public,nostokes,t)
(1 row)

(screenshot)

Now fill the hypertables, each takes about 10 minutes:

\COPY nostokes(obs, traj, MPA, distance, land, lat, lon, temp, time, z) FROM 'C:\Users\MYUSER\Documents\Masterproject GEOMAR\Data\trajectories_nostokes_subset_10000.csv\trajectories_nostokes_subset_10000.csv' DELIMITER ',' CSV HEADER
COPY 9626734
\COPY stokes(obs, traj, MPA, distance, land, lat, lon, temp, time, z) FROM 'C:\Users\MYUSER\Documents\Masterproject GEOMAR\Data\trajectories_stokes_subset_10000.csv\trajectories_stokes_subset_10000.csv' DELIMITER ',' CSV HEADER
COPY 9640202

lorenzznerol commented 3 years ago

Example queries on the hypertables of TimescaleDB taken from https://docs.timescale.com/timescaledb/latest/getting-started/query-data/#time-bucket:

Advanced SQL functions for time-series data

Timescale has many custom-built SQL functions to help you perform time-series analysis in fewer lines of code.

Examples of these functions include:

time_bucket() - used for analyzing data over arbitrary time intervals
first() - used for finding the earliest value based on a time within an aggregate group
last() - used for finding the latest value based on time within an aggregate group
time_bucket_gapfill() - used to analyze data over arbitrary time intervals and fill any gaps in the data
locf() - used to fill gaps in data by carrying the last observed value forward
interpolate() - used to fill gaps by linearly interpolating the missing values between known data points
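As a small sketch against our data, last() can, for example, pull each trajectory's most recent recorded position in one pass:

-- last(value, "time") returns the value at the latest time within each group
SELECT traj,
       last(lat, "time") AS last_lat,
       last(lon, "time") AS last_lon,
       max("time")       AS last_time
FROM stokes
GROUP BY traj
ORDER BY traj
LIMIT 10;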

Query:

SELECT MPA, avg(temp)
FROM stokes
WHERE time > '2017-06-17 00:00:00'
GROUP BY MPA;

Output:

 mpa |        avg
-----+--------------------
   0 |  24.83129397791901
   1 | 5.2805780800273165
   2 | 20.734297430803384
   3 |  4.548956152704943
   4 |  26.13433986831728
   5 |  8.542661807034637
   6 |  9.950746731604623
   7 |  24.77539022001049
   8 | 24.490924899923055
   9 | 3.7701473225911712
  10 |  2.972708292554311
(11 rows)

SELECT
   time_bucket('2 weeks', time) as bucket,
   MPA,
   avg(temp) as avg_temp,
   max(temp) as max_temp,
   min(temp) as min_temp
FROM
 stokes
WHERE land = 0
GROUP BY bucket, MPA
ORDER BY MPA, bucket DESC;

Output:

       bucket        | mpa |      avg_temp      | max_temp  |  min_temp
---------------------+-----+--------------------+-----------+------------
 2017-09-04 00:00:00 |   0 |  24.82649793118004 | 26.950962 |    17.2496
 2017-08-21 00:00:00 |   0 |   25.3658009038291 | 27.407627 |   17.27619
 2017-08-07 00:00:00 |   0 |  25.31347652477798 | 28.731215 |    18.1381
 2017-07-24 00:00:00 |   0 |    25.236428776536 | 28.721586 |  16.514639
 2017-07-10 00:00:00 |   0 | 24.706983859973544 | 27.479097 |  16.878466
 2017-06-26 00:00:00 |   0 |  24.18940546878281 | 26.900537 |  16.135618
 2017-06-12 00:00:00 |   0 | 24.055359087743206 |  26.86533 |   18.35835
 2017-05-29 00:00:00 |   0 |  21.23546009986285 |  23.76144 |  16.513632
 2017-05-15 00:00:00 |   0 | 19.243815820999608 | 22.427155 |  16.069181
 2017-05-01 00:00:00 |   0 | 16.984227219106213 | 20.351355 |    14.3071
 2017-04-17 00:00:00 |   0 |  16.34690732351258 | 18.189545 |  14.204702
 2017-04-03 00:00:00 |   0 | 16.028830836404616 | 18.077717 |  13.549699
 2017-03-20 00:00:00 |   0 |  15.14634034914818 | 16.569317 |  13.163838
 2017-03-06 00:00:00 |   0 | 14.822324462913304 | 16.258774 |  12.355981
 2017-02-20 00:00:00 |   0 | 14.377901202577533 | 15.625333 |  12.292271
 2017-08-21 00:00:00 |   1 |  17.70642288525899 | 19.324305 |   17.11796
 2017-08-07 00:00:00 |   1 | 21.456999400082758 |  23.28298 |  18.169024
 2017-07-24 00:00:00 |   1 | 20.353629535969205 | 23.632862 |  16.568699
 2017-07-10 00:00:00 |   1 |  21.14121550465235 |  24.00419 |  16.379389
 2017-06-26 00:00:00 |   1 | 20.337803440199373 | 23.091074 |  15.823037
 2017-06-12 00:00:00 |   1 |  21.56045567917004 | 23.919235 |  17.799965
 2017-05-29 00:00:00 |   1 |  20.00516291401717 | 21.999432 |  16.284046
 2017-05-15 00:00:00 |   1 | 17.608608576294476 |  20.10287 |  15.896013
 2017-05-01 00:00:00 |   1 | 14.876005716960432 | 16.221592 |  14.261587
 2017-04-17 00:00:00 |   1 | 14.525842366450622 | 14.856123 |  14.162732
 2017-04-03 00:00:00 |   1 | 14.904146327769853 | 15.589169 |  14.249762
 2017-03-20 00:00:00 |   1 | 14.202115321830554 | 14.913743 |  13.596922
 2017-03-06 00:00:00 |   1 | 13.646966625679811 | 14.209642 |  12.868435
 2017-02-20 00:00:00 |   1 | 13.390813040386366 | 13.537243 |  12.958364
 2017-08-07 00:00:00 |   2 | 22.122080294291177 | 22.488985 |  21.506363
 2017-07-24 00:00:00 |   2 | 22.911628919609626 | 23.207022 |   21.78704
 2017-07-10 00:00:00 |   2 |  22.70578627367323 | 23.425163 |  22.047289
 2017-06-26 00:00:00 |   2 | 22.282030973107645 | 22.985039 |  21.018116
 2017-06-12 00:00:00 |   2 | 21.945727977655867 | 23.218987 |  20.525665
 2017-05-29 00:00:00 |   2 |  18.79565565417911 |  20.46295 |  16.969364
 2017-05-15 00:00:00 |   2 |  17.18793816288398 | 18.486628 |  16.164696
 2017-05-01 00:00:00 |   2 | 15.074951974704522 | 16.244446 |  14.661446
 2017-04-17 00:00:00 |   2 | 15.048093477884928 |  15.25091 |  14.678291
 2017-04-03 00:00:00 |   2 | 14.416856862129048 | 15.210911 |    13.5999
 2017-03-20 00:00:00 |   2 | 13.381607317155407 | 13.846324 |  13.037714
 2017-03-06 00:00:00 |   2 | 12.865583325715344 | 13.460182 |  12.035577
 2017-02-20 00:00:00 |   2 | 12.175282533545243 | 12.352346 |  12.026701
 2017-08-21 00:00:00 |   3 | 22.778899033864338 | 23.903532 |  21.915447
 2017-08-07 00:00:00 |   3 | 22.932897410054846 | 24.037523 |  21.745775
 2017-07-24 00:00:00 |   3 |  23.13750962516416 | 24.128115 |   21.62837
 2017-07-10 00:00:00 |   3 |  22.94398772121698 | 23.896088 |  21.676807
 2017-06-26 00:00:00 |   3 |  22.32229376801627 | 23.598776 |  20.707273
 2017-06-12 00:00:00 |   3 | 21.789856423988468 | 23.482073 |  20.237402
 2017-05-29 00:00:00 |   3 | 19.099566307100975 | 20.573812 |  16.915327
 2017-05-15 00:00:00 |   3 | 17.252427572437636 | 19.128645 |    16.0432
 2017-05-01 00:00:00 |   3 | 15.353004521591194 | 16.662237 |  14.584954
 2017-04-17 00:00:00 |   3 | 14.953890301460444 | 15.370744 |  14.571802
 2017-04-03 00:00:00 |   3 | 14.388060151207611 | 15.268351 |  13.439951
 2017-03-20 00:00:00 |   3 | 13.648541889431641 | 14.293538 |  13.103977
 2017-03-06 00:00:00 |   3 | 13.515276336402412 |  14.08204 |  12.418081
 2017-02-20 00:00:00 |   3 | 13.029267660196151 | 13.553823 |  12.263241
 2017-07-24 00:00:00 |   4 |  26.59736288622551 | 27.611195 |  25.937021
 2017-07-10 00:00:00 |   4 | 26.642848325655585 | 27.153797 |  25.156116
 2017-06-26 00:00:00 |   4 | 25.230230484330704 | 26.513733 |  24.294064
 2017-06-12 00:00:00 |   4 | 25.614871326573077 | 26.632156 |  23.358932
 2017-05-29 00:00:00 |   4 | 22.479892963102493 | 23.451082 |  21.779825
 2017-05-15 00:00:00 |   4 |  20.43452684461275 | 21.878067 |  19.106453
 2017-05-01 00:00:00 |   4 |  17.44851740750666 |  19.24197 |  16.240253
 2017-04-17 00:00:00 |   4 |  16.99341330488736 | 17.477165 |   16.20011
 2017-04-03 00:00:00 |   4 |  16.18180535211693 | 17.197947 |  15.207927
 2017-03-20 00:00:00 |   4 | 14.909845989286614 | 15.347641 |  14.485728
 2017-03-06 00:00:00 |   4 | 14.611053061700513 | 14.870977 |  14.116902
 2017-02-20 00:00:00 |   4 |  14.21967608149689 | 14.314707 |  14.102304
 2017-09-04 00:00:00 |   5 | 26.286715322925197 |  26.44224 |  26.123873
 2017-08-21 00:00:00 |   5 | 26.546544293562572 | 26.708786 |  26.429827
 2017-08-07 00:00:00 |   5 | 26.510715435962286 |  27.72983 |  25.024591
 2017-07-24 00:00:00 |   5 | 26.891172107215066 | 28.092472 |  25.998983
 2017-07-10 00:00:00 |   5 | 26.389373792010606 | 26.916443 |  25.854677
 2017-06-26 00:00:00 |   5 | 25.715911850832498 | 26.182154 |   25.15031
 2017-06-12 00:00:00 |   5 | 24.734423199309926 | 26.258776 |  23.003166
 2017-05-29 00:00:00 |   5 | 22.075411036126518 |  23.20628 |  20.859718
 2017-05-15 00:00:00 |   5 | 20.281381089496342 | 21.741549 |  18.588322
 2017-05-01 00:00:00 |   5 | 17.591673274224135 | 19.005657 |  16.346645
 2017-04-17 00:00:00 |   5 |  16.71526576260958 | 17.320423 |   16.20231
 2017-04-03 00:00:00 |   5 |  16.70224673525511 | 17.419226 |  15.458587
 2017-03-20 00:00:00 |   5 | 15.524549825954896 |  15.88382 |  15.093782
 2017-03-06 00:00:00 |   5 | 15.063820172041945 | 15.358897 |  14.556158
 2017-02-20 00:00:00 |   5 | 14.645551872934007 | 14.772932 |  14.463137
 2017-08-21 00:00:00 |   6 |  27.03604162656344 | 27.158514 |  26.913284
 2017-08-07 00:00:00 |   6 | 25.965313083522922 |  26.40548 |  25.563896
 2017-07-24 00:00:00 |   6 | 26.736780636263504 |  28.05643 |  25.128933
 2017-07-10 00:00:00 |   6 |  26.18702440973105 | 26.923248 |  25.066355
 2017-06-26 00:00:00 |   6 | 25.528028823959996 | 26.464766 |  24.728224
 2017-06-12 00:00:00 |   6 | 24.228188776410306 | 26.955032 |  22.982832
 2017-05-29 00:00:00 |   6 | 21.807786956762733 |     23.25 |  20.571451
 2017-05-15 00:00:00 |   6 | 20.282256336408317 |  21.46365 |   18.99056
 2017-05-01 00:00:00 |   6 | 17.227587089876387 | 19.169998 |  16.132702
 2017-04-17 00:00:00 |   6 | 16.430665033765848 | 17.027117 | 15.8864565
 2017-04-03 00:00:00 |   6 | 16.259949612653237 | 17.338223 |  15.315677
 2017-03-20 00:00:00 |   6 | 15.403516700334638 | 15.875272 |  15.202633
 2017-03-06 00:00:00 |   6 | 14.947396766693197 | 15.209571 |  14.535177
 2017-02-20 00:00:00 |   6 | 14.423605318796836 | 14.514571 |  14.343642
 2017-07-24 00:00:00 |   7 |  27.44804191088739 | 28.275007 |  26.379814
 2017-07-10 00:00:00 |   7 |  26.14251313096895 | 27.446873 |  24.768017
 2017-06-26 00:00:00 |   7 | 24.382099903651646 |   26.0925 |   23.22902
 2017-06-12 00:00:00 |   7 | 24.574668076143688 | 26.041327 |  22.445032
 2017-05-29 00:00:00 |   7 |  21.70993964935587 | 23.358862 |  20.291487
 2017-05-15 00:00:00 |   7 | 19.877360928530074 | 21.258327 |   18.76085
 2017-05-01 00:00:00 |   7 | 18.153447741553897 |   19.0056 |  16.881422
 2017-04-17 00:00:00 |   7 |  17.18622031496532 | 18.005177 |  16.712332
 2017-04-03 00:00:00 |   7 | 16.391833282437364 | 17.588385 |  15.827705
 2017-03-20 00:00:00 |   7 | 15.547529755096576 | 16.102781 |  15.016023
 2017-03-06 00:00:00 |   7 |  15.14862707963377 | 15.515274 |  14.400544
 2017-02-20 00:00:00 |   7 | 14.652459615819595 | 14.780584 |  14.421801
 2017-08-07 00:00:00 |   8 |  28.35088086128235 | 28.458033 |   27.65335
 2017-07-24 00:00:00 |   8 | 26.982449991719704 | 28.403843 |  25.559795
 2017-07-10 00:00:00 |   8 | 25.163873099178247 | 26.364727 |  24.125029
 2017-06-26 00:00:00 |   8 | 23.840556455805977 |  25.09964 |  23.055895
 2017-06-12 00:00:00 |   8 |  23.52384153660053 | 25.259384 |   21.96889
 2017-05-29 00:00:00 |   8 | 21.311403647163846 | 22.247269 |   20.09866
 2017-05-15 00:00:00 |   8 | 19.149573199245907 | 20.318583 |  18.356333
 2017-05-01 00:00:00 |   8 |  17.86146883415965 | 18.757013 |  16.700336
 2017-04-17 00:00:00 |   8 |  17.00029951749074 |  17.72664 |  16.639894
 2017-04-03 00:00:00 |   8 | 16.386563993204494 |  17.35961 |  15.892275
 2017-03-20 00:00:00 |   8 |  15.48317863950398 |  15.93635 | 15.0099125
 2017-03-06 00:00:00 |   8 | 15.334526640124025 | 15.713153 |  14.912541
 2017-02-20 00:00:00 |   8 | 14.820478399594625 | 15.025774 |   14.65675
 2017-08-07 00:00:00 |   9 | 26.104159379699542 | 27.735556 |  25.249928
 2017-07-24 00:00:00 |   9 | 26.635963996507343 |  27.78495 |  25.181967
 2017-07-10 00:00:00 |   9 |   24.9525658785456 |  26.29912 |  23.682384
 2017-06-26 00:00:00 |   9 | 22.686746013953712 | 24.912384 |  19.114254
 2017-06-12 00:00:00 |   9 | 23.307249599679615 | 25.148794 |  21.350853
 2017-05-29 00:00:00 |   9 | 20.882533541490766 | 22.015558 |  19.098852
 2017-05-15 00:00:00 |   9 |  18.28821899775348 | 19.644644 |  16.654434
 2017-05-01 00:00:00 |   9 | 17.333890269954203 | 18.475035 |  15.575313
 2017-04-17 00:00:00 |   9 | 17.017341694837476 | 17.717674 |   16.46179
 2017-04-03 00:00:00 |   9 | 16.568411441368752 |   17.2555 |  15.872281
 2017-03-20 00:00:00 |   9 | 15.524350895146666 | 16.249016 |  14.999238
 2017-03-06 00:00:00 |   9 | 15.464346737382858 | 16.186192 |  14.768786
 2017-02-20 00:00:00 |   9 | 14.949266238037504 | 15.358558 |  14.625947
 2017-09-04 00:00:00 |  10 | 22.872619473934172 | 26.634878 |  16.920845
 2017-08-21 00:00:00 |  10 | 25.779601116328276 | 27.363544 |  17.356138
 2017-08-07 00:00:00 |  10 |  25.50809705335015 | 28.599775 |   18.97854
 2017-07-24 00:00:00 |  10 | 26.043864750089476 | 28.700052 |  16.232222
 2017-07-10 00:00:00 |  10 | 24.996329376259673 | 27.566383 |  16.340588
 2017-06-26 00:00:00 |  10 | 23.844787870081397 | 26.749493 |  16.112259
 2017-06-12 00:00:00 |  10 | 23.867726366557942 | 26.963821 |  18.644419
 2017-05-29 00:00:00 |  10 |  21.25132758940279 | 23.808897 |  15.805724
 2017-05-15 00:00:00 |  10 |  19.06877684681747 | 22.091215 |   15.99553
 2017-05-01 00:00:00 |  10 |  17.37440058009478 | 20.505127 |  14.425107
 2017-04-17 00:00:00 |  10 | 16.447088982954845 | 18.280336 |  14.239057
 2017-04-03 00:00:00 |  10 |  15.83289019071197 |  18.00387 |  13.495498
 2017-03-20 00:00:00 |  10 | 15.390403565608105 | 16.877867 |  13.107029
 2017-03-06 00:00:00 |  10 |  14.94302555956803 | 16.011692 |  12.218872
 2017-02-20 00:00:00 |  10 | 14.394304068192191 | 15.040423 |   12.17905
(150 rows)

Aggregate on the MPA level:

SELECT time_bucket('30 days', time) as bucket,
   MPA, count(MPA) as count
   FROM stokes
   GROUP BY bucket, MPA
   ORDER BY MPA, bucket DESC;

Output:

       bucket        | mpa | count
---------------------+-----+--------
 2017-08-31 00:00:00 |   0 |  27920
 2017-08-01 00:00:00 |   0 | 429496
 2017-07-02 00:00:00 |   0 | 638021
 2017-06-02 00:00:00 |   0 | 626148
 2017-05-03 00:00:00 |   0 | 768311
 2017-04-03 00:00:00 |   0 | 806947
 2017-03-04 00:00:00 |   0 | 397412
 2017-02-02 00:00:00 |   0 |   3013
 2017-08-31 00:00:00 |   1 |   2003
 2017-08-01 00:00:00 |   1 |  13378
 2017-07-02 00:00:00 |   1 |  16075
 2017-06-02 00:00:00 |   1 |  32255
 2017-05-03 00:00:00 |   1 |  49163
 2017-04-03 00:00:00 |   1 |  20491
 2017-03-04 00:00:00 |   1 |  27451
 2017-02-02 00:00:00 |   1 |    969
 2017-08-01 00:00:00 |   2 |     65
 2017-07-02 00:00:00 |   2 |   2096
 2017-06-02 00:00:00 |   2 |   1572
 2017-05-03 00:00:00 |   2 |   1706
 2017-04-03 00:00:00 |   2 |    953
 2017-03-04 00:00:00 |   2 |    731
 2017-02-02 00:00:00 |   2 |     69
 2017-08-31 00:00:00 |   3 |   1596
 2017-08-01 00:00:00 |   3 |  23894
 2017-07-02 00:00:00 |   3 |  51036
 2017-06-02 00:00:00 |   3 |  68122
 2017-05-03 00:00:00 |   3 |  73686
 2017-04-03 00:00:00 |   3 |  57416
 2017-03-04 00:00:00 |   3 |  45431
 2017-02-02 00:00:00 |   3 |    870
 2017-08-01 00:00:00 |   4 |     31
 2017-07-02 00:00:00 |   4 |   4007
 2017-06-02 00:00:00 |   4 |   5931
 2017-05-03 00:00:00 |   4 |   4824
 2017-04-03 00:00:00 |   4 |   7105
 2017-03-04 00:00:00 |   4 |   5406
 2017-02-02 00:00:00 |   4 |    336
 2017-08-31 00:00:00 |   5 |    761
 2017-08-01 00:00:00 |   5 |  18930
 2017-07-02 00:00:00 |   5 |  59276
 2017-06-02 00:00:00 |   5 |  54138
 2017-05-03 00:00:00 |   5 |  74212
 2017-04-03 00:00:00 |   5 | 106657
 2017-03-04 00:00:00 |   5 |  33696
 2017-02-02 00:00:00 |   5 |    970
 2017-08-31 00:00:00 |   6 |    200
 2017-08-01 00:00:00 |   6 |   7956
 2017-07-02 00:00:00 |   6 |  14249
 2017-06-02 00:00:00 |   6 |  16333
 2017-05-03 00:00:00 |   6 |  19912
 2017-04-03 00:00:00 |   6 |  18782
 2017-03-04 00:00:00 |   6 |   9596
 2017-02-02 00:00:00 |   6 |    697
 2017-08-01 00:00:00 |   7 |    157
 2017-07-02 00:00:00 |   7 |   1964
 2017-06-02 00:00:00 |   7 |   1729
 2017-05-03 00:00:00 |   7 |   1703
 2017-04-03 00:00:00 |   7 |   2482
 2017-03-04 00:00:00 |   7 |   2222
 2017-02-02 00:00:00 |   7 |    145
 2017-08-01 00:00:00 |   8 |     24
 2017-07-02 00:00:00 |   8 |    462
 2017-06-02 00:00:00 |   8 |    672
 2017-05-03 00:00:00 |   8 |   1168
 2017-04-03 00:00:00 |   8 |    780
 2017-03-04 00:00:00 |   8 |    838
 2017-02-02 00:00:00 |   8 |     15
 2017-08-31 00:00:00 |   9 |    572
 2017-08-01 00:00:00 |   9 |  69097
 2017-07-02 00:00:00 |   9 | 170611
 2017-06-02 00:00:00 |   9 | 197504
 2017-05-03 00:00:00 |   9 | 131471
 2017-04-03 00:00:00 |   9 | 155801
 2017-03-04 00:00:00 |   9 |  63782
 2017-02-02 00:00:00 |   9 |   1550
 2017-08-31 00:00:00 |  10 |  38744
 2017-08-01 00:00:00 |  10 | 596312
 2017-07-02 00:00:00 |  10 | 932533
 2017-06-02 00:00:00 |  10 | 885804
 2017-05-03 00:00:00 |  10 | 764152
 2017-04-03 00:00:00 |  10 | 679142
 2017-03-04 00:00:00 |  10 | 285763
 2017-02-02 00:00:00 |  10 |    702
(84 rows)

Or, with more detail, again on the MPA level as an example:

SELECT
   time_bucket('2 weeks', time) as bucket,
   MPA,
   round( CAST(avg(temp) as numeric), 1) as avg_temp,
   round( CAST(max(temp) as numeric), 1) as max_temp,
   round( CAST(min(temp) as numeric), 1) as min_temp,
   round( CAST(avg(lon) as numeric), 1) as avg_lon,
   round( CAST(max(lon) as numeric), 1) as max_lon,
   round( CAST(min(lon) as numeric), 1) as min_lon,
   round( CAST(avg(lat) as numeric), 1) as avg_lat,
   round( CAST(max(lat) as numeric), 1) as max_lat,
   round( CAST(min(lat) as numeric), 1) as min_lat
FROM
 stokes
WHERE land = 0
GROUP BY bucket, MPA
ORDER BY MPA, bucket DESC
LIMIT 10;

Output:

       bucket        | mpa | avg_temp | max_temp | min_temp | avg_lon | max_lon | min_lon | avg_lat | max_lat | min_lat
---------------------+-----+----------+----------+----------+---------+---------+---------+---------+---------+---------
 2017-09-04 00:00:00 |   0 |     24.8 |     27.0 |     17.2 |     1.9 |     8.2 |    -4.5 |    38.7 |    43.1 |    35.3
 2017-08-21 00:00:00 |   0 |     25.4 |     27.4 |     17.3 |     1.9 |     8.4 |    -5.1 |    38.9 |    43.2 |    35.2
 2017-08-07 00:00:00 |   0 |     25.3 |     28.7 |     18.1 |     2.1 |     8.1 |    -5.2 |    39.0 |    43.4 |    35.2
 2017-07-24 00:00:00 |   0 |     25.2 |     28.7 |     16.5 |     2.6 |     8.9 |    -4.1 |    39.7 |    43.4 |    35.4
 2017-07-10 00:00:00 |   0 |     24.7 |     27.5 |     16.9 |     3.0 |     8.2 |    -3.7 |    40.1 |    43.4 |    35.2
 2017-06-26 00:00:00 |   0 |     24.2 |     26.9 |     16.1 |     3.1 |     8.3 |    -3.8 |    40.0 |    43.3 |    35.2
 2017-06-12 00:00:00 |   0 |     24.1 |     26.9 |     18.4 |     3.4 |     7.9 |    -4.9 |    40.5 |    43.3 |    35.3
 2017-05-29 00:00:00 |   0 |     21.2 |     23.8 |     16.5 |     2.9 |     7.7 |    -4.9 |    40.1 |    43.3 |    35.2
 2017-05-15 00:00:00 |   0 |     19.2 |     22.4 |     16.1 |     2.5 |     8.0 |    -5.2 |    39.7 |    43.4 |    35.2
 2017-05-01 00:00:00 |   0 |     17.0 |     20.4 |     14.3 |     2.4 |     8.2 |    -5.2 |    39.6 |    43.3 |    35.2
(10 rows)

Aggregate on the traj level:

SELECT *, 
round( CAST((x.max_distance - x.min_distance) / count_obs as numeric), 1) as avg_speed, 
max_distance - min_distance as diff_distance, 
max_lat - min_lat as diff_lat, 
max_lon - min_lon as diff_lon, 
max_temp - min_temp as diff_temp,
max_z - min_z as diff_z

FROM (

SELECT
   time_bucket('1 day', time) as bucket,
   traj,
   count(obs) as count_obs,
   avg(MPA) as avg_mpa,
   max(MPA) as max_mpa,
   min(MPA) as min_mpa,
   round( CAST(avg(distance) as numeric), 1) as avg_distance,
   round( CAST(max(distance) as numeric), 1) as max_distance,
   round( CAST(min(distance) as numeric), 1) as min_distance,
   round( CAST(avg(lat) as numeric), 1) as avg_lat,
   round( CAST(max(lat) as numeric), 1) as max_lat,
   round( CAST(min(lat) as numeric), 1) as min_lat,
   round( CAST(avg(lon) as numeric), 1) as avg_lon,
   round( CAST(max(lon) as numeric), 1) as max_lon,
   round( CAST(min(lon) as numeric), 1) as min_lon,
   round( CAST(avg(temp) as numeric), 1) as avg_temp,
   round( CAST(max(temp) as numeric), 1) as max_temp,
   round( CAST(min(temp) as numeric), 1) as min_temp,
   round( CAST(avg(z) as numeric), 1) as avg_z,
   round( CAST(max(z) as numeric), 1) as max_z,
   round( CAST(min(z) as numeric), 1) as min_z
FROM
 stokes
WHERE land = 0
and traj < 10
GROUP BY bucket, traj
ORDER BY traj, bucket DESC
LIMIT 20) x;

Output:

       bucket        | traj | count_obs | avg_mpa | max_mpa | min_mpa | avg_distance | max_distance | min_distance | avg_lat | max_lat | min_lat | avg_lon | max_lon | min_lon | avg_temp | max_temp | min_temp | avg_z | max_z | min_z | avg_speed | diff_distance | diff_lat | diff_lon | diff_temp | diff_z
---------------------+------+-----------+---------+---------+---------+--------------+--------------+--------------+---------+---------+---------+---------+---------+---------+----------+----------+----------+-------+-------+-------+-----------+---------------+----------+----------+-----------+--------
 2017-03-08 00:00:00 |    0 |         1 |       1 |       1 |       1 |         33.3 |         33.3 |         33.3 |    43.3 |    43.3 |    43.3 |     5.3 |     5.3 |     5.3 |     13.4 |     13.4 |     13.4 |   1.0 |   1.0 |   1.0 |       0.0 |           0.0 |      0.0 |      0.0 |       0.0 |    0.0
 2017-03-03 00:00:00 |    0 |         3 |       1 |       1 |       1 |         20.2 |         21.0 |         19.5 |    43.3 |    43.3 |    43.3 |     5.3 |     5.3 |     5.3 |     13.5 |     13.5 |     13.5 |   1.0 |   1.0 |   1.0 |       0.5 |           1.5 |      0.0 |      0.0 |       0.0 |    0.0
 2017-03-02 00:00:00 |    0 |        13 |       1 |       1 |       1 |         12.7 |         13.5 |         11.8 |    43.3 |    43.3 |    43.3 |     5.3 |     5.3 |     5.3 |     13.5 |     13.5 |     13.5 |   1.0 |   1.0 |   1.0 |       0.1 |           1.7 |      0.0 |      0.0 |       0.0 |    0.0
 2017-03-01 00:00:00 |    0 |        24 |       1 |       1 |       1 |          6.9 |         11.6 |          0.0 |    43.3 |    43.3 |    43.3 |     5.3 |     5.3 |     5.2 |     13.5 |     13.5 |     13.4 |   1.0 |   1.0 |   1.0 |       0.5 |          11.6 |      0.0 |      0.1 |       0.1 |    0.0
 2017-04-14 00:00:00 |    1 |         2 |       0 |       0 |       0 |        516.9 |        517.0 |        516.9 |    40.9 |    40.9 |    40.9 |     5.8 |     5.8 |     5.8 |     16.2 |     16.2 |     16.2 |   1.0 |   1.0 |   1.0 |       0.1 |           0.1 |      0.0 |      0.0 |       0.0 |    0.0
 2017-04-13 00:00:00 |    1 |        24 |       0 |       0 |       0 |        510.5 |        516.3 |        505.7 |    40.9 |    40.9 |    40.9 |     5.8 |     5.8 |     5.7 |     16.0 |     16.2 |     15.8 |   1.0 |   1.0 |   1.0 |       0.4 |          10.6 |      0.0 |      0.1 |       0.4 |    0.0
 2017-04-12 00:00:00 |    1 |        24 |       0 |       0 |       0 |        501.6 |        505.4 |        496.7 |    40.9 |    41.0 |    40.9 |     5.7 |     5.7 |     5.7 |     15.8 |     15.8 |     15.7 |   1.0 |   1.0 |   1.0 |       0.4 |           8.7 |      0.1 |      0.0 |       0.1 |    0.0
 2017-04-11 00:00:00 |    1 |        24 |       0 |       0 |       0 |        489.6 |        496.2 |        483.2 |    41.1 |    41.1 |    41.0 |     5.7 |     5.7 |     5.6 |     15.8 |     15.8 |     15.7 |   1.0 |   1.0 |   1.0 |       0.5 |          13.0 |      0.1 |      0.1 |       0.1 |    0.0
 2017-04-10 00:00:00 |    1 |        24 |       0 |       0 |       0 |        479.2 |        482.7 |        477.4 |    41.1 |    41.1 |    41.1 |     5.6 |     5.6 |     5.6 |     15.8 |     15.9 |     15.8 |   1.0 |   1.0 |   1.0 |       0.2 |           5.3 |      0.0 |      0.0 |       0.1 |    0.0
 2017-04-09 00:00:00 |    1 |        24 |       0 |       0 |       0 |        476.2 |        477.3 |        474.8 |    41.1 |    41.1 |    41.1 |     5.6 |     5.6 |     5.6 |     15.8 |     15.9 |     15.7 |   1.0 |   1.0 |   1.0 |       0.1 |           2.5 |      0.0 |      0.0 |       0.2 |    0.0
 2017-04-08 00:00:00 |    1 |        24 |       0 |       0 |       0 |        473.0 |        474.7 |        471.1 |    41.1 |    41.1 |    41.1 |     5.6 |     5.6 |     5.6 |     15.5 |     15.6 |     15.3 |   1.0 |   1.0 |   1.0 |       0.2 |           3.6 |      0.0 |      0.0 |       0.3 |    0.0
 2017-04-07 00:00:00 |    1 |        24 |       0 |       0 |       0 |        468.3 |        470.9 |        465.0 |    41.1 |    41.1 |    41.1 |     5.6 |     5.7 |     5.6 |     15.1 |     15.3 |     14.9 |   1.0 |   1.0 |   1.0 |       0.2 |           5.9 |      0.0 |      0.1 |       0.4 |    0.0
 2017-04-06 00:00:00 |    1 |        24 |       0 |       0 |       0 |        459.4 |        464.6 |        453.5 |    41.1 |    41.2 |    41.1 |     5.7 |     5.8 |     5.7 |     14.7 |     14.9 |     14.6 |   1.0 |   1.0 |   1.0 |       0.5 |          11.1 |      0.1 |      0.1 |       0.3 |    0.0
 2017-04-05 00:00:00 |    1 |        24 |       0 |       0 |       0 |        444.8 |        452.9 |        434.8 |    41.2 |    41.3 |    41.2 |     5.7 |     5.8 |     5.7 |     14.4 |     14.6 |     14.3 |   1.0 |   1.0 |   1.0 |       0.8 |          18.1 |      0.1 |      0.1 |       0.3 |    0.0
 2017-04-04 00:00:00 |    1 |        24 |       0 |       0 |       0 |        425.0 |        433.9 |        417.9 |    41.4 |    41.5 |    41.3 |     5.6 |     5.6 |     5.6 |     14.2 |     14.3 |     14.2 |   1.0 |   1.0 |   1.0 |       0.7 |          16.0 |      0.2 |      0.0 |       0.1 |    0.0
 2017-04-03 00:00:00 |    1 |        24 |       0 |       0 |       0 |        410.7 |        417.5 |        402.9 |    41.5 |    41.6 |    41.5 |     5.6 |     5.6 |     5.6 |     14.2 |     14.2 |     14.2 |   1.0 |   1.0 |   1.0 |       0.6 |          14.6 |      0.1 |      0.0 |       0.0 |    0.0
 2017-04-02 00:00:00 |    1 |        24 |       0 |       0 |       0 |        391.5 |        402.2 |        378.5 |    41.7 |    41.8 |    41.6 |     5.6 |     5.6 |     5.6 |     14.2 |     14.3 |     14.2 |   1.0 |   1.0 |   1.0 |       1.0 |          23.7 |      0.2 |      0.0 |       0.1 |    0.0
 2017-04-01 00:00:00 |    1 |        24 |       0 |       0 |       0 |        363.9 |        377.2 |        355.9 |    41.9 |    42.0 |    41.8 |     5.6 |     5.7 |     5.6 |     14.5 |     14.7 |     14.3 |   1.0 |   1.0 |   1.0 |       0.9 |          21.3 |      0.2 |      0.1 |       0.4 |    0.0
 2017-03-31 00:00:00 |    1 |        24 |       0 |       0 |       0 |        353.3 |        355.6 |        351.2 |    42.0 |    42.0 |    42.0 |     5.7 |     5.7 |     5.7 |     14.8 |     14.9 |     14.7 |   1.0 |   1.0 |   1.0 |       0.2 |           4.4 |      0.0 |      0.0 |       0.2 |    0.0
 2017-03-30 00:00:00 |    1 |        24 |       0 |       0 |       0 |        348.0 |        351.0 |        345.0 |    42.1 |    42.1 |    42.0 |     5.7 |     5.7 |     5.7 |     14.8 |     14.9 |     14.7 |   1.0 |   1.0 |   1.0 |       0.3 |           6.0 |      0.1 |      0.0 |       0.2 |    0.0
(20 rows)

lorenzznerol commented 3 years ago

TimescaleDB: Create a continuous aggregate

https://docs.timescale.com/timescaledb/latest/getting-started/create-cagg/

Continuous aggregates are automatically refreshed materialized views – so they massively speed up workloads that need to process large amounts of data. Unlike in other databases, your views are automatically refreshed in the background as new data is added, or as old data is modified according to a schedule.

That means:

The previous queries can be stored in pre-calculated tables so that everyone can access the aggregates directly. This is especially needed on the traj level when we aggregate on a daily basis or every 5 hours, for example.

This way, queries run in the shortest time possible (they just read the contents of pre-calculated materialized views), and on top of that, the views can be refreshed continuously without affecting query time. That might become more important in the long run, not now.
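The continuous refresh can presumably be automated with add_continuous_aggregate_policy() once the traj_daily view defined below exists; a sketch with placeholder intervals:

-- Refresh buckets between 30 days and 1 hour old, once per day
SELECT add_continuous_aggregate_policy('traj_daily',
    start_offset      => INTERVAL '30 days',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 day');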

Step 1: Define view

Example:

It is not possible to create a nested continuous aggregate view in one go. To get around that, we need to use window functions, calculate things redundantly, or create views on top of each other.

The original (and working) query was like this:

SELECT *, 
round( CAST((x.max_distance - x.min_distance) / count_obs as numeric), 1) as avg_speed, 
max_distance - min_distance as diff_distance, 
max_lat - min_lat as diff_lat, 
max_lon - min_lon as diff_lon, 
max_temp - min_temp as diff_temp,
max_z - min_z as diff_z

FROM (

SELECT
   time_bucket('1 day', time) as bucket,
   traj,
   count(obs) as count_obs,
   avg(MPA) as avg_mpa,
   max(MPA) as max_mpa,
   min(MPA) as min_mpa,
   round( CAST(avg(distance) as numeric), 1) as avg_distance,
   round( CAST(max(distance) as numeric), 1) as max_distance,
   round( CAST(min(distance) as numeric), 1) as min_distance,
   round( CAST(avg(lat) as numeric), 1) as avg_lat,
   round( CAST(max(lat) as numeric), 1) as max_lat,
   round( CAST(min(lat) as numeric), 1) as min_lat,
   round( CAST(avg(lon) as numeric), 1) as avg_lon,
   round( CAST(max(lon) as numeric), 1) as max_lon,
   round( CAST(min(lon) as numeric), 1) as min_lon,
   round( CAST(avg(temp) as numeric), 1) as avg_temp,
   round( CAST(max(temp) as numeric), 1) as max_temp,
   round( CAST(min(temp) as numeric), 1) as min_temp,
   round( CAST(avg(z) as numeric), 1) as avg_z,
   round( CAST(max(z) as numeric), 1) as max_z,
   round( CAST(min(z) as numeric), 1) as min_z
FROM
 stokes
WHERE land = 0
GROUP BY bucket, traj
) x

But that throws an error:

ERROR:  invalid continuous aggregate view
TIP:  Include at least one aggregate function and a GROUP BY clause with time bucket.

tsdb=> CREATE MATERIALIZED VIEW traj_daily
tsdb-> WITH (timescaledb.continuous)
tsdb-> AS
tsdb-> SELECT *,
tsdb-> round( CAST((x.max_distance - x.min_distance) / count_obs as numeric), 1) as avg_speed,
tsdb-> max_distance - min_distance as diff_distance,
tsdb-> max_lat - min_lat as diff_lat,
tsdb-> max_lon - min_lon as diff_lon,
tsdb-> max_temp - min_temp as diff_temp,
tsdb-> max_z - min_z as diff_z
tsdb->
tsdb-> FROM (
tsdb(>
tsdb(> SELECT
tsdb(>    time_bucket('1 day', time) as bucket,
tsdb(>    traj,
tsdb(>    count(obs) as count_obs,
tsdb(>    avg(MPA) as avg_mpa,
tsdb(>    max(MPA) as max_mpa,
tsdb(>    min(MPA) as min_mpa,
tsdb(>    round( CAST(avg(distance) as numeric), 1) as avg_distance,
tsdb(>    round( CAST(max(distance) as numeric), 1) as max_distance,
tsdb(>    round( CAST(min(distance) as numeric), 1) as min_distance,
tsdb(>    round( CAST(avg(lat) as numeric), 1) as avg_lat,
tsdb(>    round( CAST(max(lat) as numeric), 1) as max_lat,
tsdb(>    round( CAST(min(lat) as numeric), 1) as min_lat,
tsdb(>    round( CAST(avg(lon) as numeric), 1) as avg_lon,
tsdb(>    round( CAST(max(lon) as numeric), 1) as max_lon,
tsdb(>    round( CAST(min(lon) as numeric), 1) as min_lon,
tsdb(>    round( CAST(avg(temp) as numeric), 1) as avg_temp,
tsdb(>    round( CAST(max(temp) as numeric), 1) as max_temp,
tsdb(>    round( CAST(min(temp) as numeric), 1) as min_temp,
tsdb(>    round( CAST(avg(z) as numeric), 1) as avg_z,
tsdb(>    round( CAST(max(z) as numeric), 1) as max_z,
tsdb(>    round( CAST(min(z) as numeric), 1) as min_z
tsdb(> FROM
tsdb(>  stokes
tsdb(> WHERE land = 0
tsdb(> GROUP BY bucket, traj
tsdb(> ) x
tsdb-> WITH NO DATA;
ERROR:  invalid continuous aggregate view
TIP:  Include at least one aggregate function and a GROUP BY clause with time bucket.

Unnested query

That is why we need to unnest the query.

two views

Either by creating two views (so that one view sits on top of the other).

View 1:

CREATE MATERIALIZED VIEW traj_daily_prep
WITH (timescaledb.continuous)
AS
SELECT
   time_bucket('1 day', time) as bucket,
   traj,
   count(obs) as count_obs,
   avg(MPA) as avg_mpa,
   max(MPA) as max_mpa,
   min(MPA) as min_mpa,
   round( CAST(avg(distance) as numeric), 1) as avg_distance,
   round( CAST(max(distance) as numeric), 1) as max_distance,
   round( CAST(min(distance) as numeric), 1) as min_distance,
   round( CAST(avg(lat) as numeric), 1) as avg_lat,
   round( CAST(max(lat) as numeric), 1) as max_lat,
   round( CAST(min(lat) as numeric), 1) as min_lat,
   round( CAST(avg(lon) as numeric), 1) as avg_lon,
   round( CAST(max(lon) as numeric), 1) as max_lon,
   round( CAST(min(lon) as numeric), 1) as min_lon,
   round( CAST(avg(temp) as numeric), 1) as avg_temp,
   round( CAST(max(temp) as numeric), 1) as max_temp,
   round( CAST(min(temp) as numeric), 1) as min_temp,
   round( CAST(avg(z) as numeric), 1) as avg_z,
   round( CAST(max(z) as numeric), 1) as max_z,
   round( CAST(min(z) as numeric), 1) as min_z
FROM
 stokes
WHERE land = 0
and traj < 10
GROUP BY bucket, traj
WITH NO DATA;

Out: CREATE MATERIALIZED VIEW

View 2: We do not need WITH (timescaledb.continuous) here since that was only needed in the traj_daily_prep view.

CREATE MATERIALIZED VIEW traj_daily_final
AS
SELECT *, 
round( CAST((x.max_distance - x.min_distance) / count_obs as numeric), 1) as avg_speed, 
max_distance - min_distance as diff_distance, 
max_lat - min_lat as diff_lat, 
max_lon - min_lon as diff_lon, 
max_temp - min_temp as diff_temp,
max_z - min_z as diff_z
FROM traj_daily_prep x
WITH NO DATA;

Out: CREATE MATERIALIZED VIEW

one view

Or by putting it all together in one view, like here:

CREATE MATERIALIZED VIEW traj_daily
WITH (timescaledb.continuous)
AS
SELECT
   time_bucket('1 day', time) as bucket,
   traj,
   count(obs) as count_obs,
   avg(MPA) as avg_mpa,
   max(MPA) as max_mpa,
   min(MPA) as min_mpa,
   round( CAST(avg(distance) as numeric), 1) as avg_distance,
   round( CAST(max(distance) as numeric), 1) as max_distance,
   round( CAST(min(distance) as numeric), 1) as min_distance,
    round( CAST(max(distance) - min(distance) as numeric), 1) as diff_distance, 
    round( CAST((max(distance) - min(distance)) / count(obs) as numeric), 1) as avg_speed, 
   round( CAST(avg(lat) as numeric), 1) as avg_lat,
   round( CAST(max(lat) as numeric), 1) as max_lat,
   round( CAST(min(lat) as numeric), 1) as min_lat,
    round( CAST(max(lat) - min(lat) as numeric), 1) as diff_lat, 
   round( CAST(avg(lon) as numeric), 1) as avg_lon,
   round( CAST(max(lon) as numeric), 1) as max_lon,
   round( CAST(min(lon) as numeric), 1) as min_lon,
    round( CAST(max(lon) - min(lon) as numeric), 1) as diff_lon, 
   round( CAST(avg(temp) as numeric), 1) as avg_temp,
   round( CAST(max(temp) as numeric), 1) as max_temp,
   round( CAST(min(temp) as numeric), 1) as min_temp,
    round( CAST(max(temp) - min(temp) as numeric), 1) as diff_temp,
   round( CAST(avg(z) as numeric), 1) as avg_z,
   round( CAST(max(z) as numeric), 1) as max_z,
   round( CAST(min(z) as numeric), 1) as min_z,
    round( CAST(max(z) - min(z) as numeric), 1) as diff_z

FROM
 stokes
WHERE land = 0
and traj < 10
GROUP BY bucket, traj
WITH NO DATA;

Out: CREATE MATERIALIZED VIEW


If you want to drop a view, use drop materialized view traj_daily;

Out: DROP MATERIALIZED VIEW


Step 2: Populate the continuous aggregate

CALL refresh_continuous_aggregate('traj_daily','2010-01-01', '2021-01-01');

Out: CALL

SELECT * from traj_daily
WHERE bucket > '2009-01-01'
ORDER BY bucket ASC;

Out:

       bucket        | traj | count_obs |       avg_mpa       | max_mpa | min_mpa | avg_distance | max_distance | min_distance | diff_distance | avg_speed | avg_lat | max_lat | min_lat | diff_lat | avg_lon | max_lon | min_lon | diff_lon | avg_temp | max_temp | min_temp | diff_temp | avg_z | max_z | min_z | diff_z
---------------------+------+-----------+---------------------+---------+---------+--------------+--------------+--------------+---------------+-----------+---------+---------+---------+----------+---------+---------+---------+----------+----------+----------+----------+-----------+-------+-------+-------+--------
 2017-03-01 00:00:00 |    0 |        24 |                   1 |       1 |       1 |          6.9 |         11.6 |          0.0 |          11.6 |       0.5 |    43.3 |    43.3 |    43.3 |      0.0 |     5.3 |     5.3 |     5.2 |      0.1 |     13.5 |     13.5 |     13.4 |       0.0 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-02 00:00:00 |    0 |        13 |                   1 |       1 |       1 |         12.7 |         13.5 |         11.8 |           1.7 |       0.1 |    43.3 |    43.3 |    43.3 |      0.0 |     5.3 |     5.3 |     5.3 |      0.0 |     13.5 |     13.5 |     13.5 |       0.0 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-03 00:00:00 |    0 |         3 |                   1 |       1 |       1 |         20.2 |         21.0 |         19.5 |           1.5 |       0.5 |    43.3 |    43.3 |    43.3 |      0.0 |     5.3 |     5.3 |     5.3 |      0.0 |     13.5 |     13.5 |     13.5 |       0.0 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-05 00:00:00 |    1 |        24 |  0.2916666666666667 |       1 |       0 |         12.9 |         22.8 |          0.0 |          22.8 |       0.9 |    43.2 |    43.3 |    43.2 |      0.1 |     5.1 |     5.2 |     5.0 |      0.2 |     13.1 |     13.3 |     13.0 |       0.3 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-06 00:00:00 |    1 |        24 |  2.9166666666666665 |      10 |       0 |         32.4 |         46.0 |         23.5 |          22.5 |       0.9 |    43.1 |    43.1 |    43.1 |      0.1 |     5.3 |     5.4 |     5.2 |      0.2 |     13.6 |     13.7 |     13.4 |       0.4 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-07 00:00:00 |    1 |        24 |                   0 |       0 |       0 |         62.7 |         80.1 |         47.4 |          32.7 |       1.4 |    42.9 |    43.1 |    42.8 |      0.3 |     5.5 |     5.6 |     5.5 |      0.1 |     13.7 |     13.8 |     13.5 |       0.3 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-08 00:00:00 |    0 |         1 |                   1 |       1 |       1 |         33.3 |         33.3 |         33.3 |           0.0 |       0.0 |    43.3 |    43.3 |    43.3 |      0.0 |     5.3 |     5.3 |     5.3 |      0.0 |     13.4 |     13.4 |     13.4 |       0.0 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-08 00:00:00 |    1 |        24 |                   0 |       0 |       0 |         94.3 |        104.7 |         81.7 |          23.0 |       1.0 |    42.7 |    42.8 |    42.6 |      0.2 |     5.5 |     5.5 |     5.5 |      0.0 |     13.3 |     13.5 |     13.3 |       0.2 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-09 00:00:00 |    1 |        24 |                   0 |       0 |       0 |        117.0 |        129.0 |        105.6 |          23.4 |       1.0 |    42.5 |    42.6 |    42.4 |      0.2 |     5.5 |     5.6 |     5.5 |      0.1 |     13.3 |     13.3 |     13.3 |       0.0 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-10 00:00:00 |    1 |        24 |                   0 |       0 |       0 |        141.9 |        148.5 |        130.3 |          18.2 |       0.8 |    42.3 |    42.4 |    42.2 |      0.2 |     5.6 |     5.6 |     5.6 |      0.0 |     13.3 |     13.3 |     13.3 |       0.0 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-10 00:00:00 |    2 |        24 | 0.16666666666666666 |       1 |       0 |         13.0 |         21.1 |          0.0 |          21.1 |       0.9 |    43.2 |    43.3 |    43.1 |      0.2 |     5.0 |     5.0 |     5.0 |      0.0 |     13.5 |     13.5 |     13.4 |       0.1 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-11 00:00:00 |    1 |        24 |                   0 |       0 |       0 |        150.6 |        152.7 |        148.7 |           4.0 |       0.2 |    42.2 |    42.2 |    42.2 |      0.0 |     5.6 |     5.6 |     5.6 |      0.0 |     13.4 |     13.4 |     13.3 |       0.1 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-11 00:00:00 |    2 |        24 |                   0 |       0 |       0 |         24.5 |         28.2 |         21.4 |           6.8 |       0.3 |    43.1 |    43.1 |    43.1 |      0.0 |     5.0 |     5.0 |     4.9 |      0.1 |     13.5 |     13.5 |     13.4 |       0.1 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-12 00:00:00 |    1 |        24 |                   0 |       0 |       0 |        154.7 |        156.7 |        152.9 |           3.8 |       0.2 |    42.2 |    42.2 |    42.2 |      0.0 |     5.5 |     5.6 |     5.5 |      0.0 |     13.4 |     13.5 |     13.4 |       0.0 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-12 00:00:00 |    2 |        24 |                   0 |       0 |       0 |         29.3 |         30.7 |         28.3 |           2.4 |       0.1 |    43.1 |    43.1 |    43.1 |      0.0 |     4.9 |     4.9 |     4.9 |      0.0 |     13.5 |     13.5 |     13.4 |       0.1 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-13 00:00:00 |    1 |        24 |                   0 |       0 |       0 |        161.2 |        166.1 |        156.9 |           9.2 |       0.4 |    42.2 |    42.2 |    42.2 |      0.0 |     5.5 |     5.5 |     5.4 |      0.1 |     13.4 |     13.5 |     13.4 |       0.1 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-13 00:00:00 |    2 |        24 |                   0 |       0 |       0 |         33.3 |         35.7 |         30.9 |           4.7 |       0.2 |    43.1 |    43.1 |    43.1 |      0.0 |     5.0 |     5.0 |     4.9 |      0.0 |     13.6 |     13.6 |     13.5 |       0.1 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-14 00:00:00 |    1 |        24 |                   0 |       0 |       0 |        168.7 |        170.7 |        166.4 |           4.4 |       0.2 |    42.2 |    42.2 |    42.2 |      0.0 |     5.4 |     5.4 |     5.4 |      0.0 |     13.5 |     13.7 |     13.4 |       0.3 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-14 00:00:00 |    2 |        24 |                   0 |       0 |       0 |         39.5 |         42.7 |         35.9 |           6.8 |       0.3 |    43.0 |    43.1 |    43.0 |      0.1 |     5.0 |     5.0 |     5.0 |      0.0 |     13.7 |     13.8 |     13.6 |       0.2 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-14 00:00:00 |    3 |        24 |  0.3333333333333333 |       1 |       0 |          5.1 |          9.8 |          0.0 |           9.8 |       0.4 |    43.2 |    43.3 |    43.2 |      0.1 |     5.1 |     5.1 |     5.1 |      0.0 |     13.5 |     13.6 |     13.3 |       0.3 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-15 00:00:00 |    1 |        24 |                   0 |       0 |       0 |        172.4 |        173.8 |        170.9 |           3.0 |       0.1 |    42.1 |    42.2 |    42.1 |      0.0 |     5.3 |     5.3 |     5.3 |      0.0 |     13.8 |     13.8 |     13.7 |       0.2 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-15 00:00:00 |    2 |        24 |                   0 |       0 |       0 |         47.8 |         52.1 |         43.0 |           9.1 |       0.4 |    43.0 |    43.0 |    42.9 |      0.1 |     5.0 |     5.0 |     4.9 |      0.1 |     13.9 |     13.9 |     13.8 |       0.1 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-15 00:00:00 |    3 |        24 |                   0 |       0 |       0 |         14.7 |         17.7 |         10.3 |           7.4 |       0.3 |    43.2 |    43.2 |    43.1 |      0.1 |     5.0 |     5.1 |     5.0 |      0.1 |     13.7 |     13.8 |     13.6 |       0.2 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-16 00:00:00 |    1 |        24 |                   0 |       0 |       0 |        174.8 |        176.0 |        174.0 |           2.0 |       0.1 |    42.1 |    42.1 |    42.1 |      0.0 |     5.3 |     5.3 |     5.3 |      0.0 |     13.9 |     14.0 |     13.8 |       0.1 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-16 00:00:00 |    2 |        24 |                   0 |       0 |       0 |         57.0 |         61.1 |         52.5 |           8.6 |       0.4 |    42.9 |    42.9 |    42.9 |      0.1 |     4.9 |     4.9 |     4.9 |      0.1 |     14.0 |     14.0 |     13.9 |       0.1 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-16 00:00:00 |    3 |        24 |                   0 |       0 |       0 |         19.7 |         21.2 |         17.9 |           3.3 |       0.1 |    43.1 |    43.1 |    43.1 |      0.0 |     5.0 |     5.0 |     5.0 |      0.0 |     13.9 |     13.9 |     13.8 |       0.1 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-17 00:00:00 |    1 |        24 |                   0 |       0 |       0 |        180.6 |        187.1 |        176.2 |          10.9 |       0.5 |    42.1 |    42.1 |    42.1 |      0.0 |     5.4 |     5.5 |     5.3 |      0.1 |     13.9 |     14.0 |     13.7 |       0.2 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-17 00:00:00 |    2 |        24 |                   0 |       0 |       0 |         67.0 |         75.4 |         61.4 |          14.0 |       0.6 |    42.8 |    42.9 |    42.7 |      0.1 |     4.8 |     4.9 |     4.8 |      0.0 |     14.1 |     14.1 |     14.1 |       0.1 |   1.0 |   1.0 |   1.0 |    0.0
 2017-03-17 00:00:00 |    3 |        24 |                   0 |       0 |       0 |         26.0 |         32.7 |         21.4 |          11.3 |       0.5 |    43.1 |    43.1 |    43.0 |      0.1 |     5.0 |     5.1 |     5.0 |      0.0 |     14.0 |     14.1 |     13.9 |       0.2 |   1.0 |   1.0 |   1.0 |    0.0
...
lorenzznerol commented 3 years ago

@siradam @ShafiqShams @md-Junaid

Whoever needs pre-calculated aggregations, please request them here. You can ask on any level (MPA, traj, and so on); just say whether the time period needs to be part of the aggregation or not. You can also add such materialized views to the database yourself, if you like. There is an example in the comment above.
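For illustration, a request for the MPA level with daily buckets could end up as a view like this (just a sketch; the name mpa_daily and the selected columns are hypothetical, following the traj_daily pattern above):

-- Sketch: daily aggregate per MPA instead of per trajectory (hypothetical).
CREATE MATERIALIZED VIEW mpa_daily
WITH (timescaledb.continuous)
AS
SELECT
   time_bucket('1 day', time) as bucket,
   MPA,
   count(obs) as count_obs,
   round( CAST(avg(distance) as numeric), 1) as avg_distance,
   round( CAST(avg(temp) as numeric), 1) as avg_temp
FROM stokes
WHERE land = 0
GROUP BY bucket, MPA
WITH NO DATA;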

I will also try to add clusters, but up to now I was not able to get the plpython3u extension to run on TimescaleDB. We can expect that this will become possible soon.

There are workarounds to cluster the data in plain PostgreSQL as well, at least for kmeans. I might be able to add that to the materialized view. The aim is to add all the needed columns to the materialized view called "traj_daily" so that everyone can use it for visualization and analysis. Query speed will be the fastest possible since the results are pre-calculated in a materialized view.
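As a rough illustration of such a workaround, the assignment step of kmeans can already be written in plain SQL; a minimal sketch against traj_daily, assuming two hypothetical seed centroids in (avg_lat, avg_lon) space (a full kmeans would repeat assignment and centroid update, e.g. in a loop or a recursive CTE):

-- Sketch: one kmeans assignment step in plain SQL.
-- The two seed centroids are hypothetical; repeating
-- assignment + centroid update would give full kmeans.
WITH centroids (id, c_lat, c_lon) AS (
   VALUES (1, 43.3, 5.3), (2, 42.2, 5.5)
),
assigned AS (
   SELECT t.bucket, t.traj, c.id AS cluster_id,
          row_number() OVER (
             PARTITION BY t.bucket, t.traj
             ORDER BY (t.avg_lat - c.c_lat)^2 + (t.avg_lon - c.c_lon)^2
          ) AS rn
   FROM traj_daily t
   CROSS JOIN centroids c
)
SELECT bucket, traj, cluster_id
FROM assigned
WHERE rn = 1;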

If you want to connect to the tables and views of the database, follow the guide of the TimescaleDB webserver at https://github.com/siradam/DataMining_Project/issues/38#issuecomment-888197130

Search for "How to really connect (with PostgreSQL + psql installed)"; there you will find the login data for the database. You need to have PostgreSQL installed (which automatically installs the needed psql).
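The connection itself then follows the usual psql URI pattern (placeholders only; substitute the real host, port, user and password from the linked comment):

psql "postgres://<user>:<password>@<host>:<port>/tsdb?sslmode=require"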

lorenzznerol commented 3 years ago

Using Extensions in TimescaleDB

plpython3u

A support ticket is now open at TimescaleDB Forge asking for the plpython3u extension:

Question:

I wanted to try the extension plpython3u to do some dbscan or DTW clustering for a university seminar project on a time series of geospatial data (lon/lat). I do not find plpython3u in the list. I would probably go for the PostGIS extensions instead, then.

Answer:

You understood it correctly. "Untrusted" language extensions such as plpythonu cannot be supported as they would compromise our ability to guarantee the highest possible service level.

Thus, there is no plpython3u extension on the Web Server.

Available extensions

At least we know now which extensions are available on TimescaleDB. Anyone who is interested can have a closer look at the extensions and how we might use them.

tsdb=> select pae.* from current_setting('extwlist.extensions') AS cs(e) cross join regexp_split_to_table(e, ',') as ext(allowed) join pg_available_extensions as pae on (allowed=name) order by 1;

Out:

          name          | default_version | installed_version |                               comment
------------------------+-----------------+-------------------+----------------------------------------------------------------------
 bloom                  | 1.0             |                   | bloom access method - signature file based index
 btree_gin              | 1.3             |                   | support for indexing common datatypes in GIN
 btree_gist             | 1.5             |                   | support for indexing common datatypes in GiST
 citext                 | 1.6             |                   | data type for case-insensitive character strings
 cube                   | 1.4             |                   | data type for multidimensional cubes
 dict_int               | 1.0             |                   | text search dictionary template for integers
 dict_xsyn              | 1.0             |                   | text search dictionary template for extended synonym processing
 fuzzystrmatch          | 1.1             |                   | determine similarities and distance between strings
 hstore                 | 1.6             |                   | data type for storing sets of (key, value) pairs
 intarray               | 1.2             |                   | functions, operators, and index support for 1-D arrays of integers
 isn                    | 1.2             |                   | data types for international product numbering standards
 lo                     | 1.1             |                   | Large Object maintenance
 ltree                  | 1.1             |                   | data type for hierarchical tree-like structures
 pg_stat_statements     | 1.7             | 1.7               | track execution statistics of all SQL statements executed
 pg_trgm                | 1.4             |                   | text similarity measurement and index searching based on trigrams
 pgcrypto               | 1.3             |                   | cryptographic functions
 pgrouting              | 3.1.3           |                   | pgRouting Extension
 postgis                | 3.1.2           |                   | PostGIS geometry and geography spatial types and functions
 postgis_raster         | 3.1.2           |                   | PostGIS raster types and functions
 postgis_sfcgal         | 3.1.2           |                   | PostGIS SFCGAL functions
 postgis_tiger_geocoder | 3.1.2           |                   | PostGIS tiger geocoder and reverse geocoder
 postgis_topology       | 3.1.2           |                   | PostGIS topology spatial types and functions
 postgres_fdw           | 1.0             |                   | foreign-data wrapper for remote PostgreSQL servers
 promscale              | 0.2.0           |                   | Promscale support functions
 seg                    | 1.3             |                   | data type for representing line segments or floating-point intervals
 tablefunc              | 1.0             |                   | functions that manipulate whole tables, including crosstab
 tcn                    | 1.0             |                   | Triggered change notifications
 timescale_analytics    | 0.3             |                   | timescale_analytics
 timescaledb            | 2.3.1           | 2.3.1             | Enables scalable inserts and complex queries for time-series data
 timescaledb_toolkit    | 1.0             | 1.0               | timescaledb_toolkit
 tsm_system_rows        | 1.0             |                   | TABLESAMPLE method which accepts number of rows as a limit
 tsm_system_time        | 1.0             |                   | TABLESAMPLE method which accepts time in milliseconds as a limit
 unaccent               | 1.1             |                   | text search dictionary that removes accents
 uuid-ossp              | 1.1             |                   | generate universally unique identifiers (UUIDs)
(34 rows)
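For example, PostGIS from this list could replace the manual lat/lon arithmetic. A minimal sketch, assuming the stokes table from above (the geography casts make ST_Distance return metres):

-- Sketch: great-circle distance in metres between consecutive
-- observations of one trajectory, using PostGIS.
CREATE EXTENSION IF NOT EXISTS postgis;

SELECT obs, traj,
       ST_Distance(
          ST_SetSRID(ST_MakePoint(lon, lat), 4326)::geography,
          ST_SetSRID(ST_MakePoint(lag(lon) OVER w, lag(lat) OVER w), 4326)::geography
       ) AS step_m
FROM stokes
WHERE traj = 0
WINDOW w AS (PARTITION BY traj ORDER BY obs)
ORDER BY obs;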
lorenzznerol commented 3 years ago

Install Python packages in the extension

I tried to use a more recent version of the timescaleDB-PostgreSQL Docker container mentioned above, but it does not work with timescaledb:latest-pg11 through timescaledb:latest-pg13, since the Dockerfile throws the error:

postgresql-plpython3 (no such package): required by: .plpython3-deps-20210819.182405 [postgresql-plpython3]

It seems as if only PostgreSQL 10 lets you use plpython with that timescaleDB container.

To check whether timescale on Docker is the only way to go, I installed PostgreSQL 10 on Windows with the official EDB installer and, on top of that, used Stack Builder to add pl/python to the EDB installation, as recommended. When this still did not work, I copied the python37.dll into the Windows\System32 directory, a known trick, but it still did not work. So this timescaleDB-PostgreSQL container is perhaps the only configuration on the net with a working pl/python extension, and it was pure luck to find out about it; that is disappointing PostgreSQL service. Funnily enough, timescaleDB does not offer plpython on the web server either, which could mean that plpython is simply considered too insecure to be allowed on production systems and was therefore ignored by recent developments.

That is why I have to take the 2016 PostgreSQL 10 on a timescaleDB container just to test plpython. Very strange, and likely not the best way to go. It all rather hints at replacing PostgreSQL with Spark, as planned in issue #17.


After a full day invested in installing additional basic packages like pandas in the Alpine Docker container of PostgreSQL 10 and timescale 0.9.0 (FROM timescale/timescaledb:0.9.0-pg10), I had to find out that apk (the Alpine counterpart of apt) does not support such basic packages (well enough, or at all) for exactly this old Python 3.6 version, see "Installing pandas in docker Alpine". I installed Poetry to get the dependencies right, but it did not work.


After trying to get the Python extension and its packages to run on Windows (SEE UPDATE BELOW: WINDOWS IS INDEED POSSIBLE AS WELL) and in an outdated Alpine Docker container, I have now finally succeeded in installing PostgreSQL and plpython3u on Linux (WSL2).

!!! UPDATE, 5 months after the problem had arisen: Windows does support plpython3u !!! You just need to install Python v3.7.0 and not v3.7.9 as I did.

There is an official guide for Linux installations at PostgreSQL Downloads and PostgreSQL Wiki.

This is how to install it:

# Create the file repository configuration:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'

# Import the repository signing key:
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -

# Update the package lists:
sudo apt-get update

# Install the latest version of PostgreSQL.
# If you want a specific version, use 'postgresql-12' or similar instead of 'postgresql':
sudo apt-get -y install postgresql

After this, install the extension, but first check whether you have more than one PostgreSQL version installed by running

service postgresql start

If you see two or more, as I had with versions 12 and 13, consider either deleting the unneeded one or changing the config and settings by following this link.

Install the plpython3u extension following PostgreSQL: how to install plpythonu extension:

sudo apt-cache search ".*plpython3.*"
sudo apt-get install postgresql-contrib postgresql-plpython3-13

Now change to the postgres user:

sudo su postgres

If you have two PostgreSQL versions installed, you need to run psql with the right port. If your version 13 is at port 5433, run

psql -p 5433

If you only have one version installed, run

psql

After this, do the typical plpython3u test by creating the return_version() function from above and checking its result.
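In case that comment is not at hand, the test looks roughly like this (a minimal sketch; the exact body may differ from the version above):

-- Sketch: enable plpython3u and check which Python it embeds.
CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION return_version() RETURNS text AS $$
   import sys
   return sys.version
$$ LANGUAGE plpython3u;

SELECT return_version();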


Half a day invested in getting Python package imports to work in a plpython3u stored procedure, with no success so far. Installing pandas for Python 3.8.2 has no effect on the Python 3.8.10 that PostgreSQL reports, and the kmeans test function still asks for pandas. I did not understand how to apply the solution of “Module not found” when importing a Python package within a plpython3u procedure.


This is done now. The easy mistake I made was to install the packages without sudo in front, so the interpreter embedded in PostgreSQL never saw them. We are now able to use the full range of Python in stored procedures at database level.
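A kmeans stored procedure along these lines should now work; a sketch only, assuming scikit-learn was installed (with sudo) for the interpreter PostgreSQL embeds, and using the stokes table from above:

-- Sketch: kmeans over (lat, lon) inside the database via plpython3u.
-- Assumes scikit-learn is installed for the Python that PostgreSQL uses.
CREATE OR REPLACE FUNCTION kmeans_latlon(n_clusters int)
RETURNS TABLE (traj real, obs int, cluster int) AS $$
   from sklearn.cluster import KMeans
   rows = plpy.execute("SELECT traj, obs, lat, lon FROM stokes WHERE land = 0")
   X = [[r["lat"], r["lon"]] for r in rows]
   labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
   return [(r["traj"], r["obs"], int(l)) for r, l in zip(rows, labels)]
$$ LANGUAGE plpython3u;

SELECT * FROM kmeans_latlon(5) LIMIT 10;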


Next steps are at #29.