Closed dschowta closed 5 years ago
Hi, can you send more details? Like software version, amount of records in datastreams/observation, observation type used and the slow sample query.
Total number of observations: 6000000+
the number of Datastreams: 77
Number of observations in a queried datastream: 53000+
Version: Gost V 0.6, gost-db v 0.5 (both latest from Docker hub)
The query I used are:
GET http://<url>/v1.0/Datastreams(46)/observations
and
GET http://<url>/v1.0/observations
average size of observation in Datastrem no. 46: 117KB (Mostly multimedia data) I used docker compose from the GOST. With Changed GOST_SERVER_MAX_ENTITIES to 100
Having the docker latest version does not ensure you have the latest version, this is an automated build triggered on a Github commit. Seems like a fix for faster queries was commited on 20 march of 2018 https://github.com/gost/gost-db/commit/1b51d975bd99c7b85e7a047ba26a9677f2877aaf and https://github.com/gost/server/commit/1a229c7beefb38e85f054af1bd4e9e96df8629ea Maybe you can check the docker image creation date ($docker images)
I created a 60 GB database to try to recreate your issue.
browser time | raw db query | request |
---|---|---|
574 ms | 202 ms | /Datastreams(1)/Observations |
590 ms | 223 ms | /Datastreams?$filter=id eq 1&$expand=Observations |
20 ms | 13 ms | /Datastreams(2)/Observations |
18 ms | 13 ms | /Datastreams?$filter=id eq 2&$expand=Observations |
---- ms | --- ms | /Observations |
Browser times are without data transfer time and only wait time until a response is send back, raw db query is without result transfer to gost-server, only the time it takes the database to fully execute the query.
I didn't experience your 6-7 seconds delay for the given request http://
Also requesting top 100 observations on this database (
Thank you. Requesting top 100 observations on this database (/v1.0/Observations) gave a timeout for me too. Here are the information about my docker images: gost-db image created on : 2018-03-20 gost image created on : 2018-05-17
I agree to your point that it was network and browser which took more time because when I made a GET query using curl I could see a 2.5 seconds of difference between the request from a localhost (host containing docker containers) and a remote host. Moreover, the browser took time for rendering too. Hence, now I can confirm from my side following: The curl request from localhost (inspired from this blog post): Querying data stream with 53000+ observations:
$ curl -w "@curl-format" -o outfile2 -s "/v1.0/Datastreams(17)/Observations" time_namelookup: 0.004 time_connect: 0.005 time_appconnect: 0.000 time_pretransfer: 0.005 time_redirect: 0.000 time_starttransfer: 2.543
time_total: 2.620
Querying datastream with 26000+ observations:
$ curl -w "@curl-format" -o outfile2 -s "/v1.0/Datastreams(46)/Observations" time_namelookup: 0.004 time_connect: 0.005 time_appconnect: 0.000 time_pretransfer: 0.005 time_redirect: 0.000 time_starttransfer: 1.688
time_total: 1.754
Regarding your query about the biggest observation and raw query times, I am not aware of how to do it. I will try to find it out gradually. If you have a quicker way, then it would save my time.
Thank you. PS: I edited to accomodate two datastreams of different number of observations
I just observed that fetching the FeatureOfInterest corresponding to an observation is also slower. eg : /v1.0/Observations(6329953)/FeatureOfInterest takes approximately 5+ seconds.
Same here, using latest
Docker version (march 23, 2018) gave timeouts for queries on the complete Observations
collection (8 million records in DB), for example:
Getting all Observations
within time-period:
/v1.0/Observations?$filter=phenomenonTime gt ‘2018-01-29T11:00:00.000Z’ and phenomenonTime le ‘2018-01-29T12:00:00.000Z’&$select=result,phenomenonTime,parameters
or last 100 observations:
/v1.0/Observations?$top=100
Getting the last Observations
for all Things
(about 100) and their Locations
and Datastreams
(2-10) goes quite fast:
/v1.0/Things?$select=id,name,properties&$expand=Locations($select=location),Datastreams($select=id,name),Datastreams/Observations($select=id,phenomenonTime,result;$top=1)
Just a thought:
Time-related attributes (phenomenonTime
, resultTime
) of Observation
are not indexed within the database observation
table. Though most Observation
entity-attributes are encoded within a jsonb
structure, it would still be possible to create indexes for jsonb-fields, see e.g. here and here. But "resultTime"
and "phenomenonTime"
JSON-fields are defined as strings...
.. Or add extra PG timestamp
fields to observation
table for indexing and querying.
(I now see this was already discussed in gost/gost-db#3)
Also the Observations id
key-attribute does not seem to have a dedicated (Primary Key) index.
Request timeout on all Observations should be fixed by adding an index to observation.id: https://github.com/gost/gost-db/commit/4b2c063c0f595eabcafb84be3967960e4e9b3591, I thought I had some problems with the query planner before with indexes on the observation id (expand queries extremely slow) but it seems to work just fine. The reason for not having indexes on time related fields in observation is because it will slow down the speed of inserting observations, we have to find some middle ground between the post and get speed. Let's see if we can use timescaleDB https://github.com/gost/gost-db/issues/4
gost/gost-db@4b2c063 solves the delay
Hi GOST team, I am running GOST with postgresql. The observations in my deployment include camera data as well. As the number of observations grows, it is getting slower, especially when I try to get observations related to a datastream. The delay is approximately 6 to 7 seconds.
Therefore, I need a suggestion, how can I improve the performance of GOST queries for observations? Please let me know if you want more details
Best, Shreekantha