nlesc-sigs / data-sig

Linked data, data & modeling SIG

data lake storage architecture for AutoGraph #12

Closed bakhshir closed 4 years ago

bakhshir commented 6 years ago

What is our final goal?

To provide an open platform for accessing and analysing traffic data: a web service for internal and external users.

What is the challenge?

The PI wants an architecture for his traffic "data lake", which includes traffic, transport, weather, and logistics data. The traffic data alone make for a huge DB (around 50 TB and expected to grow), which currently runs on the same machine used for everything else at the lab. We are looking into moving away from the current PostgreSQL DB and there are two options that we see:

What are the use cases for the database (extreme and normal)?

  1. typical data query for roads (e.g. historical traffic analysis or last 24 hour traffic detections)
    • select a portion of the network within some period of time and compute statistics on it
    • the main data consists of speed in 2 dimensions: location along the road and time
    • additional data may be required, such as weather, roadwork etc.
  2. capacity estimation
    • query as many occurrences of congestion as possible
    • queries for low speeds over time
    • this is an aggregation workflow
  3. building 3D traffic models
    • build dimensions over time
    • a lot of analytics and data queries over the network
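To make the first two use cases concrete, here is a minimal sketch of what the queries might look like. Everything in it is an assumption for illustration: the `detections` table, its column names, the demo rows, and the helper functions are hypothetical and not the actual schema, and it uses an in-memory SQLite DB only so that it runs without a server.

```python
import sqlite3


def make_demo_db():
    # Hypothetical schema (assumption): one row per detection,
    # with speed indexed by location along the road and by time.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE detections ("
        " road_id TEXT, position_km REAL, ts TEXT, speed_kmh REAL)"
    )
    # Made-up demo rows, not real measurements.
    rows = [
        ("A13", 1.0, "2018-05-01T08:00:00", 95.0),
        ("A13", 1.5, "2018-05-01T08:00:00", 40.0),
        ("A13", 1.5, "2018-05-01T08:05:00", 30.0),
        ("A13", 7.0, "2018-05-01T08:05:00", 110.0),
        ("A4", 2.0, "2018-05-01T08:00:00", 100.0),
    ]
    conn.executemany("INSERT INTO detections VALUES (?, ?, ?, ?)", rows)
    return conn


def segment_stats(conn, road_id, km_from, km_to, t_from, t_to):
    # Use case 1: count, mean and minimum speed for a stretch of road
    # within a time window (t_from inclusive, t_to exclusive).
    return conn.execute(
        "SELECT COUNT(*), AVG(speed_kmh), MIN(speed_kmh) FROM detections"
        " WHERE road_id = ? AND position_km BETWEEN ? AND ?"
        " AND ts >= ? AND ts < ?",
        (road_id, km_from, km_to, t_from, t_to),
    ).fetchone()


def congestion_counts(conn, threshold_kmh):
    # Use case 2: aggregate low-speed detections per road and hour.
    # The hour is taken from the first 13 characters of the ISO timestamp.
    return conn.execute(
        "SELECT road_id, substr(ts, 1, 13) AS hour, COUNT(*)"
        " FROM detections WHERE speed_kmh < ?"
        " GROUP BY road_id, hour ORDER BY road_id, hour",
        (threshold_kmh,),
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print(segment_stats(conn, "A13", 0.0, 2.0,
                        "2018-05-01T08:00:00", "2018-05-01T09:00:00"))
    print(congestion_counts(conn, 50.0))
```

Both queries filter on road, position, and time, so whichever storage option is chosen would probably need to partition or index along those dimensions.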

The datasets are from different sources:

Which technologies are you using to store and access the data?

The current interface uses MATLAB, Java (incl. the OpenLR library and the Modelit Matlab Webservice Toolbox), and PostgreSQL for the DB.

We would appreciate your suggestions on this.

c-martinez commented 6 years ago

@bakhshir Can you share the schema of the DB and the queries they usually run?

romulogoncalves commented 6 years ago

@bakhshir It seems SURFsara provided consultancy on the topic. Would it be OK to share their advice with us? That way we could reuse this knowledge for similar projects in the future.

c-martinez commented 4 years ago

@bakhshir -- This issue is quite old by now. I will close it, but please open it again if there is something you would like to share (for example lessons learned) with the SIG on this topic.