liyaguang / DCRNN

Implementation of Diffusion Convolutional Recurrent Neural Network in Tensorflow
MIT License
1.19k stars 392 forks source link

How could I get METR-LA dataset? #10

Closed louisgry closed 5 years ago

liyaguang commented 5 years ago

Hi Louis, you may get the dataset by following the instructions in the README.

liyaguang commented 5 years ago

Hi, the URL in the README is actually the raw data. You can generate the processed training data using python -m scripts.generate_training_data --output_dir=data/METR-LA

On Sun, Oct 7, 2018 at 12:51 AM Louis notifications@github.com wrote:

Thanks. But what I mean is that I want to get the raw data. Is it available?


EllenZYQ commented 5 years ago

Hi, where can I get an introduction to the dataset?

petrhrobar commented 2 years ago

Hi, where can I find information on how the distances_la_2012.csv file is created? Are those the distances between every sensor and every other sensor?

EllenZYQ commented 2 years ago

This is an automated reply from QQ Mail. Your email has been received.

lemonliu1992 commented 2 years ago

https://towardsdatascience.com/build-your-first-graph-neural-network-model-to-predict-traffic-speed-in-20-minutes-b593f8f838e5

ThomasAFink commented 2 years ago

This could be helpful for reading the h5 file and converting it to CSV. 207 detectors, speed in 5-minute intervals.

import pandas as pd
import h5py

# h5 file path
filename = 'metr-la.h5'

# open the h5 file and print its keys
with h5py.File(filename, 'r') as dataset:
    print(list(dataset.keys()))  # ['df']

# save the h5 file to csv using the key 'df'
with pd.HDFStore(filename, 'r') as store:
    df = store.get('df')
    df.to_csv('metr-la.csv')

ThomasAFink commented 2 years ago

If you're using the numpy arrays in the npy example (https://pytorch-geometric-temporal.readthedocs.io/en/latest/_modules/torch_geometric_temporal/dataset/metr_la.html), the first array (node_values.npy) is simply the speed values from the metr-la.h5 file.

The second array is the adjacency matrix (adj_mat.npy) as discussed in the article. It's created from graph_sensor_ids.txt and distances_la_2012.csv. Then it's somehow flattened, which I'm still trying to figure out.

import numpy as np
from pathlib import Path

# use a shared data directory if a local common module provides one
try:
    import common
    DATA = common.dataDirectory()
except ImportError:
    DATA = Path().resolve() / 'data'

# speeds from metr-la.h5 (https://drive.google.com/drive/folders/10FOTa6HXPqX8Pf5WRoRwcFnW9BrNZEIX); see h5_to_csv.py above
# npy formatted examples from https://graphmining.ai/temporal_datasets/METR-LA.zip (https://graphmining.ai/temporal_datasets/)
NODE_VALUES = DATA / 'traffic_data' / 'METR-LA' / 'node_values.npy'

# view the npy data
data = np.load(NODE_VALUES)

np.set_printoptions(suppress=True)
print(data[0].tolist())
print(len(data[0]))
print(len(data))

# adjacency matrix: the distances, flattened?
# npy formatted examples from https://graphmining.ai/temporal_datasets/METR-LA.zip (https://graphmining.ai/temporal_datasets/)
ADJACENCY_MATRIX = DATA / 'traffic_data' / 'METR-LA' / 'adj_mat.npy'

# view the npy data
data = np.load(ADJACENCY_MATRIX)
with np.printoptions(threshold=np.inf):
    print(data[0].tolist())
    print(len(data[0]))
    print(len(data))
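The "flattening" may just be the thresholded Gaussian kernel that the repo's scripts/gen_adj_mx.py applies to the pairwise distances. Here is a minimal sketch on a made-up 4-sensor distance matrix (the values, sensor count, and the 0.1 cutoff are my assumptions for illustration):

```python
import numpy as np

# synthetic pairwise road-network distances in meters for 4 sensors;
# np.inf marks sensor pairs with no entry in distances_la_2012.csv
dist = np.array([
    [0.0,    1000.0, 5000.0, np.inf],
    [1000.0, 0.0,    2000.0, 8000.0],
    [5000.0, 2000.0, 0.0,    3000.0],
    [np.inf, 8000.0, 3000.0, 0.0],
])

# thresholded Gaussian kernel: W[i, j] = exp(-(d_ij / sigma)^2),
# with entries below a cutoff zeroed out for sparsity
normalized_k = 0.1
sigma = dist[~np.isinf(dist)].std()
adj = np.exp(-np.square(dist / sigma))
adj[adj < normalized_k] = 0.0

print(adj.round(3))
```

Unreachable pairs (np.inf) map to 0, and the diagonal is always exp(0) = 1, which would match the self-weight of 1 seen in adj_mat.npy.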
ThomasAFink commented 2 years ago

This PowerPoint has something to do with that: https://www.slideshare.net/chirantanGupta1/traffic-prediction-from-street-network-imagespptx

ThomasAFink commented 2 years ago

The speeds in the metr-la.h5 file match the numpy array.
ThomasAFink commented 2 years ago

And this repository is also related to the adjacency matrix: https://github.com/FelixOpolka/STGCN-PyTorch

ThomasAFink commented 2 years ago

Hi, where can I find information on how the distances_la_2012.csv file is created? Are those the distances between every sensor and every other sensor?

Probably just the pairwise distances between all sensors in matrix shape (207 detectors × 207 detectors); it's related to the adjacency matrix found in adj_mat.npy.

https://stackoverflow.com/questions/19412462/getting-distance-between-two-points-based-on-latitude-longitude

from geopy.distance import geodesic

origin = (30.172705, 31.526725)  # (latitude, longitude), note the order
dist = (30.288281, 31.732326)

print(geodesic(origin, dist).meters)      # 23576.805481751613
print(geodesic(origin, dist).kilometers)  # 23.576805481751613
print(geodesic(origin, dist).miles)       # 14.64994773134371
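Extending that to a full pairwise matrix could look like the sketch below, using the coordinates of detectors 717490 and 773869. Note this computes straight-line (great-circle) distances with a plain haversine formula, whereas distances_la_2012.csv apparently contains road-network distances, so the numbers will differ:

```python
import math

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371000 * math.asin(math.sqrt(a))

# sensor coordinates as {id: (lat, lon)}
sensors = {
    717490: (34.14745, -118.37124),
    773869: (34.15497, -118.31829),
}

# pairwise distance matrix over all sensors
ids = list(sensors)
matrix = {(i, j): haversine_m(sensors[i], sensors[j]) for i in ids for j in ids}

print(round(matrix[(717490, 773869)]))  # straight-line distance in meters
```

The straight-line result comes out well under the 7647.0 road-network value in the dataset, as expected.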
ThomasAFink commented 2 years ago

For each detector node, the nearest 12 detector nodes are added to the adjacency matrix and the remaining entries are filled with 0s; the weight of a detector node's path to itself is always 1. One could maybe also use the k-nearest-neighbors algorithm? I don't know.
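A k-nearest-neighbors sparsification (keeping only each node's k nearest detectors) could be sketched like this; the distance matrix is made up, and this is a guess at the approach, not necessarily what the repo does:

```python
import numpy as np

def knn_adjacency(dist, k):
    """Keep a 1 for each node's k nearest neighbors (plus itself); zero the rest."""
    n = dist.shape[0]
    adj = np.zeros((n, n))
    for i in range(n):
        nearest = np.argsort(dist[i])[:k + 1]  # includes i itself (distance 0)
        adj[i, nearest] = 1.0
    return adj

# synthetic symmetric distance matrix for 4 detectors
dist = np.array([
    [0., 1., 4., 9.],
    [1., 0., 2., 8.],
    [4., 2., 0., 3.],
    [9., 8., 3., 0.],
])

print(knn_adjacency(dist, k=1))
```

Each row keeps a 1 for the node itself plus its k nearest neighbors; note the result is not symmetric in general, since nearest-neighbor relations are not mutual.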

ThomasAFink commented 2 years ago

I'm trying to reconstruct this example because I have my own data from a different city: https://colab.research.google.com/drive/132hNQ0voOtTVk3I4scbD3lgmPTQub0KR?usp=sharing

Video: https://www.youtube.com/watch?v=Rws9mf1aWUs

ThomasAFink commented 2 years ago

Hi, where can I find information on how the distances_la_2012.csv file is created? Are those the distances between every sensor and every other sensor?

Dijkstra finds the optimal route between two detectors. https://github.com/liyaguang/DCRNN/issues/8#issuecomment-424170421

ThomasAFink commented 2 years ago

Hi, where can I find information on how the distances_la_2012.csv file is created? Are those the distances between every sensor and every other sensor?

Dijkstra finds the optimal route between two detectors. #8 (comment)

An example with Dijkstra using OpenStreetMap; the output is slightly different from Google Maps. https://medium.com/p/2d97d4881996

import osmnx as ox
import networkx as nx

ox.config(log_console=True, use_cache=True)

# define the start and end locations in latlng
start_latlng = (34.14745, -118.37124)  # Detector 717490
# location to where you want to find your route
end_latlng = (34.15497, -118.31829)    # Detector 773869

# create a graph from OSM within the boundaries of some geocodable place(s)
place = 'Los Angeles, California, United States'

# mode of travel: 'drive', 'bike', 'walk'
mode = 'drive'

# optimize the path for distance or time: 'length', 'time'
optimizer = 'time'

graph = ox.graph_from_place(place, network_type=mode)

# find the nearest node to the start location
orig_node = ox.get_nearest_node(graph, start_latlng)

# find the nearest node to the end location
dest_node = ox.get_nearest_node(graph, end_latlng)

# find the shortest route
shortest_route = nx.shortest_path(graph,
                                  orig_node,
                                  dest_node,
                                  weight=optimizer)

# shortest path length; method can be 'dijkstra' or 'bellman-ford'
shortest_route_distance = nx.shortest_path_length(graph, orig_node, dest_node,
                                                  weight="length", method="dijkstra")

# distance between 717490 and 773869 with OpenStreetMap is 8252.298,
# while the original value in the dataset was 7647.0
print("Distance: " + str(shortest_route_distance))

ThomasAFink commented 1 year ago

Hi, where can I find information on how the distances_la_2012.csv file is created? Are those the distances between every sensor and every other sensor?

Here's how I created my own distance matrix: https://github.com/ThomasAFink/osmnx_adjacency_matrix_for_graph_convolutional_networks


Kqingzheng commented 1 year ago

I have already found the data I need in other repositories. Sorry to bother you, and thank you for your timely attention and wonderful work.
