spiegel-data / 2019-02-tempolimit

Relationship between speed limits and motorway accidents

Performance of "Unfalldaten" #3

Closed by nuntius35 5 years ago

nuntius35 commented 5 years ago

In 00_Datenaufbereitung.Rmd, the chunk that joins the accidents to the roads seems wasteful. It computes the distance from every accident to its nearest motorway and only then drops all accidents that are far away from a motorway. This step takes quite some time.

I found the following code to be much faster (it replaces your lines 63 to 78):

# Read the 2017 accident locations, set the CRS to EPSG:25832 and drop Z/M coordinates
unfaelle_2017 = st_read("data/raw/Shapefile/Unfallorte2017_LinRef.shp", quiet = TRUE) %>%
  st_set_crs(25832) %>%
  st_zm()

# Compute length of each section
data_OSM_filtered = data_OSM_filtered %>% 
  st_transform(25832) %>%
  mutate(length = as.double(st_length(.)))

# Filter accidents which are within 10 meters of a motorway
unfall_filter = (lengths(st_intersects(st_buffer(unfaelle_2017, 10), data_OSM_filtered)) > 0)
unfaelle_2017_autobahn = unfaelle_2017[unfall_filter,]

# Add the nearest motorway to each accident
unfaelle_2017_autobahn = unfaelle_2017_autobahn %>% 
  mutate(nearest_autobahn_id = st_nearest_feature(., data_OSM_filtered))

PatrickStotz commented 5 years ago

I totally see your point, and it sounds like this would improve performance a lot. But the main purpose of this repository is transparency and intelligibility. The proposed code is harder to understand, and since I'm not a hundred percent familiar with all the steps you're performing, I can't guarantee its accuracy. I have therefore decided to keep the code simple and slow rather than more complex and faster. But thanks for the suggestion. Much appreciated!
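
One way to build confidence in the accuracy of the buffer-based filter is to compare it with a plain distance threshold on a small example. The following is a minimal sketch, not part of the repository: the road and the three accident points are made-up toy data, and `st_is_within_distance` stands in for a distance-based filter (it is not necessarily what the original chunk uses). Note that `st_buffer` approximates circles by polygons, so on real data the two filters could disagree for points lying almost exactly 10 meters from a motorway.

```r
library(sf)

# Hypothetical toy data in EPSG:25832 (units are meters):
# one straight "motorway" along the x axis ...
roads <- st_sf(
  id = 1,
  geometry = st_sfc(st_linestring(rbind(c(0, 0), c(1000, 0))), crs = 25832)
)

# ... and three "accidents" at 5 m, 50 m and 9 m from it
accidents <- st_sf(
  id = 1:3,
  geometry = st_sfc(
    st_point(c(100, 5)),
    st_point(c(500, 50)),
    st_point(c(900, -9)),
    crs = 25832
  )
)

# Filter as proposed in the issue: buffer each accident by 10 m
# and keep it if the buffer intersects at least one road
buffer_filter <- lengths(st_intersects(st_buffer(accidents, 10), roads)) > 0

# Reference filter: keep accidents with at least one road within 10 m
distance_filter <- lengths(st_is_within_distance(accidents, roads, dist = 10)) > 0

# Both filters should select the same accidents on this toy data
stopifnot(identical(buffer_filter, distance_filter))

# Nearest road for the accidents that were kept
accidents_kept <- accidents[buffer_filter, ]
accidents_kept$nearest_road <- st_nearest_feature(accidents_kept, roads)
```

Running the same comparison on a sample of the real accident data would show whether the faster filter keeps the same accidents as the slower distance-based one.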