udsleeds / openinfra

Open access data for transport research: tools, modelling and simulation
https://udsleeds.github.io/openinfra/

osmextract oe_get_network cycling filtering of access="no" #81

Open hulsiejames opened 2 years ago

hulsiejames commented 2 years ago

Please see the script contained within details which replicates this analysis!

Hi @Robinlovelace @Agila5 - I have a question on the use of access="no" and its filtering within osmextract.

Specifically, if a feature is marked as access="no", but with the addition of bicycle="yes" (also, foot="yes") should this be included in osmextract's (and others') final walking/cycling network? Full context is discussed below, but see the examples at the bottom for a quick visualisation.

I've been working on a function that defines a cycling network, mimicking the filtering of the SQL query used by the oe_get_network function, i.e. network = oe_get_network(mode="cycling").

I have found 20 features that maybe could be included in the definition of a cycling network - but due to access="no" these are excluded from the osmextract network despite containing bicycle="yes" & foot="yes" tags.

To find these 20 features I downloaded some network data for Leeds with oe_get(place="Leeds"). I then mimicked the filtering of oe_get_network(mode="cycling") within my function, oi_active_cycle, and compared the two outputs, which theoretically should be the same. However, my function has detected these 20 additional features, stored in the variable function_output_not_in_osmextract_cycle (see the full script in details).

Using this data, I have created a network identical to the osmextract::oe_get_network(mode = "cycling") by applying the following manual filtering from documentation:

osmextract_cycle_manual = data %>%
  # highway IS NOT NULL
  filter(! is.na(highway)) %>%
  # highway NOT IN inappropriate ways
  filter(! highway %in% c('abandoned', 'bus_guideway', 'byway', 'construction', 'corridor', 'elevator',
                          'fixme', 'escalator', 'gallop', 'historic', 'no', 'planned', 'platform',
                          'proposed', 'raceway', 'steps')
  ) %>%
  # highway NOT IN inappropriate ways UNLESS bicycle is okay
  filter(! highway %in% c('motorway', 'motorway_link', 'footway', 'bridleway','pedestrian') | bicycle %in% c('yes', 'designated', 
  'permissive', 'destination')
  ) %>%
  # access cannot be private or disallowed
  filter(! access %in% c('private', 'no')) %>%
  # remove ways not allowed for cyclists
  filter(! bicycle %in% c('private', 'no', 'use_sidepath', 'restricted')) %>%
  # remove ways that contain string "private" in service tag
  filter(! grepl("private", service))

Next, using my new function, I go through the data and create a new column, oi_cycle, for each way in the network: if the way would be in the osmextract cycling network, oi_cycle will be "yes". This filtering is done using dplyr::case_when().

Note that below, osm_sf will be data defined above:

oi_active_cycle = function(osm_sf, remove=FALSE){
  #browser() # Uncomment to debug
  osm_sf_cycle = osm_sf %>% dplyr::mutate(oi_cycle = dplyr::case_when(

    # Highway cannot be NA
    is.na(highway) ~ "no",

    #Create two case_when cases:
    # 1 - if highway = "bad highways" BUT bicycle = "good bicycle" then assign ~ "yes",
    # 2 - assign highway = "bad highways" ONLY as ~ "no". Ways that are appropriate for cyclists will already be "yes" from above
    # 1
    (highway %in% c('motorway', 'motorway_link', 'footway', 'bridleway','pedestrian') & bicycle %in% c('yes', 'designated', 'permissive', 'destination')) ~ "yes",
    # 2
    highway %in% c('motorway', 'motorway_link', 'footway', 'bridleway','pedestrian') ~ "no",

    # Highway values not appropriate for cycling
    highway %in% c('abandoned', 'bus_guideway', 'byway', 'construction', 
                   'corridor', 'elevator', 'fixme', 'escalator', 'gallop', 
                   'historic', 'no', 'planned', 'platform', 'proposed', 
                   'raceway', 'steps') 
    ~ "no",

    # Way must have access rights
    access %in% c('private', 'no') ~ "no",

    # Way must not bar cyclists
    bicycle %in% c('no', 'private', 'use_sidepath', 'restricted') ~ "no", 

    # Way must not contain "private" within the service tag
    grepl("private", service) ~ "no",

  ))

  # Case_when above should have added "no" to all inappropriate features, now 
  # find features that do not contain "no" and set as "yes"
  osm_sf_cycle$oi_cycle[is.na(osm_sf_cycle$oi_cycle)] = "yes"

  # If remove = TRUE, filter out features that have oi_cycle == "no"
  if (remove){
    osm_sf_cycle = osm_sf_cycle %>% dplyr::filter(osm_sf_cycle$oi_cycle == "yes")
  }
  return(osm_sf_cycle)
}
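One possible source of the mismatch between the two approaches, offered as an assumption rather than something verified against the Leeds data: dplyr::case_when() returns the value of the first rule that matches, whereas a chain of filter() calls applies every rule. A way that matches case 1 above is therefore labelled "yes" before the access rule is ever evaluated. The sketch below uses made-up rows (hypothetical osm_id values and a shortened highway list) to show the difference.

```r
library(dplyr)

# Toy data: hypothetical tag combinations, not taken from the Leeds extract
toy = data.frame(
  osm_id  = c("1", "2", "3"),
  highway = c("footway", "residential", "footway"),
  access  = c("no", "no", NA),
  bicycle = c("yes", "yes", NA)
)

needs_bicycle_tag = c("motorway", "motorway_link", "footway", "bridleway", "pedestrian")
bicycle_ok = c("yes", "designated", "permissive", "destination")

# case_when(): the first matching rule wins
toy = toy %>% mutate(oi_cycle = case_when(
  highway %in% needs_bicycle_tag & bicycle %in% bicycle_ok ~ "yes",
  highway %in% needs_bicycle_tag ~ "no",
  # Never reached for rows already matched by the first rule
  access %in% c("private", "no") ~ "no",
  TRUE ~ "yes"
))
toy$oi_cycle
# "yes" "no" "no"

# Sequential filter(): every rule is applied to every remaining row
toy_filtered = toy %>%
  filter(!highway %in% needs_bicycle_tag | bicycle %in% bicycle_ok) %>%
  filter(!access %in% c("private", "no"))
nrow(toy_filtered)
# 0
```

In this toy case the footway with bicycle="yes" and access="no" is kept by the case_when() version and dropped by the filter() version, which is the same pattern as the 20 additional features.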

Example 1, 2 & 3 - Carriage Drive

Carriage Drive: W689484794 - Street view image 74095066 - Street view 236322053

Contains tags: access="no", bicycle="yes", foot="yes" (some other tags too, but not useful in this context).

A way that is used to access Roundhay Park in Leeds and is used by cyclists and pedestrians, hence the bicycle="yes" & foot="yes" tags.

I am not sure if the access tag has been used correctly in this instance: the general public are expected to use this way to access the park, yet the park will be (I assume) privately owned, so access="no" would technically be correct?

Similar to the discussion in #32 by Greta proposing to use access NOT IN ("no", "private") OR foot IN ("yes"), I propose we use something similar but for the bicycle="yes" tag. In the SQL query this could be access NOT IN ('private', 'no') OR bicycle IN ('yes', 'designated', 'permissive', 'destination'), or just 'yes', rather than just access NOT IN ('private', 'no').
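As a minimal sketch of what the proposed rule would change, the same condition can be written with dplyr on a toy data frame (the rows and osm_id values are made up for illustration). Note that R's %in% treats NA as a non-match, which may not be how NULL is handled by the SQL query, so this only illustrates the intended logic.

```r
library(dplyr)

# Hypothetical rows covering the tag combinations under discussion
toy_access = data.frame(
  osm_id  = c("1", "2", "3", "4"),
  access  = c("no", "no", NA, "private"),
  bicycle = c("yes", NA, NA, "designated")
)

# Current rule: drop anything with access private/no
current_rule = toy_access %>%
  filter(!access %in% c("private", "no"))

# Proposed rule: keep such ways when bicycle = "yes"
proposed_rule = toy_access %>%
  filter(!access %in% c("private", "no") | bicycle %in% c("yes"))

current_rule$osm_id
# "3"
proposed_rule$osm_id
# "1" "3"
```

Widening the second condition to c('yes', 'designated', 'permissive', 'destination') would also keep row 4 in this toy example.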


On the above (written an hour ago) - I have just tried updating the SQL query to:

proposed_osmextract_update = oe_get(
  place = "Leeds",
  provider = "bbbike",
  layer = "lines",
  #layer = "lines",
  extra_tags = c("access", "bicycle", "service"),
  force_download = TRUE,
  never_skip_vectortranslate = TRUE,
  force_vectortranslate = TRUE,
  vectortranslate_options = c(
    "-where", "
    (highway IS NOT NULL)
    AND
    (highway NOT IN (
    'abandoned', 'bus_guideway', 'byway', 'construction', 'corridor', 'elevator',
    'fixme', 'escalator', 'gallop', 'historic', 'no', 'planned', 'platform',
    'proposed', 'raceway', 'steps'
    ))
    AND
    (highway NOT IN ('motorway', 'motorway_link', 'footway', 'bridleway',
    'pedestrian') OR bicycle IN ('yes', 'designated', 'permissive', 'destination')
    )
    AND
    (access NOT IN ('private', 'no') OR bicycle IN ('yes'))
    AND
    (bicycle NOT IN ('private', 'no', 'use_sidepath', 'restricted'))
    AND
    (service NOT ILIKE 'private%')"
    ),
  quiet = FALSE
)

That is, I changed access NOT IN ('private', 'no') to (access NOT IN ('private', 'no') OR bicycle IN ('yes'))

I expected this to return a network that contained the additional 20 features I had found; instead it returned an additional 4,916 features - not what I expected! Moreover, of those 4,916 additional features, only 1 of the initial 20 additional features is included!

See the bottom of the script within details for clarification. Running this script will show all variable dimensions and allow you to view them. If you do not have access to a computer here are my current variable dimensions:

Apologies if this does not make sense- the heat is getting to me today! Feel free to ask me any questions and I will try and reply as soon as possible. Cheers

```
library(dplyr)
library(osmextract)

extra_tags = c("foot", "access", "service", "bicycle", "footway")

data = oe_get(
  place = "Leeds",
  extra_tags = extra_tags,
  layer = "lines",
  provider = "bbbike",
  force_download = TRUE,
  never_skip_vectortranslate = TRUE,
  force_vectortranslate = TRUE
)

# Define the function
oi_active_cycle_demo = function(osm_sf, remove=FALSE){
  #browser() # Uncomment to debug
  osm_sf_cycle = osm_sf %>% dplyr::mutate(oi_cycle = dplyr::case_when(

    # Highway cannot be NA
    is.na(highway) ~ "no",

    # Create two case_when cases:
    # 1 - if highway = "bad highways" BUT bicycle = "good bicycle" then assign ~ "yes",
    # 2 - assign highway = "bad highways" ONLY as ~ "no". Ways that are appropriate for cyclists will already be "yes" from above
    # 1
    (highway %in% c('motorway', 'motorway_link', 'footway', 'bridleway','pedestrian') & bicycle %in% c('yes', 'designated', 'permissive', 'destination')) ~ "yes",
    # 2
    highway %in% c('motorway', 'motorway_link', 'footway', 'bridleway','pedestrian') ~ "no",

    # Highway values not appropriate for cycling
    highway %in% c('abandoned', 'bus_guideway', 'byway', 'construction',
                   'corridor', 'elevator', 'fixme', 'escalator', 'gallop',
                   'historic', 'no', 'planned', 'platform', 'proposed',
                   'raceway', 'steps')
    ~ "no",

    # Way must have access rights
    access %in% c('private', 'no') ~ "no",

    # Way must not bar cyclists
    bicycle %in% c('no', 'private', 'use_sidepath', 'restricted') ~ "no",

    # Way must not contain "private" within the service tag
    grepl("private", service) ~ "no",

  ))

  # Case_when above should have added "no" to all inappropriate features, now
  # find features that do not contain "no" and set as "yes"
  osm_sf_cycle$oi_cycle[is.na(osm_sf_cycle$oi_cycle)] = "yes"

  # If remove = TRUE, filter out features that have oi_cycle == "no"
  if (remove){
    osm_sf_cycle = osm_sf_cycle %>% dplyr::filter(osm_sf_cycle$oi_cycle == "yes")
  }
  return(osm_sf_cycle)
}

# Runs the above function, keeping all features
cycle_output_full = oi_active_cycle_demo(data, remove=FALSE)

# Runs the above function, keeps only features with [oi_cycle == "yes"] -
# (hopefully mimicking oe_get_network(mode = "cycling"))
cycle_output_removed = oi_active_cycle_demo(data, remove=TRUE)

osmextract_cycle_manual = data %>%
  filter(! is.na(highway)) %>%
  filter(! highway %in% c('abandoned', 'bus_guideway', 'byway', 'construction', 'corridor', 'elevator',
                          'fixme', 'escalator', 'gallop', 'historic', 'no', 'planned', 'platform',
                          'proposed', 'raceway', 'steps')
  ) %>%
  filter(! highway %in% c('motorway', 'motorway_link', 'footway', 'bridleway','pedestrian') | bicycle %in% c('yes', 'designated',
  'permissive', 'destination')
  ) %>%
  # OLD
  filter(! access %in% c('private', 'no')) %>%
  # NEW BELOW
  #filter(! access %in% c('private', 'no') | bicycle %in% c('yes')) %>%
  filter(! bicycle %in% c('private', 'no', 'use_sidepath', 'restricted')) %>%
  filter(! grepl("private", service))

function_output = within(cycle_output_removed, rm(geometry))
osmextract_cycle = within(osmextract_cycle_manual, rm(geometry))

function_output_not_in_osmextract_cycle = dplyr::anti_join(as.data.frame(function_output), as.data.frame(osmextract_cycle))
osmextract_cycle_not_in_function_output = dplyr::anti_join(as.data.frame(osmextract_cycle), as.data.frame(function_output))
function_output_also_in_osmextract_cycle = dplyr::semi_join(as.data.frame(function_output), as.data.frame(osmextract_cycle))

proposed_osmextract_update = oe_get(
  place = "Leeds",
  provider = "bbbike",
  layer = "lines",
  #layer = "lines",
  extra_tags = c("access", "bicycle", "service"),
  force_download = TRUE,
  never_skip_vectortranslate = TRUE,
  force_vectortranslate = TRUE,
  vectortranslate_options = c(
    "-where", "
    (highway IS NOT NULL)
    AND
    (highway NOT IN (
    'abandoned', 'bus_guideway', 'byway', 'construction', 'corridor', 'elevator',
    'fixme', 'escalator', 'gallop', 'historic', 'no', 'planned', 'platform',
    'proposed', 'raceway', 'steps'
    ))
    AND
    (highway NOT IN ('motorway', 'motorway_link', 'footway', 'bridleway',
    'pedestrian') OR bicycle IN ('yes', 'designated', 'permissive', 'destination')
    )
    AND
    (access NOT IN ('private', 'no') OR bicycle IN ('yes'))
    AND
    (bicycle NOT IN ('private', 'no', 'use_sidepath', 'restricted'))
    AND
    (service NOT ILIKE 'private%')"
    ),
  quiet = FALSE
)

osmextract_get_network_cycling = oe_get_network(
  place = "Leeds",
  provider = "bbbike",
  extra_tags = c("access", "bicycle", "service"),
  mode = "cycling",
  force_download = TRUE,
  never_skip_vectortranslate = TRUE,
  force_vectortranslate = TRUE
)

# An additional 4,916 features are returned - not what I was expecting!!!
proposed_diff = dplyr::anti_join(as.data.frame(proposed_osmextract_update), as.data.frame(function_output))

# of the additional 4,916 features returned, only 1 is from the 20 additional features I had found
union_of_proposed = dplyr::semi_join(as.data.frame(proposed_osmextract_update), as.data.frame(function_output_not_in_osmextract_cycle))
```
Robinlovelace commented 2 years ago

Specifically, if a feature is marked as access="no", but with the addition of bicycle="yes" (also, foot="yes") should this be included in osmextract's (and others') final walking/cycling network?

I think so yes.

Robinlovelace commented 2 years ago

I have found 20 features that maybe could be included in the definition of a cycling network - but due to access="no" these are excluded from the osmextract network despite containing bicycle="yes" & foot="yes" tags.

This is really useful info and yes I see this as an issue in {osmextract}. Feel free to open an issue.

Robinlovelace commented 2 years ago

So I guess we need to come up with a better set of rules?

hulsiejames commented 2 years ago

So I guess we need to come up with a better set of rules?

I think so potentially - I've had a little play this morning but think I will need to read up on some SQL queries and try this afternoon.

Working on LTN compliance and presence of lighting at this moment!

agila5 commented 2 years ago

Hi @hulsiejames!

Specifically, if a feature is marked as access="no", but with the addition of bicycle="yes" (also, foot="yes") should this be included in osmextract's (and others') final walking/cycling network?

I just checked the OSM wiki docs (https://wiki.openstreetmap.org/wiki/Key:access#Transport_mode_restrictions) and it looks like you are right, those streets should be included. Happy to check the code behind osmextract. However, I'm not sure why you get those discrepancies. I will check your examples as soon as possible (later this week or next week hopefully).

hulsiejames commented 2 years ago

However, I'm not sure why you get those discrepancies.

Me neither! Though, SQL is not my strong point. I will also have a look and see if I can find out the reason for these discrepancies as it could have been a simple mistype by myself - if so, I will update here accordingly.

hulsiejames commented 2 years ago

Quick update on the above - will look into this some more over coming days.

I had thought that changing the SQL query from access NOT IN ('private', 'no') to (access NOT IN ('private', 'no') OR bicycle IN ('yes')) picked up only one of the additional 20 features I had found with my function, along with returning ~4.9k additional features. That conclusion came from some dplyr anti_join and semi_join calls I had performed (code below taken from details above):

# An additional 4,916 features are returned - not what I was expecting!!!
proposed_diff = dplyr::anti_join(as.data.frame(proposed_osmextract_update), as.data.frame(function_output))
# of the additional 4,916 features returned, only 1 is from the 20 additional features I had found
union_of_proposed = dplyr::semi_join(as.data.frame(proposed_osmextract_update), as.data.frame(function_output_not_in_osmextract_cycle))

However, it seems that all of the 20 additional features I had found are actually included in the proposed osmextract SQL query, see:

# The osm_id's of the additional 20 features I had found with my own function
ids = function_output_not_in_osmextract_cycle$osm_id

# Check if these ids are actually included within the proposed osmextract SQL query
ids %in% proposed_osmextract_update$osm_id
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

As can be seen - the ids of the additional features are indeed caught by the proposed SQL filtering - I must have made some mistake when using the dplyr filtering joins.

To make sure this was behaving as expected, I added a random id as a sense check and it was detected as false:

> ids = c(ids, "111222333")
> ids %in% proposed_osmextract_update$osm_id
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
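A possible explanation for the earlier join results, stated as an assumption rather than a confirmed diagnosis: without a by argument, dplyr's filtering joins match on every column the two tables share, so two extracts read with different extra_tags can disagree on a column such as other_tags even where osm_id is identical. The toy sketch below (hypothetical rows and tag strings) compares the default behaviour with an explicit by = "osm_id".

```r
library(dplyr)

# Two hypothetical extracts of overlapping ways; the non-key column differs
# for osm_id "1" even though it is the same way
extract_a = data.frame(
  osm_id     = c("1", "2", "3"),
  other_tags = c('"foot"=>"yes"', NA, '"lit"=>"yes"')
)
extract_b = data.frame(
  osm_id     = c("1", "2"),
  other_tags = c('"foot"=>"yes","access"=>"no"', NA)
)

# Default: joins on all shared columns (osm_id AND other_tags);
# dplyr prints a "Joining with `by = ...`" message listing them
default_diff = anti_join(extract_a, extract_b)
default_diff$osm_id
# "1" "3"

# Explicit key: only osm_id decides whether a way is "the same"
keyed_diff = anti_join(extract_a, extract_b, by = "osm_id")
keyed_diff$osm_id
# "3"
```

Under that assumption, something like dplyr::anti_join(as.data.frame(proposed_osmextract_update), as.data.frame(function_output), by = "osm_id") would give a count that is directly comparable with the ids %in% check above.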

Again, for reference, proposed_osmextract_update is a network defined with the following query:

proposed_osmextract_update = oe_get(
  place = "Leeds",
  provider = "bbbike",
  layer = "lines",
  #layer = "lines",
  extra_tags = c("access", "bicycle", "service"),
  force_download = TRUE,
  never_skip_vectortranslate = TRUE,
  force_vectortranslate = TRUE,
  vectortranslate_options = c(
    "-where", "
    (highway IS NOT NULL)
    AND
    (highway NOT IN (
    'abandoned', 'bus_guideway', 'byway', 'construction', 'corridor', 'elevator',
    'fixme', 'escalator', 'gallop', 'historic', 'no', 'planned', 'platform',
    'proposed', 'raceway', 'steps'
    ))
    AND
    (highway NOT IN ('motorway', 'motorway_link', 'footway', 'bridleway',
    'pedestrian') OR bicycle IN ('yes', 'designated', 'permissive', 'destination')
    )
    AND
    (access NOT IN ('private', 'no') OR bicycle IN ('yes'))
    AND
    (bicycle NOT IN ('private', 'no', 'use_sidepath', 'restricted'))
    AND
    (service NOT ILIKE 'private%')"
    ),
  quiet = FALSE
)

I have tried OR bicycle IN ('yes')) & OR bicycle = 'yes' interchangeably, with identical network dimensions returned (98,965 features).

Next, I need to look at the additional features (~4.9k) returned by this change and see if they are all appropriate for a cycling network. For those that are not, I need to work out what additional SQL filtering is needed to exclude them from the network.

Robinlovelace commented 2 years ago

Note: because we're writing R code we could also import the maximum number of features and do the filtering in R.

hulsiejames commented 2 years ago

Note: because we're writing R code we could also import the maximum number of features and do the filtering in R.

Provided there are no performance motives for applying the SQL filtering prior to loading into R, I think this could be a good idea.

Robinlovelace commented 2 years ago

There are performance motives but they are not the primary ones, especially when we're talking about a small fraction of the total.
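A rough sketch of what filtering in R could look like, assuming the ways have already been imported with the access, bicycle and service columns (for example via oe_get() with extra_tags, as in the script above). The function name filter_cycling_in_r is hypothetical and the toy rows are made up; the rules mirror the proposed SQL query quoted earlier in the thread, including the relaxed access condition.

```r
library(dplyr)

# Hypothetical helper: applies the cycling rules after import instead of via "-where"
filter_cycling_in_r = function(osm_df) {
  bad_highway = c("abandoned", "bus_guideway", "byway", "construction", "corridor",
                  "elevator", "fixme", "escalator", "gallop", "historic", "no",
                  "planned", "platform", "proposed", "raceway", "steps")
  needs_bicycle_tag = c("motorway", "motorway_link", "footway", "bridleway", "pedestrian")
  bicycle_ok = c("yes", "designated", "permissive", "destination")

  osm_df %>%
    filter(!is.na(highway), !highway %in% bad_highway) %>%
    filter(!highway %in% needs_bicycle_tag | bicycle %in% bicycle_ok) %>%
    # Relaxed access rule: bicycle = "yes" overrides access = private/no
    filter(!access %in% c("private", "no") | bicycle %in% "yes") %>%
    filter(!bicycle %in% c("private", "no", "use_sidepath", "restricted")) %>%
    # Mirrors service NOT ILIKE 'private%': starts with "private", any case
    filter(!grepl("^private", service, ignore.case = TRUE))
}

# Toy input with hypothetical rows
toy_ways = data.frame(
  osm_id  = c("1", "2", "3", "4", "5"),
  highway = c("residential", "footway", "service", "steps", "motorway"),
  access  = c(NA, "no", "private", NA, NA),
  bicycle = c(NA, "yes", NA, NA, NA),
  service = c(NA, NA, "private_road", NA, NA)
)

kept = filter_cycling_in_r(toy_ways)
kept$osm_id
# "1" "2"
```

One design point to be aware of: the R code earlier in the thread uses grepl("private", service), which matches "private" anywhere in the tag, whereas the SQL pattern 'private%' only matches at the start; the sketch follows the SQL version.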