recode_pedestrian() - Githubissues

GretaTimaite commented 2 years ago

Just wondering if it would be useful to add an argument to recode_pedestrian() that lets the user download OSM data, with the keys needed for recoding, from within the function? In other words, two options would be provided:

a) feed in the OSM data and ignore the download argument; b) set download = "yes" and give a place_name (I guess for that a user would have to consult osmextract::oe_match_pattern()).

If option b) is integrated, then I reckon it would be sensible to limit the osmextract::oe_get() functionality. To illustrate what option b) could look like within a function (not finished, but this bit works when run as a separate function):

    # Option b): download OSM data with the extra tags needed for recoding
    if (download_data == "yes" && !is.null(place_name)) {

      download_data_df <- osmextract::oe_get(place = place_name,
                                             force_vectortranslate = TRUE,
                                             force_download = TRUE,
                                             extra_tags = c("footway",
                                                            "foot",
                                                            "sidewalk",
                                                            "path",
                                                            "pedestrian"))

      return(download_data_df) # Note: this would not be present in the `recode_pedestrian()`
    }

    # A download was requested but no place was given
    if (download_data == "yes" && is.null(place_name)) {

      warning("place_name is missing")
      return(NULL)
    }

However, I wonder whether adding b) is excessive. Then again, it might make the function easier to use for those who are not well versed in R?
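A minimal sketch of how the argument check for option b) could be separated from the download itself. The helper name `check_download_args()` and the `"no"` default are assumptions made for illustration; only `download_data` and `place_name` come from the snippet above.

```r
# Hypothetical helper (not part of the package): validate the two proposed
# arguments and report whether a download should happen.
check_download_args <- function(download_data = "no", place_name = NULL) {
  # Only "yes" and "no" are accepted; anything else raises an error
  download_data <- match.arg(download_data, c("no", "yes"))
  if (download_data == "yes" && is.null(place_name)) {
    stop("place_name is required when download_data = \"yes\"")
  }
  download_data == "yes"
}

check_download_args("yes", "Leeds")  # a download would be attempted
check_download_args("no")            # the user supplies the OSM data
```

Keeping the check separate means the download step only runs once the inputs are known to be valid.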

GretaTimaite commented 2 years ago

Tested by Greta:

[x] with osm data that has all the needed keys and tags (default values)
[x] osm data but keys/tags needed for recategorization are missing
[ ] overriding default values (e.g., instead of keeping highway_footway_yes = NULL (the default includes c("footway", "pedestrian", "living_street")), one defines it as highway_footway_yes = "footway")
[x] filter_df (currently takes three values: "yes", "no", and "maybe") ----> is it even needed?

Robinlovelace commented 2 years ago

I think that would be useful but suggest creating lots of small functions that do 1 thing and joining them together, rather than few big functions, is a good way forward (from bitter experience)!

GretaTimaite commented 2 years ago

Update on the function: rewriting it.

The more I read and think about the pedestrian network, the more I think it makes sense to have several functions for it, emphasizing different "levels". For example, one function could return the network of highways on which pedestrians can legally walk (I think osmextract already offers this), and another the encouraged pedestrian network. For instance, the encouraged one could return only the primary roads where motor traffic's segregated, to some extent, from pedestrians. The sidewalk tag would be a good indicator for that but not, for example, foot="yes" (though foot="designated" might work), as it might just indicate that pedestrians are allowed to walk there. However, if the speed limit is 20mph (i.e., in residential areas) then sharing the space with motor traffic might be fine. Basically, the encouraged network would encourage walking on safe highways? The lit key could be incorporated too.

An accessible network could be built on top of the encouraged one and categorized by its compliance with guidelines (#7).

Also, I think that it would make sense to provide a function for "cleaning" osm data relevant to pedestrian network before applying the function for recategorization (maybe someone has done the work on that already?). A basic example could be width values (including the infamous -1):

    wy_walking %>% pull(width) %>% table()
    .
         -1     .25       0     0.1     0.3     0.4     0.5    0.5m     0.6     0.7    0.75 
          1       1      16       7      20      24      60       1      27       4       1 
        0.8     0.9       1 1 - 1.5     1.0     1.1     1.2     1.3     1.4     1.5    1.52 
         25       4     175       2       6       8      15       6       1      40       3 
        1.6     1.7    1.75     1.8     1.9      10    10.5      11      12    13.5      14 
          9       3       3      11       2      10       1       2       1       1       1 
         1m       2     2-3     2.0   2.117     2.2     2.3     2.4     2.5     2.6     2.7 
          2      89       1       3       1      10       4      11      77      11       3 
        2.8     2.9      2+      23       3     3.0     3.2     3.4     3.5   3.5 m     3.8 
          1       3       1       4      97       2       7       2      12       1       2 
       37.5      3m       4     4.5   4.5 m     4.8      46      4m       5       6     6.5 
          1       1      41       2       1       1       3       1      17       4       2 
         6m       7     7.4     7.5       8     8.5       9  narrow 
          1       3       1       1       9       3       2       9

I could do the cleaning within a function that needs this information (or accessibility), but it should be much simpler to clean it before feeding it to the function? It is useful information, so having it in a good format might make life easier for others too.
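As a rough sketch of what such a cleaning step might look like for the width values tabulated above, using base R only. The function name `clean_width()` is hypothetical, and three choices here are assumptions rather than settled rules: ranges are replaced by their midpoint, "2+" is read as 2, and non-positive values (-1, 0) become missing.

```r
# Hypothetical helper: convert free-text OSM `width` values to metres.
clean_width <- function(x) {
  x <- trimws(tolower(as.character(x)))
  # Drop a trailing unit ("0.5m", "3.5 m") or an open-ended marker ("2+")
  x <- sub("\\s*(m|\\+)$", "", x)
  # Ranges such as "1 - 1.5" or "2-3" are replaced by their midpoint (assumption)
  is_range <- grepl("^[0-9.]+\\s*-\\s*[0-9.]+$", x)
  # Anything that is not a plain number (e.g. "narrow") becomes NA
  out <- suppressWarnings(as.numeric(x))
  if (any(is_range)) {
    parts <- strsplit(x[is_range], "\\s*-\\s*")
    out[is_range] <- vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
  }
  # Non-positive widths (-1, 0) are treated as missing (assumption)
  out[!is.na(out) & out <= 0] <- NA_real_
  out
}

clean_width(c("-1", ".25", "0.5m", "3.5 m", "1 - 1.5", "2-3", "2+", "narrow"))
```

Implausibly large values such as 37.5 or 46 pass through unchanged; whether to cap or drop them is a separate decision.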

In the future it might also be cool to think more about the walkability (index). As I'm writing this I'm thinking that it might be useful to give something like "accessibility index" to accessible pedestrian infrastructure instead of simply categorizing it as complying or not with guidelines (or both could be provided).

GretaTimaite commented 2 years ago

I've been thinking about the encouraged pedestrian network (discussed above) and how I could sensibly categorize OSM highways based on that. Taking advantage of the sidewalk, etc. tags seems the most straightforward approach; however, to me it is rather simplistic. For example, I was thinking that maybe not every minor road needs a separated sidewalk (which does not mean it would not be ideal), and such a road could thus share the space with motor traffic, given that motor traffic volume and speed are low. Therefore, it would be unnecessary to classify such highways (e.g., unclassified roads) as discouraged for walking on the grounds that they are not safe?

Problem: OSM can give data on speed limits but not traffic volume of any kind. So, I recalled that there's DfT data on road traffic counts, which I've been investigating for the last couple of days. Indeed, it is external data, which means I move beyond OSM to categorize it. That's a limitation (I think?) when writing a function but it does demonstrate the scalability of OSM.

Anyway, DfT (motor) traffic counts are not ideal either, as they are a sample of major and minor roads (for example, there are motor traffic counts for only 370 roads in Leeds). Yet I think it is still interesting to explore the two datasets in relation to each other, even if only to explore the limitations (such as the need for buffers so that I can intersect geometries). Specifically, right now I'm checking which roads are dropped, and why, when I perform a spatial join. My suspicion is that a buffer of 6m (based on Haklay's (2010) estimation) is not always enough.

Next steps: to explore which minor roads have high or low motor traffic volume. Indeed, I struggle to find out what exactly is meant by high and low motor traffic volume (I reason it's a rather flexible, context-dependent term). Nevertheless, I came across 1000 motor vehicles per day as an acceptable threshold volume in residential areas as it would equate to about 100 motor vehicles in a peak hour, which is recommended in Manual for Streets when designing shared streets.

My main aim is to figure out if I can classify certain highways (e.g., unclassified, residential) as pedestrian friendly by default because of low motor traffic volumes. I reckon that there might be a weak association but, as often happens, it would be a simplification and would have some unexpectedly high motor traffic volumes in some, for instance, residential roads (increased by rat running?).

Robinlovelace commented 2 years ago

> Problem: OSM can give data on speed limits but not traffic volume of any kind. So, I recalled that there's DfT data on road traffic counts, which I've been investigating for the last couple of days. Indeed, it is external data, which means I move beyond OSM to categorize it. That's a limitation (I think?) when writing a function but it does demonstrate the scalability of OSM.

It's great you picked up on this. You can in fact get OSM data with estimated traffic flows added but it takes some work. I think @mvl22 (Martin re-meet Greta : ) has done this with bikedata, but the lines are currently straight:

It is possible and eminently doable to estimate road traffic on every road, but that would take a lot of time.

Robinlovelace commented 2 years ago

Source: https://bikedata.cyclestreets.net/trafficcounts/#12.4/53.79239/-1.52904

Robinlovelace commented 2 years ago

> Next steps: to explore which minor roads have high or low motor traffic volume. Indeed, I struggle to find out what exactly is meant by high and low motor traffic volume (I reason it's a rather flexible, context-dependent term). Nevertheless, I came across 1000 motor vehicles per day as an acceptable threshold volume in residential areas as it would equate to about 100 motor vehicles in a peak hour, which is recommended in Manual for Streets when designing shared streets.

Sounds good.

Robinlovelace commented 2 years ago

> My main aim is to figure out if I can classify certain highways (e.g., unclassified, residential) as pedestrian friendly by default because of low motor traffic volumes. I reckon that there might be a weak correlation but, as often happens, it would be a simplification and would have some unexpectedly high motor traffic volumes in some, for instance, residential roads (increased by rat running?).

:+1: to this idea.

GretaTimaite commented 2 years ago

> It is possible and eminently doable to estimate road traffic on every road, but that would take a lot of time.

Could you briefly elaborate on what it would/could look like? Where would the data come from? Directly from local authorities? Would some data still have to be imputed? It would be helpful to have traffic counts for every road in a (relatively) small area (e.g., Leeds), just to explore the possibility and, maybe, give more robustness to the classification idea based on motor traffic volume.

Robinlovelace commented 2 years ago

It could look a bit like this (previous MSc student in ITS): https://github.com/unbrother/traffic-major-minor

There are also lots of interesting papers here, but my guess is that this would require a PhD-length research project to do properly: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=openstreetmap+machine+learning+%22open+Source%22+traffic+aadt&btnG=

Robinlovelace commented 2 years ago

Another possibility would be to use SNA methods. I know Crispin Cooper down at Cardiff has looked at this; do you have any suggestions for quick wins in this area, @fiftysevendegreesofrad? Phoning a friend for a project with a few months left :pray:

mvl22 commented 2 years ago

> sidewalk tag would be a good indicator for that

The sidewalk tag is really a negative tag, in the sense that in most parts of the world it is only used to indicate an exception to the norm. No-one bothers to put sidewalk=yes on every road, because people know that roads by default do have footways alongside. You only need it when there isn't a sidewalk, which is relatively rare. Using it to indicate any kind of quality level would give incorrect results.

For instance, pretty-much every street here has footways on both sides, but only a fraction use the tag, and those that do are a completely random subset: https://overpass-turbo.eu/s/1eRZ

> the encouraged one could return only the primary roads where motor traffic's segregated, to some extent, from pedestrians

Even if the path is separately drawn you can't really determine the level of separation from the road.

NB Areas with better OSM coverage tend to see a higher ratio of streets with the footway drawn in separately, but most areas of the world haven't yet bothered.

> However, if the speed limit is 20mph (i.e., in residential areas) then sharing the space with motor traffic might be fine.

Not really, unfortunately. In Cambridge for instance, every local street (<tertiary) is now 20mph, like an increasing number of towns that have gone "Twenty's Plenty". They are former 30mph. They really are not shared space. You could walk in the middle of the road in many cases but you have to be alert for vehicles coming behind.

This street is 20mph, though admittedly it is tertiary: https://www.cyclestreets.net/location/152460/

mvl22 commented 2 years ago

> but the lines are currently straight

Yeah, we just haven't put the geography alignment up the priority list yet.

mvl22 commented 2 years ago

> I've been thinking about the encouraged pedestrian network (discussed above) and how I could sensibly categorize OSM highways based on that.

In general, the problem with deciding up-front on what is encouraged is that preference is quite subjective, and it depends really on your use-case. Admittedly, walking at least isn't quite as bad as cycling, where 10 cyclists will have 11 opinions as we say here in Cambridge ;)

For instance, is it preferable to walk on a very wide pavement, or to take the parallel street that has much less traffic but has lower subjective safety due to low levels of natural surveillance?

For instance, paths in a park may be preferred by some, but once it gets dark, that status changes.

Personally I would aim for a dataset that simplifies a key set of pavement attributes to simple booleans, enabling downstream uses to make that subjective judgement by querying those pre-assembled fields. For instance:

isWide
isLit
isSegregatedFromCyclists
isSharedSpace
numSideroadsInterrupting

Each of these can be quite painful for people to determine from raw OSM tags, so if this package can pre-process them upfront, you suddenly make it much easier for people then to make whatever subjective judgements they want to optimise on.
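A minimal sketch of deriving two such booleans, assuming a cleaned numeric `width` column and the raw OSM `lit` tag. The toy data, the snake_case column names `is_lit` and `is_wide`, the 2 m threshold, and the set of `lit` values counted as lit are all assumptions for illustration, not agreed definitions.

```r
# Toy rows standing in for OSM ways (hypothetical data)
ways <- data.frame(
  lit = c("yes", "no", NA, "24/7"),
  width = c(2.5, 0.9, NA, 1.6),
  stringsAsFactors = FALSE
)

min_wide_m <- 2                          # hypothetical "wide" threshold in metres
lit_values <- c("yes", "24/7", "automatic")  # values treated as lit (assumption)

# Keep NA where the tag is absent, so "unknown" is not confused with "not lit"
ways$is_lit <- ifelse(is.na(ways$lit), NA, ways$lit %in% lit_values)
ways$is_wide <- ways$width >= min_wide_m

ways
```

Downstream users could then filter on whichever combination matches their own judgement, e.g. `ways[which(ways$is_lit & ways$is_wide), ]`.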

Robinlovelace commented 2 years ago

These seem like eminently sensible suggestions Martin, many thanks!

GretaTimaite commented 2 years ago

Thank you for your lengthy insights Martin! Just a few more thoughts, if you don't mind :)

> In general, the problem with deciding up-front on what is encouraged is that preference is quite subjective ... For instance, is it preferable to walk on a very wide pavement, or to take the parallel street that has much less traffic but has lower subjective safety due to low levels of natural surveillance?

I agree it is subjective, which is why I am trying to come up with a non-pretentious, so to say, (re)categorization. Just to make it clear, my idea was to move away from the legal definition (which says that only motorways and slip roads cannot be walked on) and, perhaps, provide a more sensible (in my opinion) categorization of the friendliness of the pedestrian network. This is where the encouraged idea came from.

> In Cambridge for instance, every local street (<tertiary) is now 20mph, like an increasing number of towns that have gone "Twenty's Plenty". They are former 30mph. They really are not shared space.

Yes, good point. I did not want to convey the idea that the speed limit itself is a good indicator of shared space. Rather, my assumption is that a speed limit in combination with low (<1000 per day) motor traffic could be indicative of shared space, or that the highway could be encouraged for walking even if it has no physical separation from the low motor traffic?

It's not the best example, as the speed limit is over 20mph, but I'll give one anyway. Thorner Lane (Google StreetView, OSM) is categorized as an unclassified road, only a fraction of it has a sidewalk, and it had 953 motor vehicles/day in 2019. Wike Lane (Google StreetView, OSM) is also unclassified and has no physical separation from pedestrians, but had 2844 motor vehicles/day in 2019. I'd lean towards categorizing Thorner Lane as encouraged (i.e., pedestrian friendly in its most basic form, not taking into consideration whether it is lit, etc.) but not Wike Lane, which has a much higher number of motor vehicles passing. That was how I was thinking about the "encouraged" pedestrian highway categorization. Would you say it's a rather futile approach? 🤔
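To make the comparison concrete, here is a small sketch applying the 1000 vehicles/day threshold to the two 2019 counts quoted above. The data frame layout and column names are made up for illustration, and the 10% peak-hour share is an assumption implied by the "1000 per day is about 100 in a peak hour" rule of thumb rather than a measured figure.

```r
# 2019 daily motor vehicle counts quoted in this comment
traffic <- data.frame(
  name = c("Thorner Lane", "Wike Lane"),
  highway = c("unclassified", "unclassified"),
  vehicles_per_day = c(953, 2844),
  stringsAsFactors = FALSE
)

low_traffic_threshold <- 1000  # vehicles per day, from the discussion above
peak_hour_share <- 0.10        # assumption: peak hour carries ~10% of daily traffic

# Rough peak-hour estimate and the resulting "encouraged" flag
traffic$peak_hour_estimate <- round(traffic$vehicles_per_day * peak_hour_share)
traffic$encouraged <- traffic$vehicles_per_day < low_traffic_threshold

traffic
```

Under these assumptions Thorner Lane falls below the threshold and Wike Lane does not, which matches the intuition described above.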

Just to note: one of my goals is to categorize OSM to return accessible and inclusive pedestrian network, so I was thinking of the encouraged one to be something in-between (which, I acknowledge, would prioritize able-bodied pedestrians). Maybe I should just proceed to the accessible and inclusive one?

> Personally I would aim for a dataset that simplifies a key set of pavement attributes to simple booleans, enabling downstream uses to make that subjective judgement by querying those pre-assembled fields. For instance:

Booleans sound like a good idea. I was thinking of providing a cleaned dataset but keeping it numeric. I think both options can be implemented.

fiftysevendegreesofrad commented 2 years ago

In terms of estimating traffic on every road: I have done it with my cycling models, but later found that using road classification (primary/secondary/tertiary/local/residential) was just as good anyway, at least in that particular study. https://www.nature.com/articles/s41598-019-55669-8 (also cites the earlier models with SNA traffic estimation)

Lots of interesting stuff in this thread. I agree walkability of individual links would be good to have. Accessibility is multi dimensional so it's better to provide booleans to be used downstream rather than a single mobility-biased index. Of course access to data is a challenge as ever. Presence/absence of steps on a route is something that we looked at with the Wales Active Travel data (Richard Price made models showing where the worst accessibility outliers were for people who can't climb steps) - I suspect data on steps might be one of the first things to become usable for this purpose in OSM.

udsleeds / openinfra

recode_pedestrian() #14