npct / pct-shiny

The Shiny map for Local Authorities
GNU Affero General Public License v3.0

LSOAs and WZs #450

Closed mem48 closed 7 years ago

mem48 commented 7 years ago

I want a place to discuss the options for routeing the LSOAs and whether we route LSOA to LSOA or LSOA to Work Zone.

[Image: 2016-11-22]

This shows probably the most extreme example of the problem: a large LSOA (black lines) where few people live but many people work. WZs are shown in red. This LSOA is >3km across. Our method gives the impression of street-level accuracy, so routeing to a location 3km from the desired location could cause serious misrepresentation of the data.

I'm also worried that the error with an LSOA - LSOA approach will be greatest in exactly the places where increased cycling is most likely (dense cities).

There are 32,844 LSOAs in England and 50,868 WZ, so routeing numbers should be comparable. In rural areas, WZ and LSOAs are comparable (see picture of Cumbria below).

[Image: 2016-11-22 2]

Small towns should also get a boost in accuracy: where they are only covered by 2-3 LSOAs they will have about 10 WZ, so a more representative set of routes can be produced.

The issues as I see them are:

1) A slightly increased number of routes due to the increased number of destinations.
2) No two-way routes (all routes start at an LSOA and end at a WZ). This potentially doubles the number of routes, but I suspect this will be offset by the wider selection of WZ and the fact that not all LSOA pairs are two-way.
3) LSOA - WZ is not published on WCID so would have to be constructed from OA-WZ data. This is simple to do as LSOAs are made up of OAs, but it still needs to be done.
4) Because of 3), WZ data does not include the number of cyclists, just:

Could we proportionally allocate the other methods to cycling based on the appropriate LSOA-LSOA data?
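For point 3 above, building LSOA - WZ flows from OA-WZ data could look roughly like the base R sketch below. The data frames, column names and zone codes are all made up for illustration (they are not the census schema); the only thing taken from the discussion is that each OA sits inside exactly one LSOA.

```r
# Hypothetical OA-to-WZ flows; `all` is the total number of commuters
flow_oa <- data.frame(
  oa  = c("OA1", "OA2", "OA3", "OA3"),
  wz  = c("WZ1", "WZ1", "WZ1", "WZ2"),
  all = c(5, 3, 2, 4),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: each OA belongs to exactly one LSOA
oa_lsoa <- data.frame(
  oa   = c("OA1", "OA2", "OA3"),
  lsoa = c("LSOA_A", "LSOA_A", "LSOA_B"),
  stringsAsFactors = FALSE
)

# Attach the parent LSOA to every flow, then sum commuters per LSOA-WZ pair
flow_joined  <- merge(flow_oa, oa_lsoa, by = "oa")
flow_lsoa_wz <- aggregate(all ~ lsoa + wz, data = flow_joined, FUN = sum)
flow_lsoa_wz <- flow_lsoa_wz[order(flow_lsoa_wz$lsoa, flow_lsoa_wz$wz), ]
print(flow_lsoa_wz)
```

Because the lookup is one-to-one from OA to LSOA, the total number of commuters is preserved by the aggregation.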

The benefits are:

1) Routes terminate near the place of work, not in residential areas, thus improved accuracy of routeing.
2) Improved accuracy in routeing, especially in towns, due to more WZ than LSOA.
3) No in-zone flows: people who live and work in the same LSOA still get routed.

Robinlovelace commented 7 years ago

Interesting that the LSOAs are large precisely where you want them small in some destinations. The challenge is to convert the LSOA centroids to WPZ centroids or centroid collections. I see merit in separating the geographic level of the destinations from that of the origins and that is now slightly easier since I added, with input from @ilanfri, a destinations argument into the line2route() function as documented here: https://github.com/ropensci/stplanr/commit/462080b4810440b380e80e975c7255a28b29f765

I'm not sure that we have the time to do this quickly so would suggest we go for the LSOA-LSOA approach in the 1st instance and then explore methods for changing the geographic level of destinations after we have a decent product at the LSOA level, as discussed with @AnnaGoodman1.

Useful discussion to be having, cheers for initiating it and interested to hear what others think.

mem48 commented 7 years ago

Some extra info for point 4) on how to get the number of cyclists to a WZ

I looked at the matching of WZ and LSOAs

Exact Match - 1765
LSOA made up of WZ - 7119
WZ made up of LSOA - 485
Complex - 44209

This is for all WZ; once they are subset to the ones of interest there may be slight differences.

Exact matches, where the WZ and the LSOA are the same, are easy: we just use the LSOA figure.
WZ made up of LSOA are also easy, as we can sum LSOA to make WZ values.
LSOA made up of WZ are more complex as they have missing information. My best suggestion is that cyclists are distributed in proportion to "other method of travel".

Complex WZ do not neatly line up with LSOA, so some weighted distribution based on area or OA data would be required.
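As a rough illustration of the "LSOA made up of WZ" case, here is a base R sketch of the proportional split. Everything in it is hypothetical: the zone codes, the column names, and the assumption that the `other` column holds the "other method of travel" count for each LSOA - WZ flow. It is not a tested method, just the arithmetic written out.

```r
# Hypothetical LSOA-LSOA flows, where the number of cyclists is known
flow_lsoa <- data.frame(
  origin    = c("L1", "L2"),
  dest_lsoa = c("D1", "D1"),
  bicycle   = c(6, 2),
  stringsAsFactors = FALSE
)

# Hypothetical LSOA-WZ flows for the WZ that make up D1 (no cyclist count)
flow_wz <- data.frame(
  origin    = c("L1", "L1", "L2", "L2"),
  dest_lsoa = c("D1", "D1", "D1", "D1"),
  wz        = c("WZ1", "WZ2", "WZ1", "WZ2"),
  other     = c(30, 10, 5, 5),
  stringsAsFactors = FALSE
)

# Share of each WZ within its origin / destination-LSOA pair
# (a pair whose `other` total is zero would give NaN and need special handling)
pair_total    <- ave(flow_wz$other, flow_wz$origin, flow_wz$dest_lsoa, FUN = sum)
flow_wz$share <- flow_wz$other / pair_total

# Split the known LSOA-LSOA cyclists across the WZ using those shares
est <- merge(flow_wz, flow_lsoa, by = c("origin", "dest_lsoa"))
est$bicycle_est <- est$bicycle * est$share
print(est[, c("origin", "wz", "bicycle_est")])
```

The estimates are fractional and sum back to the LSOA-LSOA totals, so any rounding would have to be done afterwards.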

mem48 commented 7 years ago

Also I've found out:

There are 7.5 million LSOA-WZ lines with a straight-line distance of less than 20km and at least 1 commuter, vs 4.1 million for one-way LSOA pairs.
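A filter of that kind can be sketched in base R as below. The coordinates, column names and the use of a haversine (great-circle) distance are all assumptions made for the example; the counts above may well have been produced differently, for instance from projected centroid coordinates.

```r
# Great-circle distance in km between lon/lat points (vectorised haversine)
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical desire lines: origin centroid, destination centroid, commuters
od_lines <- data.frame(
  id    = c("a", "b", "c"),
  o_lon = c(-1.549, -1.549, -1.549),
  o_lat = c(53.801, 53.801, 53.801),
  d_lon = c(-1.759, -2.244, -1.560),
  d_lat = c(53.795, 53.483, 53.805),
  all   = c(12, 3, 0),
  stringsAsFactors = FALSE
)

od_lines$dist_km <- haversine_km(od_lines$o_lon, od_lines$o_lat,
                                 od_lines$d_lon, od_lines$d_lat)

# Keep lines shorter than 20 km that carry at least one commuter
keep <- od_lines[od_lines$dist_km < 20 & od_lines$all >= 1, ]
print(keep[, c("id", "dist_km", "all")])
```

Here line "b" is dropped for being too long and line "c" for having no commuters, so only "a" survives.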

AnnaGoodman1 commented 7 years ago

thanks for this @mem48 , really helpful. For me the critical thing – which I did not know before – is "LSOA - WZ data does not include number of cyclists". I think this is a major problem, we have just had experience in Manchester trying to allocate cycling proportionately (based on their data which is people walking plus cycling combined), and it was a lot of work but still did not come out very convincingly. I think the advantage in geographical accuracy would be outweighed by the fact that we would have less reliable input data on who is cycling where. As such, this makes me much less keen on using WZ. (although having said this, this disadvantage does not apply to the go Dutch and ebikes layers, which only require us to know the total number of commuters on a route, not the number of cyclists at baseline)

also I think it actually is not true that the LSOA level is likely to be most useful in dense places like city of London. my understanding from @RachelAldred is that users of the PCT in urban areas seem pretty happy with the MSOA route network layer, and it is instead particularly the users in rural areas who find it too sparse. so the main added value of the LSOA layout will be in rural areas.

lastly I'm a bit worried about timing, it seems to me quite plausible we would run out of time to do the LSOA-WZ layer properly before end of January since there are a nontrivial number of additional steps involved.

So I agree with @Robinlovelace that we should focus on LSOA-LSOA. Possibly, if we have time, we could then explore LSOA-WZ as a prototype in one region for the go Dutch/ebikes scenarios, to get a sense of how much value is added for those two scenarios.

mem48 commented 7 years ago

I think I am in agreement; the need to build up from OA to WZ data, and hence the lack of cyclists, was not something I knew about either. I'll push on with LSOA to LSOA and we can leave WZ on the back burner for now. Perhaps people with contacts at ONS could ask if an LSOA to WZ dataset could be produced in the style of the LSOA to LSOA data?

@RachelAldred are the urban users you have spoken to aware that these kinds of errors can occur within cities as well as rural areas?

mem48 commented 7 years ago

Oh and one thing I forgot:

LSOA - WZ is complex because there is not a neat relationship between WZ and LSOA (see earlier comment)

But WZ do not cross MSOA boundaries so a MSOA - WZ layer would be much simpler to create. A project for the future?

Robinlovelace commented 7 years ago

RJwrapper.pdf

Overall we need new functions for aggregating flows based on underlying geometries. Imagine you have an OD-level dataset called flow_oa of travel patterns at the OA-WPZ level (as we do) and want to aggregate it to the LSOA-WPZ level (as we may want to do). I'm imagining a function like this:

aggregate.od <- function(od_highres, od_lowres, FUN) {...}

This is moving towards the utility of a class for OD data discussed with Richard Ellison in the attached paper, because all ODs would have attached geometries of origins and destinations, making aggregation possible using one method or another. In our use case you would say:

flow_lsoa_agg = aggregate.od(flow_oa, flow_lsoa, sum)

@ilanfri do you fancy having a look at creating this function in the new year?

@richardellison, @mem48 and @nikolai-b interested to hear your thoughts on the best way to do this. Have been thinking about it for a while and this could be an ideal opportunity to do it based on open datasets.

Note there is a commercial product that does a similar thing: https://saspac.org/

richardellison commented 7 years ago

In general, it is not terribly complicated to do spatial aggregation (for OD or otherwise) using existing rgeos functions as you can simply generate a concordance table and then use that as the basis for aggregation.

To formalise a function that will do the aggregation automatically needs a bit more planning because I expect many users would try to aggregate data without understanding (or knowing) the relationships between the different boundaries. Specifically:

  1. What do you do if a small zone is bisected by two (or more) larger zones? Most common options are to either proportionally allocate flows based on area or assign all flows to the area with the larger proportion.
  2. Related to 1., how do you handle boundary layers with different levels of spatial accuracy/simplification?
  3. For OD flows specifically, what happens if origin zones are different to destination zones (LSOA - WZ as above, or SA2 to DZN (Destination Zone) in Australia)?

Would certainly be a useful function though.
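To make the two options in point 1 concrete, here is a small base R sketch that starts from an existing concordance table. The table, its `share` column (the fraction of each small zone's area falling in each large zone) and the zone names are invented for the example; generating the concordance from real boundaries is not shown.

```r
# Hypothetical concordance: S2 is bisected, 25% in zone A and 75% in zone B
concordance <- data.frame(
  small = c("S1", "S2", "S2", "S3"),
  large = c("A", "A", "B", "B"),
  share = c(1, 0.25, 0.75, 1),
  stringsAsFactors = FALSE
)

# Hypothetical flows recorded at the small-zone level
flow_small <- data.frame(
  small = c("S1", "S2", "S3"),
  all   = c(10, 8, 4),
  stringsAsFactors = FALSE
)

joined <- merge(flow_small, concordance, by = "small")

# Option 1: allocate flows proportionally to area share
joined$all_prop <- joined$all * joined$share
agg_prop <- aggregate(all_prop ~ large, data = joined, FUN = sum)

# Option 2: assign all of a small zone's flow to the zone with the largest share
# (exact ties would keep both rows and double count, so they need a tie-break rule)
is_major  <- joined$share == ave(joined$share, joined$small, FUN = max)
agg_major <- aggregate(all ~ large, data = joined[is_major, ], FUN = sum)

print(agg_prop)   # A = 12, B = 10
print(agg_major)  # A = 10, B = 12
```

Both options preserve the overall total, but they move the bisected zone's 8 commuters differently, which is exactly the kind of choice an automatic function would have to expose to the user.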

RachelAldred commented 7 years ago

Replying to Malcolm who wrote '@RachelAldred are the urban users you have spoken to aware that these kinds of errors can occur within cities as well as rural areas?'

Interesting point. Most recently in Greater Manchester the urban users indicated that they were wary of the route network particularly in more central areas, in relation to this. Showing the MSOA zones (even with zones turned off the boundaries are still visible) and centroids at least does highlight that issue - looking at the City/Bank MSOA in London, you can see that routing everyone to that centroid is skewing the volumes over one of the three bridges in the area, whereas in practice you might see a more even distribution. So I guess one concern is that with a raster LSOA layer, if the zone boundaries/centroids are not shown, then it could give the impression of spurious accuracy.

JDWoodcock commented 7 years ago

Can we have some kind of split option? That we use MSOAs where they are smaller and WZs where they are smaller?

Robinlovelace commented 7 years ago

Closing for now as @mem48 has a good plan for this: implement the LSOA-LSOA solution. We can re-open this after that has been completed once it becomes actionable again. Really useful discussion.