ropensci / osmplotr

Data visualisation using OpenStreetMap objects
https://docs.ropensci.org/osmplotr
132 stars 21 forks source link

fix osm_line2poly #29

Open mpadge opened 6 years ago

mpadge commented 6 years ago

@richardbeare I've gone through most of your code and vignette with this commit. Note the following in relation to your vignette:

  1. osmdata::opq() now accepts explicit timeout (and memsize) arguments, but these actually aren't needed here. It should just work as is, and does for me, so no need to distract readers of vignette with that stuff
  2. I've removed most of the data you previously had (all of the highways), because this made the plot too large to include in an R package. We really only need the coastal data to illustrate the functionality, and I actually think this really aids the focus of the whole thing.
  3. This should actually make it small enough to include as data within the package, thereby enabling the vignette to actually be run without the graphs.

Now to the issue here which I'd love your help with:

tidy line2poly.R

I'm guessing you don't work with a code linter? Most of the changes I've made are just standard lintr suggestions, with a few extra personal things (more whitespace; vertical alignment of curly braces). I also put all separate functions out into the main function space. However, in it's current form this function no longer works. Can you just have a fish around to ensure that it all works. I think one problem is with the bbox interpretation - you often presume it has a fixed, matrix form with names rows and columns, yet there are no checks for this. Could you just write a small function to force any differently-formed bbox args into this form? (Or some equivalent way of dealing with this). Thanks in advance, and thanks for all the help here - very much appreciated!

richardbeare commented 6 years ago

I'll look into it. I haven't used lintr in the past. I normally conform to many of the standard recommendations, but don't agree with all of them. I was copying your other code as far as brackets were concerned, which is where most of my lint warnings are coming from. I'll check into the scoping stuff. I try to isolate functions that aren't going to be used anywhere else inside the function using them, unless they get too big. I'll see if there's a sensible balance.

mpadge commented 6 years ago

I'm entirely sympathetic with isolating functions inside other functions, but for better or worse that runs contrary to what rOpenSci expects and encourages. Putting them outside helps avoid messy implicit scoping, of which I found a few instances once i moved them outside. In short: Please avoid nesting function defs.

richardbeare commented 6 years ago

Interesting - is there a spec/list somewhere?

I've been using R, coming from a programming background, for nearly 20 years, and I've never had a problem with R's scoping. I know it is possible to construct crazy examples, but I've not had a problem with this. Avoiding nesting of definitions leads to other problems that most other languages use nesting to avoid.

PS - I think the problem is with the lapply calls. Should have it fixed tonight.

On Fri, Oct 6, 2017 at 5:53 PM, mark padgham notifications@github.com wrote:

I'm entirely sympathetic with isolating functions inside other functions, but for better or worse that runs contrary to what rOpenSci expects and encourages. Putting them outside helps avoid messy implicit scoping, of which I found a few instances once i moved them outside. In short: Please avoid nesting function defs.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/osmplotr/issues/29#issuecomment-334673761, or mute the thread https://github.com/notifications/unsubscribe-auth/AAvoobNheIHDR5MAbkHWJoCa89BwooyFks5spc5agaJpZM4Pu2lH .

mpadge commented 6 years ago

I put this question out to the rOpenSci slack group:

Is there an rOpenSci stance on nested functions? I've got a PR with lots of nested fns and have asked for them to be un-nested, but rightfully got asked in return: Why? And got asked more pointedly whether there was any official documentation/justification for avoiding nests (other than scoping, which is a non-issue here)?

so far only reply suggests "Don't see a clear answer", but it's still the wee hours for most. I'll let you know of any more responses.

richardbeare commented 6 years ago

It is interesting to see lots of opinions on this sort of thing, and to think back on how ones own impressions change as a result of familiarity with the language, and experience with other languages. In my case I suspect this is coloured a lot by undergrad teaching that was done in Ada (still one of the nicest languages I've used in many ways, but I haven't touched it in ages). Some thoughts that came up as a result of this little exercise and reading some of the style guides.

Of course, since this is a PR for your package, and you're the long term maintainer, so I'm happy to conform to your preferred style.

One set of recommendations is captured by Hadley's page http://adv-r.had.co.nz/Style.html and there is also the Google ones https://google.github.io/styleguide/Rguide.xml

I've never used the curly bracket at the end of the function header line, always putting it on its own on the next line. This is typical in C/C++:

newfuction <- function(a,b,c) { ## R style, which I'd never used, but looks like it is widely accepted

vs

newfunction <- function(a,b,c) ## C/C++ style {

I'd always liked the compact use of {} around if/else statements, and strongly agree with always having a pair of curly brackets in a single line if expression. Aiming for compactness is very likely to cause trouble later on. But the C/C++ recommendations for the same structure varies with project

if () { } else { }

instead of the sometime c++ version

if () {

} else {

}

I saw a recommendation that returns should only be used for early return from a function. Don't like this one - should do the same thing all the time (assuming there's no funny R quirk).

Nested functions - something to consider - are anonymous functions OK? If so, presumably small nested ones should be too. Larger ones, perhaps, could avoid nesting to stop functions getting too massive.

Its complicated !

On Fri, Oct 6, 2017 at 10:28 PM, mark padgham notifications@github.com wrote:

I put this question out to the rOpenSci slack group:

Is there an rOpenSci stance on nested functions? I've got a PR with lots of nested fns and have asked for them to be un-nested, but rightfully got asked in return: Why? And got asked more pointedly whether there was any official documentation/justification for avoiding nests (other than scoping, which is a non-issue here)?

so far only reply suggests "Don't see a clear answer", but it's still the wee hours for most. I'll let you know of any more responses.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/osmplotr/issues/29#issuecomment-334729270, or mute the thread https://github.com/notifications/unsubscribe-auth/AAvooZoE909zlLay2YVnA9JYY1_NhEoJks5spg7TgaJpZM4Pu2lH .

mpadge commented 6 years ago

Yeah, it's an interesting minefield all this stuff, and at the end all one can do is be opinionated one way or the other. Which made me realise that osmplotr didn't have a CONTRIBUTING.md file, so I've copied the one from bikedata. Check it out - I really do pretty much entirely share your opinions, especially with regard to vertical alignment of curly braces. The only place C++ can't translate directly to R is

if ()
{
}
else
{
}

I would prefer that but R will only interpret it with

if ()
{
} else
{
}

As for nesting - it seems very clear that there is no clarity, so that's also just a matter of opinion, of which i'd have just two:

  1. Avoiding nesting absolutely ensures no implicit scoping, which is a good thing
  2. I like to document all fns in a consistent and explicit way, which means doxygen/roxygen comments throughout, yet these are only highlighted when not indented. That means comments for nested fn defs are not highlighted the way non-nested fns are. Alternative solution would be to customise my (vim) syntax highlighter, but much easier to just avoid indentation.

Thanks for opening this can o'worms!

richardbeare commented 6 years ago

I think the major reason for standards for C++ code on any given project is to ensure that not too much time is spent reformatting one another's code.

I certainly agree with the idea that if nesting is done, it shouldn't rely on oddities of R scoping - that's just asking for trouble.

Glad to hear that I'm collaborating with a user of old-school editors! I still regularly use emacs-ess.

On Sat, Oct 7, 2017 at 7:30 PM, mark padgham notifications@github.com wrote:

Yeah, it's an interesting minefield all this stuff, and at the end all one can do is be opinionated one way or the other. Which made me realise that osmplotr didn't have a CONTRIBUTING.md file, so I've copied the one from bikedata. Check it out https://github.com/ropensci/osmplotr/blob/master/CONTRIBUTING.md - I really do pretty much entirely share your opinions, especially with regard to vertical alignment of curly braces. The only place C++ can't translate directly to R is

if () { } else { }

I would prefer that but R will only interpret it with

if () { } else { }

As for nesting - it seems very clear that there is no clarity, so that's also just a matter of opinion, of which i'd have just two:

  1. Avoiding nesting absolutely ensures no implicit scoping, which is a good thing
  2. I like to document all fns in a consistent and explicit way, which means doxygen/roxygen comments throughout, yet these are only highlighted when not indented. That means comments for nested fn defs are not highlighted the way non-nested fns are. Alternative solution would be to customise my (vim) syntax highlighter, but much easier to just avoid indentation.

Thanks for opening this can o'worms!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/osmplotr/issues/29#issuecomment-334919635, or mute the thread https://github.com/notifications/unsubscribe-auth/AAvooVyM76f54tpP-euS27KtaVXYkrq4ks5spzargaJpZM4Pu2lH .

mpadge commented 6 years ago

I still get an error on this:

> bbox <- osmdata::getbb ("greater melbourne, australia")
> coast <- opq (bbox = bbox) %>%
      add_osm_feature (key = "natural", value = "coastline") %>%
      osmdata_sf (quiet = FALSE)
> coast_poly <- osm_line2poly (coast$osm_lines, bbox)
Error in FUN(X[[i]], ...) : 
  unused argument (V = c(53, 39, 4, ...

Without even worrying about exactly why this happens, I note that the lapply that causes it is the one on current L#84:

linkorders <- lapply (startidx, function (X) unroll_rec (X), V = m2)

but startidx is a simple vector, so there's no need for an lapply. Simply unroll_rec (startidx, V = m2) will give the desired result, right? Same for next line, because linkorders is a vector, not a list. Also a couple of lines later, links <- lapply (linkorders, ...) is also just sub-setting a matrix as headtail [linkorders, , drop = FALSE]. And the call after that of,

links <- lapply (links, function (X) lookup_ways (X), g)

is also just

links <- lookup_ways (links, g)

... and so on. I've accepted the PR as you've seen, but could you please re-do it and vectorise all fns where possible, and ensure that the above code works. Thanks!

richardbeare commented 6 years ago

The lapply calls are there for cases when there are multiple of what I've called "chains". This requires more than a single lookups, and the results can be variable length, thus the need for a list. I think this happens in the greater Melbourne example, with one or more of the islands being constructed from more than one way and the mainland coast being made of lots of other ways.

My memory is already struggling, but I'll check again. Pretty sure if wasn't simple all the time. Will be neater if it is.

The most complex/least intuitive part is the "unroll_rec", which follows a trail of indexes. I couldn't think of a way of doing this with functional programming tools.

mpadge commented 6 years ago

okay, i see. Then there are a few options, but the easiest is likely a simple

if (!is.list (startidx))
  startidx <- list (startidx)

Then the lapply calls will work in effective vectorised form. However, startidx can only be a vector, i'd think, so i don't see where these lists arise. Can you dig out two contrasting examples of simple (vector) and non-simple (list) results so we can unpick this a bit better?

richardbeare commented 6 years ago

The list comes in the response, rather than the input. startidx is a vector of positions in the original (corresponding to NA, from a match call (I think)). It won't be a list.

multiple NA means that there are several "groupings" of ways in the bbox. The idea of the lapply call is to pass each element of startidx to the unrolling procedure, and pack the results into a list. The results of each lookup could have a different length. Thus the lapply is more about the return type than the input type. Using lapply on things that are vectors, rather than lists is pretty common. After all, if the functions you want aren't already vectorised, the apply family is the only way to achieve that at the r level.

I'll dig through my examples and see what prompted it.

On Sun, Oct 8, 2017 at 7:42 PM, mark padgham notifications@github.com wrote:

okay, i see. Then there are a few options, but the easiest is likely a simple

if (!is.list (startidx)) startidx <- list (startidx)

Then the lapply calls will work in effective vectorised form. However, startidx can only be a vector, i'd think, so i don't see where these lists arise. Can you dig out two contrasting examples of simple (vector) and non-simple (list) results so we can unpick this a bit better?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/osmplotr/issues/29#issuecomment-334992015, or mute the thread https://github.com/notifications/unsubscribe-auth/AAvooZHbf_y_hFNTrXk5xqoNRyHuzONJks5sqIsSgaJpZM4Pu2lH .

richardbeare commented 6 years ago

It turns out that the greater melbourne map is a surprisingly good example of the odd formats! The western edge of the bbox cuts through the Bellarine Peninsula, and that leads to two groups of "ways", which can be observed by placing a breakpoint inside osm_line2poly as follows:

coast_poly <- osm_line2poly (coast$osm_lines, bbox)

Browse[2]> startidx
[1] 15 30

Browse[2]> linkorders
[[1]]
 [1] 15 71 56 37 36 48 49 47 46 45 10 11 12 13 14 73  2 39 38  9 40 41  3  4  5  6 19 27 17 26 25 23 21 28 18 24 22
[38] 20 16

[[2]]
 [1] 30 32 31 35 29 33 44 34 58 61 60 59 55 54 63 50 43 51 52  1 53

Presumably the short one is the Bellarine peninusla part and the long one is the rest of the mainland.

The next step removes those ways that have been linked from the original set, because those that aren't grouped must form closed polygons:

        head_tail <- head_tail [-unlist (linkorders), , drop = FALSE] #nolint

So, apart from tidying up the function that follows indexes (unroll_rec), I can't see an alternative to lapply that doesn't involve loops.

I'm still looking for a functional programming way of doing the unrolling, but only because I think there's a neat trick I'm missing.

mpadge commented 6 years ago

great stuff! Thanks so much for all the work here

richardbeare commented 6 years ago

I've done some thinking about the unrolling functions - probably best to go back and do them with loops. I have the prototypes ready when you have integrated the other changes to your satisfaction.

mpadge commented 6 years ago

Please just send another PR as soon as you can - I've already accepted the last one, so am just waiting for the improved version of the unrolling functions

homer3018 commented 3 years ago

Hello there, I've just bumped into the same issue you're mentioning while trying to transform some coast lines into polygon so that it can be filled. Is there any chance someone could have another look at that ? Anyhow, good work there! Thanks a lot.

richardbeare commented 3 years ago

There is some very ancient work in these branches

https://github.com/richardbeare/osmplotr/tree/SFExperiments https://github.com/richardbeare/osmplotr/tree/SFSeaLand

There's a lot of work involved in reliably doing water/land shading, and the quickest way to get something reliable was to pull in the power of simplefeatures, allowing reliable intersection computation and so on. Our original experiments were rapidly heading towards reimplementing a lot of functionality that is now in sf, so building on sf is the way to go.

It was a long time ago, so I don't remember the details, but there should be something you can work on in those two branches.