ropensci / stplanr

Sustainable transport planning with R
https://docs.ropensci.org/stplanr
Other
419 stars 66 forks source link

Argument of segment_length in line_segment fun causes issue #552

Closed wangzhao0217 closed 5 months ago

wangzhao0217 commented 11 months ago

for this minimal dataset: rnet_x = sf::read_sf("https://github.com/ropensci/stplanr/releases/download/v1.0.2/rnet_x_ed.geojson") rnet_y = sf::read_sf("https://github.com/ropensci/stplanr/releases/download/v1.0.2/rnet_y_ed.geojson")

segment_length = 15 will cause error, but segment_length =10 is not.

Robinlovelace commented 11 months ago

That definitely sounds like a bug. Will aim to take a look today. Did it work previously?

wangzhao0217 commented 11 months ago

I didn't spot the issue before, segment_length was not defined in rnet_merge in the previous test.

On Thu, 2 Nov 2023 at 11:51, Robin Lovelace @.***> wrote:

That definitely sounds like a bug. Will aim to take a look today. Did it work previously?

— Reply to this email directly, view it on GitHub https://github.com/ropensci/stplanr/issues/552#issuecomment-1790587945, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARZESTQDIDH2X7FMLY2IRX3YCOCNZAVCNFSM6AAAAAA6ZSJX7SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOJQGU4DOOJUGU . You are receiving this because you were assigned.Message ID: @.***>

Robinlovelace commented 11 months ago

I didn't spot the issue before, segment_length was not defined in rnet_merge in the previous test.

Can you provide a full reproducible example?

Robinlovelace commented 11 months ago

Full reproducible example, showing that this is for sure an issue:

rnet_x = sf::read_sf("https://github.com/ropensci/stplanr/releases/download/v1.0.2/rnet_x_ed.geojson")
rnet_y = sf::read_sf("https://github.com/ropensci/stplanr/releases/download/v1.0.2/rnet_y_ed.geojson")

segment_length = 15 # will cause error, but segment_length =10 is not
library(stplanr)
# ?rnet_merge

names(rnet_y)
#> [1] "value"     "Quietness" "geometry"
funs = list(vaue = sum, Quietness = mean)

res = rnet_merge(rnet_x, rnet_y)
#> [1] "funs is NULL"
#> Joining with `by = join_by(identifier)`
res2 = rnet_merge(rnet_x, rnet_y, segment_length = segment_length)
#> [1] "funs is NULL"
#> Error in `[[<-`:
#> ! Assigned data `res` must be compatible with existing data.
#> ✖ Existing data has 3859 rows.
#> ✖ Assigned data has 3858 rows.
#> ℹ Only vectors of size 1 are recycled.
#> Caused by error in `vectbl_recycle_rhs_rows()`:
#> ! Can't recycle input of size 3858 to size 3859.
#> Backtrace:
#>      ▆
#>   1. ├─stplanr::rnet_merge(rnet_x, rnet_y, segment_length = segment_length)
#>   2. │ └─stplanr::rnet_join(rnet_x, rnet_y, dist = dist, crs = crs, ...)
#>   3. │   ├─stplanr::line_segment(rnet_y, segment_length = segment_length)
#>   4. │   └─stplanr:::line_segment.sf(rnet_y, segment_length = segment_length)
#>   5. │     └─stplanr:::line_segment_rsgeo(l, n_segments = n_segments)
#>   6. │       ├─base::`[[<-`(`*tmp*`, attr(l, "sf_column"), value = `<LINESTRING [°]>`)
#>   7. │       └─tibble:::`[[<-.tbl_df`(`*tmp*`, attr(l, "sf_column"), value = `<LINESTRING [°]>`)
#>   8. │         └─tibble:::tbl_subassign(...)
#>   9. │           └─tibble:::vectbl_recycle_rhs_rows(value, fast_nrow(xo), i_arg = NULL, value_arg, call)
#>  10. │             ├─base::withCallingHandlers(...)
#>  11. │             └─vctrs::vec_recycle(value[[j]], nrow)
#>  12. └─vctrs:::stop_recycle_incompatible_size(...)
#>  13.   └─vctrs:::stop_vctrs(...)
#>  14.     └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)
res3 = rnet_merge(rnet_x, rnet_y, segment_length = 10)
#> [1] "funs is NULL"
#> Joining with `by = join_by(identifier)`

Created on 2023-11-02 with reprex v2.0.2

Robinlovelace commented 11 months ago

The line that jumps out to me is this one:

stplanr:::line_segment_rsgeo(l, n_segments = n_segments)

Could it be another issue with the Rust implementation? Heads-up @JosiahParry in case, I'll have a look and try to pull out a minimal example.

Robinlovelace commented 11 months ago

Confirmed: seems one out of ~4k expected geometries is missing:

res_tbl[[attr(l, "sf_column")]] <- res
Error in `[[<-`(`*tmp*`, attr(l, "sf_column"), value = list(c(-3.20572,  : 
  Assigned data `res` must be compatible with existing data.
✖ Existing data has 3859 rows.
✖ Assigned data has 3858 rows.
ℹ Only vectors of size 1 are recycled.
Caused by error in `vectbl_recycle_rhs_rows()`:
! Can't recycle input of size 3858 to size 3859.
JosiahParry commented 11 months ago
library(rsgeo)

x <- geom_linestring(
  c(-3.19416, -3.19352, -3.19288),
  c(55.95524, 55.95535, 55.95546)
)

line_segmentize(x, 6) |> 
  expand_geoms()

Minimal reproducible example. I think this may be fixed upstream? I'm not sure what the cause is here. Alo as an aside, these are likely somewhat inaccurate segments since its being applied to geodetic coordinates instead of planar ones.

JosiahParry commented 11 months ago

I genuinely think its because the numbers are so small.

library(rsgeo)

x <- geom_linestring(
  c(-3.19416, -3.19352, -3.19288) * 10,
  c(55.95524, 55.95535, 55.95546) * 10
)

line_segmentize(x, 6) |> 
  expand_geoms()
Robinlovelace commented 11 months ago

I think this may be fixed upstream?

Re-compiling now after entering

remotes::install_dev("rsgeo")
Robinlovelace commented 11 months ago

Update: same issue after rebuilding.

Robinlovelace commented 11 months ago

I genuinely think its because the numbers are so small.

Yes that could potentially explain it:

> line_segmentize(x, 6) |> 
+     expand_geoms()
[[1]]
<rs_LINESTRING[5]>
[1] LineString([Coord { x: -3.19416, y: 55.95524 }, Coord { x: -3.194, y: 55.955267500000005 }, Coord { x: -3.193946666666667, y: 55.95527666666... 
[2] LineString([Coord { x: -3.193946666666667, y: 55.95527666666667 }, Coord { x: -3.19384, y: 55.95529500000001 }, Coord { x: -3.19373333333333... 
[3] LineString([Coord { x: -3.1937333333333333, y: 55.955313333333336 }, Coord { x: -3.19368, y: 55.9553225 }, Coord { x: -3.19352, y: 55.95535 }]))
[4] LineString([Coord { x: -3.19352, y: 55.95535 }, Coord { x: -3.19352, y: 55.95535 }, Coord { x: -3.193306666666667, y: 55.95538666666667 }]))    
[5] LineString([Coord { x: -3.193306666666667, y: 55.95538666666667 }, Coord { x: -3.1933066666666665, y: 55.95538666666667 }, Coord { x: -3.193... 

There shouldn't be any tiny linestrings in the example data though, minimum length should be 15m.

JosiahParry commented 11 months ago

This is likely a bug, yes and should be fixed. But more than anything I think this is a misuse of the function. It's running into an error because of floating point precision. You're applying a euclidean algorithm in a geodetic space. In this very specific example we're partitioning 0.001298769 into 6 equal length pieces of size 0.0002164615. We're dealing with tiny units of precision that is getting futzed up. This does not occur if you increase the scale by even 1 decimal place.

I have pushed a HaversineSegmentize trait to georust upstream which would measure this in units of meters instead of euclidean units. Creating a line_segmentize_haversine() would be the real fix here.

Robinlovelace commented 11 months ago

Yeah. Running it on projected data sounds like a good plan. Can you try that @wangzhao0217 ?

JosiahParry commented 11 months ago

Related https://github.com/georust/geo/pull/1107