r-transit / tidytransit

R package for working with GTFS data
https://r-transit.github.io/tidytransit/
150 stars 22 forks source link

`test-spatial.R` errors if sf adds attributes to sf objects #210

Closed mikemahoney218 closed 12 months ago

mikemahoney218 commented 1 year ago

Hi there!

As currently written, test-spatial.R is partially testing that the underlying sf object never changes. Specifically, these two tests (the last line of the code blocks) will throw errors if sf ever changes the attributes attached to sf objects:

https://github.com/r-transit/tidytransit/blob/878e64a4a4b7f10b42f0e9d2c156fd1314fddaad/tests/testthat/test-spatial.R#L94-L99

https://github.com/r-transit/tidytransit/blob/878e64a4a4b7f10b42f0e9d2c156fd1314fddaad/tests/testthat/test-spatial.R#L147-L158

The issue is that by using expect_equal(), testthat is partially testing if your objects contain exactly the same attributes. That means you're partially testing internal implementation details of the sf object that aren't guaranteed to remain the same over time. There's a bit more about how testing the attributes of objects created by other packages can be a problem on the Tidyverse blog.

At the moment, we're trying to add a new attribute to sf objects, which causes these tests to error. I'm writing to ask if you'd consider changing these tests to allow sf to improve the internal structure of sf objects.

Specifically, if I understand these tests correctly, I think you're testing to make sure that column values are equivalent after data transformations. Would you consider looping through to compare each of the columns, rather than comparing the data structure as a whole? The first test could be:

for (col in colnames(gtfs_duke$stops)) {
    expect_equal(duke_df$stops[[col]], gtfs_duke$stops[[col]], tolerance = 0.0001)
  }

And the second test:

for (col in c("n_stop_ids", "dist_mean", "dist_median", "dist_max")) {
    expect_equal(x1[[col]], x2[[col]])
  }

If you're open to it, I'd be happy to send you a PR to make these changes!

mikemahoney218 commented 1 year ago

Another approach would be to set check.attributes = FALSE:

expect_equal(duke_df$stops[colnames(gtfs_duke$stops)], gtfs_duke$stops, tolerance = 0.0001, check.attributes = FALSE)

expect_equal(x1[,c("n_stop_ids", "dist_mean", "dist_median", "dist_max")],  
               x2[,c("n_stop_ids", "dist_mean", "dist_median", "dist_max")],
               check.attributes = FALSE) 
polettif commented 12 months ago

Thanks for the heads up and the code to fix the issue.

I used check.attributes for the first test and removed attributes(duke_00$shapes)$.internal.selfref <- NULL as well. That already pointed to something not quite clean. Comparing values for the second test looks good to me.

mikemahoney218 commented 12 months ago

Thanks!