huizezhang-sherry / cubble

A tidy structure for spatio-temporal vector data
https://huizezhang-sherry.github.io/cubble/

cubble coerced from raster `stars` object does not inherit CRS #24

Open loreabad6 opened 7 months ago

loreabad6 commented 7 months ago

Raster data cubes do not inherit the CRS when coerced to cubble. This is because the cubble creation is extracting the coordinate values as integers, losing the geometry here.

    library(stars)
    library(cubble)
    tif = system.file("tif/L7_ETMs.tif", package = "stars")
    (x = stars::read_stars(tif))
    #> stars object with 3 dimensions and 1 attribute
    #> attribute(s):
    #>              Min. 1st Qu. Median     Mean 3rd Qu. Max.
    #> L7_ETMs.tif     1      54     69 68.91242      86  255
    #> dimension(s):
    #>      from  to  offset delta                     refsys point x/y
    #> x       1 349  288776  28.5 SIRGAS 2000 / UTM zone 25S FALSE [x]
    #> y       1 352 9120761 -28.5 SIRGAS 2000 / UTM zone 25S FALSE [y]
    #> band    1   6      NA    NA                         NA    NA
    (t = x |> as_cubble())
    #> ℹ More than 10,000 keys: only use the first key to test spatial &
    #> temporal variables.
    #> # cubble:   key: id [122848], index: time, nested form
    #> # spatial:  [288790.500000803, 9110743.00002899, 298708.50000055,
    #> #   9120746.50002874], Missing CRS!
    #> # temporal: band [int], L7_ETMs.tif [dbl]
    #>          x        y     id ts
    #>      <dbl>    <dbl>  <int> <list>
    #>  1 288791. 9120747. 122500 <tibble [6 × 2]>
    #>  2 288819. 9120747. 122501 <tibble [6 × 2]>
    #>  3 288848. 9120747. 122502 <tibble [6 × 2]>
    #>  4 288876. 9120747. 122503 <tibble [6 × 2]>
    #>  5 288905. 9120747. 122504 <tibble [6 × 2]>
    #>  6 288933. 9120747. 122505 <tibble [6 × 2]>
    #>  7 288962. 9120747. 122506 <tibble [6 × 2]>
    #>  8 288990. 9120747. 122507 <tibble [6 × 2]>
    #>  9 289019. 9120747. 122508 <tibble [6 × 2]>
    #> 10 289047. 9120747. 122509 <tibble [6 × 2]>
    #> # ℹ 122,838 more rows

One option would be to convert the tibble to sf before coercing it to cubble (i.e. before line 192), to recover the CRS:

    as_tibble(data) |>
      mutate(id = as.integer(interaction(!!sym(longlat[[1]]),
                                         !!sym(longlat[[2]])))) |>
      st_as_sf(crs = st_crs(x), coords = longlat) |>
      as_cubble(key = id, index = time) 

or to add make_spatial_sf() after line 192:

    as_tibble(data) |>
      mutate(id = as.integer(interaction(!!sym(longlat[[1]]),
                                         !!sym(longlat[[2]])))) |>
      as_cubble(key = id, index = time, coords = longlat) |>
      make_spatial_sf(crs = sf::st_crs(data))

However, this approach is not very fast, and it also adds an extra geometry column that users might find unexpected. So I am not sure what the best approach would be here.

huizezhang-sherry commented 7 months ago

This also relates to another feature I hope cubble can have. Say, one creates a cubble from a tibble and would like to add a crs. One should be able to do this with something like:

    climate_flat |> as_cubble(key = id, index = date, coords = c(long, lat), crs = sf::st_crs(4326))

This makes me think as_cubble() should probably accept a crs argument and parse it as an attribute. We will also need to add a st_transform.cubble_df to handle coordinate transformation.
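To make the attribute idea concrete, here is a minimal sketch that uses only sf and a plain data frame, not cubble itself. The helpers `with_crs()` and `transform_coords()` are hypothetical names made up for illustration, and the toy `stations` table stands in for a flat table with coordinate columns; none of this is existing cubble API.

```r
library(sf)

# Toy stand-in for a flat table with coordinate columns (not a real cubble).
stations <- data.frame(
  id   = 1:2,
  long = c(144.96, 151.21),
  lat  = c(-37.81, -33.87)
)

# Hypothetical helper: record the CRS as an attribute rather than
# building a geometry column.
with_crs <- function(data, crs) {
  attr(data, "crs") <- sf::st_crs(crs)
  data
}

# Hypothetical helper: transform the coordinate columns using the stored CRS.
# An sf point object is built only temporarily to do the reprojection.
transform_coords <- function(data, target, coords = c("long", "lat")) {
  pts <- sf::st_as_sf(data, coords = coords, crs = attr(data, "crs"))
  xy  <- sf::st_coordinates(sf::st_transform(pts, target))
  data[[coords[1]]] <- xy[, 1]
  data[[coords[2]]] <- xy[, 2]
  attr(data, "crs") <- sf::st_crs(target)
  data
}

stations_4326 <- with_crs(stations, 4326)
stations_3857 <- transform_coords(stations_4326, 3857)
```

In this sketch the coordinate columns stay as plain numeric columns and no geometry column is added, at the cost of needing a dedicated transform method, which is the trade-off discussed above.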

Let me know your thoughts on this.

loreabad6 commented 7 months ago

> Say, one creates a cubble from a tibble and would like to add a crs.

Yes, I agree this would be useful, especially since one would not need to create an sf object first. Would this always create a geometry column? I would assume so, since it would basically be passed on to sf internally. Along the same lines, does it make sense to keep the x/y or long/lat columns on the cubble, or should a geometry column suffice, following what sf does?

> We will also need to add a st_transform.cubble_df to handle coordinate transformation.

I imagine sf methods would only be for the spatial_cubble_df class? I think having a seamless interaction with sf would be great. Is this not already handled when the spatial_cubble_df inherits the sf class? Should it always do so?

huizezhang-sherry commented 7 months ago

Yes, if users decide to incorporate crs in their workflow, they should get familiar with sf objects.

If crs becomes an additional attribute, then st_transform.cubble_df will need to come in, and it is no longer relevant now.

If spatial_cubble_df inherits the sf class, st_transform should already work.
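The inheritance point can be illustrated with sf alone. The sketch below prepends a made-up class name, `toy_spatial_df`, to an ordinary sf object to mimic a class that inherits from sf; it is not the real spatial_cubble_df, so it only shows how S3 dispatch falls through to the sf method, not how cubble behaves.

```r
library(sf)

# An ordinary sf point object with a known CRS.
pts <- sf::st_as_sf(
  data.frame(id = 1:2, long = c(144.96, 151.21), lat = c(-37.81, -33.87)),
  coords = c("long", "lat"),
  crs = 4326
)

# Hypothetical subclass name, prepended so the object still inherits from sf.
class(pts) <- c("toy_spatial_df", class(pts))

# No st_transform method exists for "toy_spatial_df", so S3 dispatch
# falls through to the sf method.
out <- sf::st_transform(pts, 3857)

inherits(pts, "sf")
sf::st_crs(out)
```

Whether the extra class survives the transformation depends on the sf method's internals, so a real subclass may still want its own thin method to restore its class and attributes.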

Currently, both tbl_df and stars objects are first converted to a cubble (as_cubble), and sf is then used to handle the crs (make_spatial_sf).

loreabad6 commented 7 months ago

I see. I would argue that if cubble intends to be, at its core, a bridge between sf and tsibble, then the methods of both packages should be inherited by the classes in cubble. That would make any other package built on top of these two work as well, for example for plotting.

I personally see the benefit of adding a crs variable, and of adding not only st_transform.cubble_df but also st_set_crs.cubble_df as methods. Assuming that a cubble_df is always spatial, make_spatial_sf would then be an internal process. I am not sure whether this would sacrifice any time or performance, though.