Closed mtennekes closed 4 years ago
The code line if (i %in% 1:2) class(mi) <- c("factor", "matrix")
in the script above is unnecessary and creates problems when applying as.data.frame
. Apparently, setting the dims is enough to create a factor matrix.
when all layers are factor
one could still put them in the third dimensions, but that makes only sense if each of them has the same levels
. I think your approach is the right one; one can always move from layer to multiple attributes and back (merge
and split
).
plot.stars
always only plots the first array; this could of course be generalized in the case of several attributes, but then one would need to handle multiple legends, messy in base plot, ... I guess this might be something for a dedicated package, like tmap?
There is a chance that I am over-thinking it, but my guts tell me that it could become an issue in the future. Categorical rasters are now read as a factor in stars, which changes the original values of the data. For example, a value of 1 in the tif file below is now represented by a factor, which is stored as an integer of 2. This could have serious consequences when making some operations (e.g., counting, selecting) as the user cannot use the original values. This discrepancy could also be confusing to many users. I do not have any good solutions yet, but I wanted to start a discussion about this issue.
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.7.1, GDAL 2.3.2, PROJ 5.2.0
# data download -----------------------------------------------------------
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatial/stars/files/4345255/lanform-example.zip", tf)
unzip(tf, exdir = td)
lf_path = paste0(td, "/lanform.tif")
# data read ---------------------------------------------------------------
lf = read_stars(lf_path)
lf
#> stars object with 2 dimensions and 1 attribute
#> attribute(s), summary of first 1e+05 cells:
#> lanform.tif
#> 10.Low mountains : 369
#> 11.Scattered high mountains: 306
#> 9. Scattered low mountains : 273
#> 12.High mountains : 240
#> 17. Surface water : 8
#> (Other) : 0
#> NA's :98804
#> dimension(s):
#> from to offset delta refsys point values
#> x 1 7360 -1091676 300 PROJCS["unnamed",GEOGCS["... FALSE NULL [x]
#> y 1 3812 -38556.5 -300 PROJCS["unnamed",GEOGCS["... FALSE NULL [y]
levels(lf$lanform.tif)
#> [1] ""
#> [2] "1. Flat or nearly flat plains"
#> [3] "2. Smooth plains with some local relief"
#> [4] ""
#> [5] ""
#> [6] "5.Scattered moderate hills"
#> [7] "6. Moderate hills"
#> [8] "7. Scattered high hills"
#> [9] "8. High hills"
#> [10] "9. Scattered low mountains"
#> [11] "10.Low mountains"
#> [12] "11.Scattered high mountains"
#> [13] "12.High mountains"
#> [14] "13.Tablelands with moderate relief"
#> [15] "14.Tablelands with considerable relief"
#> [16] "15.Tablelands with high relief"
#> [17] "16.Tablelands with ver high relief"
#> [18] "17. Surface water"
gdalUtils::gdalinfo(lf_path)
#> [1] "Driver: GTiff/GeoTIFF"
#> [2] "Files: /tmp/RtmpyrmzC4/lanform.tif"
#> [3] " /tmp/RtmpyrmzC4/lanform.tif.aux.xml"
#> [4] "Size is 7360, 3812"
#> [5] "Coordinate System is:"
#> [6] "PROJCS[\"unnamed\","
#> [7] " GEOGCS[\"WGS 84\","
#> [8] " DATUM[\"unknown\","
#> [9] " SPHEROID[\"WGS84\",6378137,298.257223563]],"
#> [10] " PRIMEM[\"Greenwich\",0],"
#> [11] " UNIT[\"degree\",0.0174532925199433]],"
#> [12] " PROJECTION[\"Cylindrical_Equal_Area\"],"
#> [13] " PARAMETER[\"standard_parallel_1\",5.5],"
#> [14] " PARAMETER[\"central_meridian\",140.8],"
#> [15] " PARAMETER[\"false_easting\",0],"
#> [16] " PARAMETER[\"false_northing\",0],"
#> [17] " UNIT[\"metre\",1,"
#> [18] " AUTHORITY[\"EPSG\",\"9001\"]]]"
#> [19] "Origin = (-1091676.099780400050804,-38556.486310934997164)"
#> [20] "Pixel Size = (300.000000000000000,-300.000000000000000)"
#> [21] "Metadata:"
#> [22] " AREA_OR_POINT=Area"
#> [23] "Image Structure Metadata:"
#> [24] " COMPRESSION=DEFLATE"
#> [25] " INTERLEAVE=BAND"
#> [26] "Corner Coordinates:"
#> [27] "Upper Left (-1091676.100, -38556.486) (130d56'53.71\"E, 0d20'49.56\"S)"
#> [28] "Lower Left (-1091676.100,-1182156.486) (130d56'53.71\"E, 10d42' 9.59\"S)"
#> [29] "Upper Right ( 1116323.900, -38556.486) (150d52'27.05\"E, 0d20'49.56\"S)"
#> [30] "Lower Right ( 1116323.900,-1182156.486) (150d52'27.05\"E, 10d42' 9.59\"S)"
#> [31] "Center ( 12323.900, -610356.486) (140d54'40.38\"E, 5d30'10.31\"S)"
#> [32] "Band 1 Block=7360x1 Type=Byte, ColorInterp=Palette"
#> [33] " Min=1.000 Max=17.000 "
#> [34] " Minimum=1.000, Maximum=17.000, Mean=nan, StdDev=nan"
#> [35] " NoData Value=255"
#> [36] " Categories:"
#> [37] " 0: "
#> [38] " 1: 1. Flat or nearly flat plains"
#> [39] " 2: 2. Smooth plains with some local relief"
#> [40] " 3: "
#> [41] " 4: "
#> [42] " 5: 5.Scattered moderate hills"
#> [43] " 6: 6. Moderate hills"
#> [44] " 7: 7. Scattered high hills"
#> [45] " 8: 8. High hills"
#> [46] " 9: 9. Scattered low mountains"
#> [47] " 10: 10.Low mountains"
#> [48] " 11: 11.Scattered high mountains"
#> [49] " 12: 12.High mountains"
#> [50] " 13: 13.Tablelands with moderate relief"
#> [51] " 14: 14.Tablelands with considerable relief"
#> [52] " 15: 15.Tablelands with high relief"
#> [53] " 16: 16.Tablelands with ver high relief"
#> [54] " 17: 17. Surface water"
#> [55] " Metadata:"
#> [56] " STATISTICS_MAXIMUM=17"
#> [57] " STATISTICS_MEAN=nan"
#> [58] " STATISTICS_MINIMUM=1"
#> [59] " STATISTICS_STDDEV=nan"
#> [60] " Color Table (RGB with 256 entries)"
#> [61] " 0: 255,255,255,255"
#> [62] " 1: 252,255,194,255"
#> [63] " 2: 219,226,172,255"
#> [64] " 3: 231,255,43,255"
#> [65] " 4: 207,233,39,255"
#> [66] " 5: 104,227,79,255"
#> [67] " 6: 85,178,63,255"
#> [68] " 7: 125,169,58,255"
#> [69] " 8: 102,137,46,255"
#> [70] " 9: 214,175,98,255"
#> [71] " 10: 214,154,43,255"
#> [72] " 11: 114,83,38,255"
#> [73] " 12: 114,69,1,255"
#> [74] " 13: 201,161,224,255"
#> [75] " 14: 178,141,174,255"
#> [76] " 15: 161,128,135,255"
#> [77] " 16: 136,108,97,255"
#> [78] " 17: 197,244,254,255"
#> [79] " 18: 0,0,0,255"
...
#> [316] " 255: 0,0,0,0"
Created on 2020-03-19 by the reprex package (v0.3.0) landcover-colors.zip
Well spotted! This commit solves the problem you mention, but thanks to the fact that this raster has minimum value 1. You can't have the cake and eat it: R factors use numerical values underneath the levels that start at 1. So there are several options are now there:
Building on the land cover example of https://github.com/mtennekes/tmap/issues/368#issuecomment-595727166
library(stars)
r = read_stars("pr_landcover_wimperv_10-28-08_se5.img", RAT = "Land Cover Class")
r = droplevels(r)
plot(r, key.pos = 4, key.width = lcm(5))
will give 14 levels, starting at 1, where the original map starts at 0 (hence, 1 is added, that is where this came from, to the levels before droplevels
). When reading with
library(stars)
r = read_stars("pr_landcover_wimperv_10-28-08_se5.img", NA_value = 0)
r = droplevels(r)
plot(r, key.pos = 4, key.width = lcm(5))
the 0 values are converted to NA before converting to factor, so minimum (non-NA) value is 1, and values are preserved (before droplevel
). Maybe there is some silent convention to put nothing important in the 0 value? (The value for RAT
is now auto-guessed as the first RAT column of class character
).
insert bogus levels for rasters that start at values larger than 1 (ever seen one?)
ESA CCI Land Cover (http://maps.elie.ucl.ac.be/CCI/viewer/) data starts from 10. However, it does not have an attribute table attached (you can download it as a csv, .dsr (ENVI), .lyr (ESRI), or .qml (QGIS) file.
Yes, but doesn't contain categories, nor color or attribute tables (I looked at the .tif's)
Hi @edzer,
# remotes::install_github("r-spatial/stars")
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.7.1, GDAL 2.3.2, PROJ 5.2.0
# data download -----------------------------------------------------------
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatial/stars/files/4345255/lanform-example.zip", tf)
unzip(tf, exdir = td)
lf_path = paste0(td, "/lanform.tif")
# data read ---------------------------------------------------------------
lf = read_stars(lf_path)
lf
#> stars object with 2 dimensions and 1 attribute
#> attribute(s), summary of first 1e+05 cells:
#> lanform.tif
#> 9. Scattered low mountains : 369
#> 10.Low mountains : 306
#> 8. High hills : 273
#> 11.Scattered high mountains : 240
#> 16.Tablelands with ver high relief: 8
#> (Other) : 0
#> NA's :98804
#> dimension(s):
#> from to offset delta refsys point values
#> x 1 7360 -1091676 300 PROJCS["unnamed",GEOGCS["... FALSE NULL [x]
#> y 1 3812 -38556.5 -300 PROJCS["unnamed",GEOGCS["... FALSE NULL [y]
plot(lf)
Created on 2020-03-20 by the reprex package (v0.3.0)
# remotes::install_github("r-spatial/stars")
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.7.1, GDAL 2.3.2, PROJ 5.2.0
# data download -----------------------------------------------------------
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatial/stars/files/4358677/landcover-colors.zip", tf)
unzip(tf, exdir = td)
lc_path = paste0(td, "/landcover-colors.tif")
# data read ---------------------------------------------------------------
# R crashes ---------------------------------------------------------------
# lc = read_stars(lc_path)
# lc
# plot(lc)
#
# gdalUtils::gdalinfo(lc_path)
Created on 2020-03-20 by the reprex package (v0.3.0)
Ah yes; we were there: https://github.com/r-spatial/mapview/issues/208 ; This
library(stars)
r = read_stars("ESACCI-LC-L4-LCCS-Map-300m-P1Y-2015-v2.0.7.tif", proxy = TRUE)
plot(r, rgb = read.csv("ESACCI-LC-Legend.csv",sep=";"))
now gives me a plot with the colors, but not with the labels/levels.
@edzer I have been thinking about this topic for some time now. Would you consider adding the following changes to accommodate rasters with attribute tables:
array
dim
), but also the second attribute (let's call it legend
). The second attribute is a data.frame with three columns: value
, label
, and color
. (It could have more columns if it would be better to store colors as RGBA).This way, it would be easier to work on categorical rasters with values not starting from 1, rasters with different values of NA, etc. I also assume it would make it easier to adopt raster colors in other plotting packages, such as tmap or mapview.
We now have support for attributes levels
and colors
, and class
which is factor
. Building on factor
adds the constraint that values are 1-based, and drops the requirement to match values to a table. For arrays that start at values larger than 1 the colors
and levels
vectors can be padded; for those starting lower than 1 there is no chance of getting it done without shifting values. So it is this latter case that your proposal would bring new functionality: having categorical rasters with minimal values below 1 that you want to retain as their original values. ATM I see little urgency to go there.
If someone would take the effort to properly implement (and maintain) your proposal, or something similar, I would be happy to consider a PR.
Thanks for the reply @edzer. I will think about more...
Could you take a look at the issues AD2 and AD3 above (https://github.com/r-spatial/stars/issues/245#issuecomment-601609490)?
I'm slowly starting to grasp the concept of all these tables read through GDAL: they always start at 0, 1, 2, ... and min_value does not refer to the data (cell) minimum, but to the first meaningful (non-NA) category in the layer.
I may have nailed it this time - your examples are very helpful!
Thank you @edzer - I use categorical rasters a lot, and it would be great to have them work smoothly with stars.
I think the previous example now work fine. However, only when in non-proxy mode:
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.7.1, GDAL 2.3.2, PROJ 5.2.0
# data download -----------------------------------------------------------
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatial/stars/files/4345255/lanform-example.zip", tf)
unzip(tf, exdir = td)
lf_path = paste0(td, "/lanform.tif")
# data read ---------------------------------------------------------------
lf = read_stars(lf_path, proxy = TRUE)
#> Error in data + (1 - min_value): non-numeric argument to binary operator
lf
#> Error in eval(expr, envir, enclos): object 'lf' not found
Created on 2020-03-26 by the reprex package (v0.3.0)
Thanks - that was a typo.
@edzer conversion from stars
to raster
works correctly. However, there is an issue with converting raster
to stars
- colors are lost in the process.
This issue has an impact on tmap, as it now uses stars internally to plot raster objects.
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
library(raster)
#> Loading required package: sp
# data download -----------------------------------------------------------
tf = tempfile(fileext = ".zip")
td = tempdir()
download.file("https://github.com/r-spatial/stars/files/4345255/lanform-example.zip", tf)
unzip(tf, exdir = td)
lf_path = paste0(td, "/lanform.tif")
# data read ---------------------------------------------------------------
# colors are here
lf = read_stars(lf_path)
plot(lf)
# colors are here
lf_r = raster(lf_path)
plot(lf_r)
# colors are here
lf_r2 = as(lf_r, "Raster")
plot(lf_r2)
# colors are not here anymore
lf_s = st_as_stars(lf_r)
plot(lf_s)
Created on 2020-04-24 by the reprex package (v0.3.0)
Given the number of slots in a RasterLayer's @legend
slot, there will many more surprises of this kind, in case they are used.
Thanks, should work now!
Thank you
Question related to https://github.com/mtennekes/tmap/issues/388: what is the correct way to convert a raster brick with categorical layers? For instance the object
land
fromtmap
:Created on 2020-01-23 by the reprex package (v0.3.0)
I managed to cast
land
to a stars object with the following script:Created on 2020-01-23 by the reprex package (v0.3.0)
Does the output look correct? If so:
plot(s1)
only plots the first layer?st_as_stars.Raster
. However, I'm not sure what the best strategy is: if one of the layers is a factor, do something similar as above, otherwise the original method.