prioritizr / aoh

Create Area of Habitat Data
https://prioritizr.github.io/aoh
9 stars 1 forks source link

empty AOH with very small distributions (single column / line rasters) #48

Closed victorcazalis closed 1 year ago

victorcazalis commented 1 year ago

I noticed that something goes wrong when we are using a distribution so small that the output raster has a single column or line and the AOH map is not created. This is quite rare but happens with species with extremely restricted distributions.

You can find attached a reproducible example which shows that my entry files overlap and are valid and shows that my rasters have a single column. I then map aoh with create_spp_aoh_data() but you will see that it is only made of NAs and cannot be plot with gplot() which plots a very thin line instead of raster cells. I also tried with larger input rasters (using elevation and cci crops of 3*3 cells) and the result was the same. Example_aoh_toosmall.zip

jeffreyhanson commented 1 year ago

Thanks for reporting this! I'll try and get back to you sometime later this week. I just had a look through the zip file - thanks for providing such a great reproducible example! That's super helpful!

jeffreyhanson commented 1 year ago

Just having a quick look, and I wonder if the issue is that the land cover raster does not any suitable habitat values for this species? E.g., looking at the species data, we see that the column full_habitat_code contains the value 1.4. When I filter the crosswalk data to see which land cover pixel values are associated with this habitat code, I get:

> cross %>% filter(code == 1.4)
#>   code value
#> 1  1.4    50
#> 2  1.4    60
#> 3  1.4    61
#> 4  1.4    62
#> 5  1.4    70
#> 6  1.4    71
#> 7  1.4    72
#> 8  1.4    90
#> 9  1.4   160

However, when I look at the values that are present inside the cropped land cover dataset (cci_crop), I see that it does not contain any of the values (shown above) that are associated with habitat code 1.4:

values(cci_crop)
#>      CCI2020_reprojMollweide
#> [1,]                      30
#> [2,]                      10

So, the reason why the output raster doesn't have any non-NA values in it, seems to be because the land cover raster does not contain any suitable habitat classes inside it for this species? How does that sound?

jeffreyhanson commented 1 year ago

Ah, if I manually change the land cover raster to contain a pixel value associated with habitat code 1.4, I now see the issue:

# manually overwrite value in land cover data to contain pixel associated with habitat code 1.4
cci_crop[1] <- 50

# create aoh data
AOH <- create_spp_aoh_data(
  rangeSP, 
  elevation_data = alt_crop,
  habitat_data = cci_crop,
  crosswalk_data = cross,
  output_dir=tempdir(),
  engine="terra")

# import aoh raster and plot
AOH_rast<-rast(AOH$path)
plot(AOH_rast)
victorcazalis commented 1 year ago

Hi Jeff!

Thanks a lot for your reply!

Sorry I did not cover this part in my initial post! I also thought that could be the problem so I tested changing the crosswalk to provide something suitable for that species with cross[5,2]<-30 and it did not change. Also if the cci raster only contain 10 and 30 values but is wider (I tested with a 2x3 raster), the aoh computes correctly and is just not suitable.

Testing the later example, I thought it may be due to the cropping you do at the end of the function. From the plot attached (where the distribution is the grey circle; the resulting AOH has "unsuitable" in purple while the empty cells mean that this is not part of the AOH raster, ie outside the mask), it seems that the function only keeps the cells that overlap sufficiently with the distribution (maybe by >50% or something like that) which is why only 3 cells out of 6 are part of the aoh. Maybe if the distribution only occurs in 1 cell but cover <50% of it, then no cell is kept in the end. What do you think? If so, maybe you could include all cells that overlap with the distribution, or have it as a parameter for crate_spp_aoh_data (a bit like the snap parameter in crop).

jeffreyhanson commented 1 year ago

Yeah, I think what's happening is that if a polygon does not cover the midpoint of a raster cell, then that raster cell is not "considered" to overlap with the polygon. So, if you have a polygon that does not overlap with any of the midpoints of the cells in a raster, then the polygon is not "considered" to overlap with any of the cells in the raster. The fix for this is relatively straightforward. For example, we can change the rasterization method such that if any part of a cell overlaps with a polygon, then that cell is "considered" to overlap with the polygon (e.g., using something like terra::rasterize(..., touches = TRUE), see see https://www.rdocumentation.org/packages/terra/versions/0.8-5/topics/rasterize).

I was wondering what the best way to implement this would be though. For example, should we (i) use this new rasterization method (i.e., a cell is considered to overlap with a species' polygon if any part of the cell is covered by the polygon) for all species? Or (ii) should this only be used for species with a small range size (e.g., below a certain threshold)? Or (iii) for species that would otherwise have no resulting cells in the output raster (e.g., like the example you provided)?

The issue is that there's going to be a trade-off between omission and comission errors. On the one hand, if we use this new rasterization method for widespread species, then this is going to increase their AOH (because cells that are on the edge of the species' range polygon are now going to be included during the rasterization). Additionaly, for super range restricted species that have a much smaller range polygon relative to the raster cells, the AOH raster is going to over-estimate "true" area of habitat (because the output assumes that the species' occupies the full raster cell). On the other hand, if we keep the current rasterization method, then this is going to the AOH raster is going to underestimate the "true" area of habitat for super range restricted species (because the output is going to say there's no habitat at all for the species) -- which is what we see now. How does that sound? Do you know of any documents/reports/materials/papers from the IUCN/experts/scientists that provide recommendations?

jeffreyhanson commented 1 year ago

Just had a quick google to see what the IUCN says about area of occupancy (AOO) since that's effectiverly rasterizing point data (which is somewhat similar to our situation, because we're rasterizing polygon data). Looking at Figure 3 (https://nc.iucnredlist.org/redlist/resources/files/1539098236-Mapping_Standards_Version_1.16_2018.pdf), it seems that even if the the uppermost corner of a raster cell is covered by a point, then the IUCN would say that the cell sohuld be considered as overlapping for AOO calculations. So this would suggest that we go with option (i) and use this new method for all species?

However, it seems GRASS doesn't have the option to specify this new option for rasterizing data (meaning that cells are only "considered" overlapping if a polyogn covers the midpoint of a cell, see https://grass.osgeo.org/grass82/manuals/v.to.rast.html) -- so this could create inconsistencies between the different engines. This also means that Maria's area of habitat maps (https://doi.org/10.1038/s41597-022-01838-w) -- which were created using GRASS -- were also generated under the assumption that a polygon should only be "considered" overlapping with a raster cell if the polygon crosses the midpoint. Since this paper represents standard practices for generating AOH data (to my knowledge), this would suggest that the current method follows standard practices?

jeffreyhanson commented 1 year ago

I suppose one option could be to add the functionality to let users specify what rasterization method they want to use when generating area of habitat data, and leave it up to them to decide? But I would like the defaults for the package to follow best/standard practices, so a user doesn't have to be an expert on AOH data to get good results.

victorcazalis commented 1 year ago

Thanks for all these details!

The most logical to would have been to include all cells that overlap (because 1. it fits the AOO calculation, 2. it wouldn't make a big difference for large range species but would for small range, 3. it's more likely that range is slightly underestimated than overestimated) but I agree it should fit with best practices. Maybe Maria didn't have to think about this as it's a super rare case that a distribution is smaller than a grid cell (I guess it doesn't happen with vertebrates published range map and will only happen when we create on the fly AOH maps as we do on the sRedList platform). Maybe in the end the best would be to use that option only for species that would have no AOH (iii in your message above) so that you don't make a big deal out of it and still fix the bug? Or maybe ask to Stu or someone else that has been following the AOH developments to see what they think the best solution is?

jeffreyhanson commented 1 year ago

Yeah, I feel like (i) would be the most consistent. Yeah, excellent point that (i) probably wouldn't make too big of a difference (assuming reasonable resolution raster data). I'll have a go at implementing this and you know when I've got something ready to play with.

jeffreyhanson commented 1 year ago

@victorcazalis, I've just created a branch that adds a new rasterize_touches parameter to create_spp_aoh_data() and create_spp_frc_data() so that you can specify how species' range data should be treated during rasterization. If TRUE, then any cell partially covered by the species' range is treated as overlapping. If FALSE (the default, to be consistent with Maria's process), then only cells that have their centroid/midpoint covered by the species' range are treated as overlapping. When you get a chance, could you please give it a go and check that it works for you? To install it, please use:

remotes::install_github("prioritizr/aoh@fix-48a")
victorcazalis commented 1 year ago

Thanks Jeffrey for the quick and great fix! It works perfectly :) Now the calculated AOH is a single cell of unsuitable habitat as I was expecting! Are you going to push this change to the main branch or should we keep using this branch?

jeffreyhanson commented 1 year ago

Awesome - thanks for checking! Yeah, now that you've confirmed it works for you too, I'll merge it into the main branch.