ropensci-archive / scrubr

:warning: ARCHIVED :warning: Clean species occurrence records

Filter by biome/ecoregion #30

Closed sckott closed 4 years ago

sckott commented 4 years ago

via https://github.com/ropensci/rgbif/issues/133

sckott commented 4 years ago

@afredstonhermann what sources of biomes/ecoregions would you use/trust?

afredston commented 4 years ago

Great question. I think it depends on whether the user is more interested in the geographic area of the record (like me) or the actual ecoregion type. Either way, I imagine it should be a set of polygons, so we can just ID the polygon within which any georeferenced record falls.
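As a rough, language-agnostic sketch of that polygon lookup (written in Python with only the standard library, not the R spatial packages a real implementation would likely use): the region names, polygon coordinates, and record fields below are all made up for illustration. The test is a planar ray-casting check, so it ignores the antimeridian and spherical geometry.

```python
# Hypothetical region polygons as lists of (lon, lat) vertices. Real ecoregion
# boundaries would come from a shapefile and be far more detailed.
REGIONS = {
    "region_a": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    "region_b": [(10.0, 0.0), (20.0, 0.0), (20.0, 10.0), (10.0, 10.0)],
}


def point_in_polygon(lon, lat, polygon):
    """Planar ray-casting test: count crossings of a ray heading east."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_region(lon, lat, regions=REGIONS):
    """Return the name of the first region containing the point, else None."""
    for name, polygon in regions.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical occurrence records (field names are assumptions).
records = [
    {"name": "sp1", "lon": 5.0, "lat": 5.0},
    {"name": "sp2", "lon": 15.0, "lat": 2.0},
    {"name": "sp3", "lon": 50.0, "lat": 50.0},
]
labelled = [dict(r, region=assign_region(r["lon"], r["lat"])) for r in records]
```

Records that fall outside every polygon get `region = None`, which makes it easy to either drop them or inspect them separately.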

There are global databases of ecoregions; I'm most familiar with the marine one (marine ecoregions of the world, MEOWs) but it looks like they've been developed for marine, freshwater, and terrestrial. Just found a stackexchange thread talking about some of the differences between ecoregion mapping approaches.

The above might be really helpful for some people--there must be someone out there who wants to filter for only records in tropical dry forests!--but upon reflection, for my purposes it would really be easiest to just have a higher-level geographical classification than "country". Maybe something like continent on land, and major ocean region (Indian Ocean, Eastern Pacific, Caribbean, Mediterranean, etc.) for the oceans. This also seems maybe useful for other users, who want every record of rhinos on the African continent, or every Mediterranean tuna, etc.

After writing this reply I'm realizing that I can just do this myself as a filter after fetching records, rather than building it into rgbif, but maybe it's sufficiently useful to consider for others!

sckott commented 4 years ago

thanks for these thoughts. i do think it's worth having a pkg do this kind of thing since I have had many people ask about how to do this. I think it belongs here rather than rgbif since rgbif is already complicated enough - and this will require probably leveraging spatial R pkgs (and their system libraries)

Makes sense to offer a number of different 1) data sources (as long as they are public data and avail. in an easy to use format) and 2) methods (e.g., filter by continent (continent = "Africa"), filter by ecoregion (ecoregion_id = "some id"))

a caveat about rgbif: in rgbif, GBIF does allow you to search by a well known text (WKT) area, so you could limit your search for example to an ecoregion. you'd have to make the WKT yourself though. And if the WKT is complex enough (lots of points) we may run into issues with the request being too large for GBIF to handle - however, I do think it will probably be easier to simply filter results by continent/ecoregion after pulling down gbif data
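A minimal sketch of the "make the WKT yourself" step, in Python with the standard library only. The function names and the sample coordinates are hypothetical, and the vertex thinning is a crude stand-in for a proper simplification algorithm. It only illustrates how a polygon's vertex count drives the size of the request string; any server-side requirements, such as winding order or a length limit, would need to be checked against GBIF's own documentation.

```python
def polygon_to_wkt(vertices):
    """Build a WKT POLYGON string from (lon, lat) vertices, closing the ring."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    ring = list(vertices)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings must end where they start
    coords = ", ".join(f"{lon:g} {lat:g}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


def thin_vertices(vertices, step):
    """Keep every `step`-th vertex to shrink the WKT (crude simplification)."""
    kept = list(vertices)[::step]
    if len(kept) < 3:
        raise ValueError("thinning left fewer than 3 vertices")
    return kept


# Hypothetical square region, for illustration only.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
wkt = polygon_to_wkt(square)
```

The length of the resulting string grows roughly linearly with the number of vertices, which is why a detailed ecoregion boundary can produce a request that is too large.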

afredston commented 4 years ago

that all makes a lot of sense. let me know if you want me to test anything, and I'll probably be messing with this independently in the next week to come up with some geographic filters.

afredston commented 4 years ago

updating this with a couple other notes: robis::occurrence() can be passed an area ID, a depth range, or a geometry, which I assume can pre-filter data before returning it (didn't test this though). sounds like this feature won't end up being part of rgbif but just wanted to point it out.

someone on Twitter shared the FAO marine boundaries dataset (go to download shape file -> "FAO Statistical areas (Marine ) - GIS data (WFS - SHP)"), which has 16 or 17 different levels of spatial aggregation including one that is just four major global oceans. this is what I'm using to crop out points from GBIF right now.

sckott commented 4 years ago

thanks!

I'm not sure what the areaid actually uses within robis. do you know? Unlikely that we'd be able to get GBIF to have an additional filter - but we could at least ask. But i imagine they'd be reluctant to add something, because as you've discussed, there are many different ecoregion sources at diff spatial scales, which i imagine would be too complex for them to support.

Having a look at the FAO dataset. Where is the documentation for the different levels of spatial aggregation?

sckott commented 4 years ago

@afredstonhermann also, how would you and folks in your field select an ecoregion in R? Would you use an FID code or some other code? Or would you prefer to select a region by name?

sckott commented 4 years ago

started work on the ecoregion branch

afredston commented 4 years ago

I can't speak for the field but my personal preference is always to select by name if there is a list of options available in the package documentation... reduces risk of error and is clearer for future users relative to codes!
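One way select-by-name could work, sketched in Python with the standard library only: the region names and the codes in the table are placeholders, not real FAO or ecoregion identifiers, and `region_code` is a hypothetical helper. The idea is that users type a documented name, the package maps it to whatever internal code the data source uses, and a mistyped name produces a suggestion instead of a silent empty result.

```python
import difflib

# Placeholder name -> code table. The codes here are invented for illustration.
REGION_CODES = {
    "Mediterranean": "R01",
    "Caribbean": "R02",
    "Indian Ocean": "R03",
}


def region_code(name, table=REGION_CODES):
    """Resolve a region name (case-insensitive) to its code, or fail loudly."""
    lookup = {key.lower(): code for key, code in table.items()}
    key = name.strip().lower()
    if key in lookup:
        return lookup[key]
    # Suggest near matches so typos are easy to correct.
    suggestions = difflib.get_close_matches(key, list(lookup), n=3)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unknown region {name!r}.{hint}")
```

For example, `region_code("mediterranean")` resolves to the placeholder code, while a misspelling such as `"Carribean"` raises an error that points at the closest documented name.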

I can't find any documentation on the robis areas but I'll look at them and see if I can figure it out. I only found FAO documentation for the fishing areas, which I think are their main unit of spatial analysis, and even that's not very detailed: http://www.fao.org/fishery/docs/STAT/by_FishArea/Fishing_Areas_list.pdf