ropensci / rnaturalearth

An R package to hold and facilitate interaction with natural earth map data :earth_africa:
http://ropensci.github.io/rnaturalearth/

Virginia Eastern Shore Missing #59

Closed adamkemberling closed 1 year ago

adamkemberling commented 1 year ago

The section of Virginia north of the Chesapeake Bay bridge (Fisherman Island National Wildlife Refuge to Chincoteague) is missing from the Virginia polygon.

As a possible swap-in, or for anyone seeking a quick-fix alternative, the R package {rgeoboundaries} provides a polygon that includes it: https://github.com/wmgeolab/rgeoboundaries

PMassicotte commented 1 year ago

Hi @adamkemberling.

The rnaturalearth package only provides an interface to https://www.naturalearthdata.com/. If it is not available there for download, it is normal that it is not available through this package. Please feel free to reopen an issue if you find any discrepancy between the data provided by the package and downloadable content from https://www.naturalearthdata.com/.

adamkemberling commented 1 year ago

Hi @PMassicotte, sorry it took me so long to check this.

I downloaded the data directly this morning (3/2/2023) from the link you provided. The specific dataset is ne_10m_admin_1_states_provinces.

The area in question appears to exist properly at the source for the data. This would suggest that either an old version of the rnaturalearth package (or rnaturalearthdata/rnaturalearthhires) was missing the data or that it was corrupted somewhere along the way.

My current version info: rnaturalearth = 0.1.0, rnaturalearthdata = 0.1.0, rnaturalearthhires = 0.2.0

This is the code I routinely run to access the file with rnaturalearth & sf to replicate and see the missing coastline:

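library(rnaturalearth)
library(ggplot2)

us_poly <- ne_states("united states of america", returnclass = "sf")
# Zoom in on the mid-Atlantic coast around the Chesapeake Bay
ggplot() +
  geom_sf(data = us_poly) +
  coord_sf(xlim = c(-78, -74), ylim = c(34, 41))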

And here is a visual of what that displays: image

PMassicotte commented 1 year ago

Can you update rnaturalearth to the latest version and try again?

adamkemberling commented 1 year ago

Problem still occurs with: rnaturalearth = 0.3.2, rnaturalearthdata = 0.1.0, rnaturalearthhires = 0.2.0

mps9506 commented 1 year ago

Can confirm this seems to be a problem in the rnaturalearthhires package, which probably needs to be updated. If you use ne_download(), the correct boundaries are shown, which implies that Natural Earth has updated the polygons since rnaturalearthhires was last updated.

library(rnaturalearth)
library(ggplot2)
library(sf)

us_poly <- ne_states("united states of america", returnclass = "sf")
va <- us_poly[us_poly$postal == "VA", ]

ggplot(va) +
  geom_sf()

image

Using ne_download() returns proper boundaries:

states_poly_dl <- ne_download(scale = "large", type = "states",
                              category = "cultural",
                              returnclass = "sf")
va <- states_poly_dl[states_poly_dl$name == "Virginia", ]
ggplot(va) +
  geom_sf()

image
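For anyone who wants a number rather than a plot, the difference between the bundled snapshot and a fresh download can be checked by comparing bounding boxes. This is only a rough sketch: `bbox_shift()` is a hypothetical helper (not part of rnaturalearth), and it assumes both inputs are sf objects in the same CRS.

```r
library(sf)

# Hypothetical helper: how far each bounding-box edge moves between a
# packaged snapshot and a freshly downloaded layer (fresh minus snapshot).
bbox_shift <- function(snapshot, fresh) {
  a <- sf::st_bbox(snapshot)
  b <- sf::st_bbox(fresh)
  stats::setNames(as.numeric(b) - as.numeric(a), names(a))
}

# Usage with the two Virginia objects from the comment above
# (left commented out because ne_download() needs network access):
# va_snapshot <- us_poly[us_poly$postal == "VA", ]
# va_fresh    <- states_poly_dl[states_poly_dl$name == "Virginia", ]
# bbox_shift(va_snapshot, va_fresh)
```

A non-zero shift on any edge suggests the two sources disagree about the extent of the polygon. A shift of zero does not prove the geometries are identical, so this is a quick screen rather than a full comparison.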

adamkemberling commented 1 year ago

Do you recommend using the ne_download() function to circumvent the need to keep the data packages up to date? This is more of a general data package question I guess and not specific to rnaturalearth. People probably update packages less frequently than fixes/updates occur for them.

EDIT: But that workflow would come with more people pinging the original data source for on-the-fly downloads, and would likely come with some quality-of-life declines, with scripts running more slowly or not running at all without internet...

Also, want to flag that I appreciate all of y'all's hard work. I am a big advocate of this package for providing consistency and ease-of-use to common map-making needs.

adamkemberling commented 1 year ago

And just another comment for context: I tried updating both rnaturalearthhires and rnaturalearthdata when I updated rnaturalearth today, and there was no indication that I should or could. So, short of running some special commands to install a dev branch, I am in the same state that other users would likely be in.

mps9506 commented 1 year ago

@adamkemberling I think ne_download() is the way to go for most interactive data analysis. It ensures you get the most recently published data from Natural Earth's repo. Functions that rely on rnaturalearthdata and rnaturalearthhires return a snapshot of the data from whenever those packages were last updated (it looks like 5 years ago for hires). For some automated workflows I can see how installing and using the data packages locally would be beneficial, but you are at the mercy of whenever the dev/maintainer last updated them.
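To see which snapshot is actually installed, the package metadata can be inspected with base R. The sketch below is an illustration only: `snapshot_info()` is a hypothetical helper, and the `Packaged` field is present only for some installs. It records when the package was built, which is not necessarily when the Natural Earth data were retrieved.

```r
# Hypothetical helper: report the installed version (and build stamp, where
# available) of each package, with NA for packages that are not installed.
snapshot_info <- function(pkgs) {
  rows <- lapply(pkgs, function(pkg) {
    if (!nzchar(system.file(package = pkg))) {
      return(data.frame(package = pkg, version = NA_character_,
                        packaged = NA_character_, stringsAsFactors = FALSE))
    }
    desc <- utils::packageDescription(pkg)
    packaged <- if (is.null(desc$Packaged)) NA_character_ else desc$Packaged
    data.frame(package = pkg, version = desc$Version,
               packaged = packaged, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

snapshot_info(c("rnaturalearth", "rnaturalearthdata", "rnaturalearthhires"))
```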

PMassicotte commented 1 year ago

One option could be to cache the results of ne_download(), maybe with the help of pins.
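A minimal sketch of the caching idea, for illustration only. It uses base R `saveRDS()`/`readRDS()` rather than pins, and `cached_fetch()`, its `max_age_days` argument, and the cache path in the usage example are all hypothetical names, not part of rnaturalearth.

```r
# Hypothetical helper: return the cached object if the cache file exists and
# is recent enough; otherwise call fetch(), store the result, and return it.
cached_fetch <- function(cache_file, fetch, max_age_days = 30) {
  if (file.exists(cache_file)) {
    age_days <- as.numeric(difftime(Sys.time(), file.mtime(cache_file),
                                    units = "days"))
    if (age_days <= max_age_days) {
      return(readRDS(cache_file))
    }
  }
  result <- fetch()
  dir.create(dirname(cache_file), recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, cache_file)
  result
}

# Usage (commented out because ne_download() needs network access):
# states <- cached_fetch(
#   file.path("cache", "ne_states_10m.rds"),
#   function() rnaturalearth::ne_download(scale = "large", type = "states",
#                                         category = "cultural",
#                                         returnclass = "sf")
# )
```

Scripts then hit Natural Earth at most once per `max_age_days` and keep working offline once the cache file exists, which addresses the speed and connectivity concerns raised earlier in the thread.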

PMassicotte commented 1 year ago

This has been fixed in ropensci/rnaturalearthhires#8

library(rnaturalearth)
library(ggplot2)

us_poly <- ne_states("united states of america", returnclass = "sf")

ggplot() +
  geom_sf(data = us_poly) +
  coord_sf(xlim = c(-78, -74), ylim = c(34, 41))