Open lindsayplatt opened 3 months ago
There are 60,829 COMIDs for the state of MN (this is not including upstream ones):
library(sf)
library(tidyverse)
library(usmap)
mn_sf <- usmap::us_map(include = 'MN') %>%
st_as_sf(coords = c('x', 'y'),
crs = usmap::usmap_crs()) %>%
st_transform(crs = 4326)
# Takes a few minutes to run:
mn_comids <- get_nhdplus(AOI = mn_sf, realization = "flowline")
plot(st_geometry(mn_comids), col='cornflowerblue')
plot(st_geometry(mn_sf), border = 'black', lwd=3, add=TRUE)
I was following along with the suggestion in https://github.com/DOI-USGS/nhdplusTools/issues/354#issuecomment-1741231632 to do a spatial intersection between the national layer of flowlines
to skip the web service in order to get flowlines when you have a large spatial region (states in my case). I have tried a number of different things to download the national layer of flowlines but I have landed on the conclusion that the file might be corrupted? All I need is a geospatial layer that has COMIDs but downloading other subsets from the EPA website (by state or by HUC) does not give a COMID column and I cannot figure out how to map back to COMIDs.
Capturing what I have done so far by describing the paths I've taken and the dead-ends I've hit.
nhdplusTools::download_nhdplusv2("my_out_folder/")
. This downloads the file but then I get the following error when I try to unzip (manually or via unzip()
, and I have installed the 7zip software and can unzip the global 7z just fine)NHDPlusV21_NationalData_Seamless_Geodatabase_Lower48_07.7z
directly from the website but hit the same issues when trying to unzip. nhdplusid
but it doesn't seem to actually match any COMIDs when I try using it to get other attributes):library(sf)
local_zip <- 'nhdplus_epasnapshot2022_mn_gpkg.zip'
unzip(local_zip, exdir = '.')
local_file <- 'nhdplus_epasnapshot2022_mn.gpkg'
mn_flines <- st_read(local_file, layer = 'nhdflowline_mn')
plot(st_zm(st_geometry(mn_flines)))
head(mn_flines)
>head(mn_flines)
Simple feature collection with 6 features and 21 fields
Geometry type: LINESTRING
Dimension: XYM
Bounding box: xmin: -92.82686 ymin: 45.9163 xmax: -92.65892 ymax: 45.947
m_range: mmin: 0 mmax: 100
Geodetic CRS: NAD83
statecode permanent_identifier fdate resolution gnis_id gnis_name lengthkm reachcode flowdir
1 MN 86443715 2012-03-18 03:51:17 2 <NA> <NA> 0.1531888 07030003001588 1
2 MN 69540075 2012-03-03 13:27:50 2 <NA> <NA> 0.1105886 07030001000965 1
3 MN 69539963 2012-03-03 13:28:10 2 00662324 Saint Croix River 0.1099746 07030001003342 1
4 MN 69539963 2012-03-03 13:28:10 2 00662324 Saint Croix River 0.0437405 07030001003342 1
5 MN 69540039 2012-03-03 13:26:49 2 <NA> <NA> 1.4289885 07030001003355 1
6 MN 86443675 2012-03-18 03:51:54 2 <NA> <NA> 0.3315046 07030003001586 1
wbarea_permanent_identifier ftype fcode mainpath innetwork visibilityfilter enabled vpuid nhdplusid fmeasure tmeasure
1 <NA> 460 46006 0 1 0 1 0703 2.20012e+13 31.58160 100.00000
2 120008455 558 55800 0 1 0 1 0703 2.20012e+13 44.08867 100.00000
3 120008455 558 55800 0 1 0 1 0703 2.20012e+13 0.00000 28.69629
4 120008455 558 55800 0 1 0 1 0703 2.20012e+13 88.58632 100.00000
5 <NA> 460 46003 0 1 0 1 0703 2.20012e+13 0.00000 100.00000
6 120008454 558 55800 0 1 0 1 0703 2.20012e+13 5.20570 54.49420
globalid shape
1 {0d7e9a9f-e3c7-11ed-b63b-7b27fe1ffa82} LINESTRING M (-92.76149 45....
2 {0d7e9aa0-e3c7-11ed-b63b-7b27fe1ffa82} LINESTRING M (-92.6699 45.9...
3 {0d7e9aa1-e3c7-11ed-b63b-7b27fe1ffa82} LINESTRING M (-92.6614 45.9...
4 {0d7e9aa2-e3c7-11ed-b63b-7b27fe1ffa82} LINESTRING M (-92.65892 45....
5 {0d7e9aa3-e3c7-11ed-b63b-7b27fe1ffa82} LINESTRING M (-92.82686 45....
6 {0d7e9aa4-e3c7-11ed-b63b-7b27fe1ffa82} LINESTRING M (-92.79176 45....
[NHDPlusV21_MS_07_NHDSnapshotFGDB_08.7z](https://dmap-data-commons-ow.s3.amazonaws.com/NHDPlusV21/Data/NHDPlusMS/NHDPlus07/NHDPlusV21_MS_07_NHDSnapshotFGDB_08.7z)
and then ran the following (but had the same issue where there were no COMIDs):library(sf)
local_file <- 'NHDPlusV21_MS_07_NHDSnapshotFGDB_08/NHDPlusMS/NHDPlus07/NHDSnapshot/NHDSnapshot.gdb'
huc07_flines <- st_read(local_file, layer = 'NHDFlowline')
plot(st_zm(st_geometry(huc07_flines)))
head(huc07_flines)
> head(huc07_flines)
Simple feature collection with 6 features and 12 fields Geometry type: MULTILINESTRING Dimension: XYZM Bounding box: xmin: -93.71928 ymin: 41.39813 xmax: -85.92239 ymax: 47.26174 z_range: zmin: 0 zmax: 0 m_range: mmin: 0 mmax: 100 Geodetic CRS: NAD83 Permanent_Identifier FDate Resolution GNIS_ID GNIS_Name LengthKM ReachCode FlowDir WBArea_Permanent_Identifier 1 13282276 1999-09-13 3 5.305 04030201006220 0 0 2 13432527 1999-10-23 3 2.311 04050001016735 0 0 3 13432529 1999-10-23 3 1.651 04050001016736 0 0 4 13432531 1999-10-23 3 5.741 04050001016737 0 0 5 22326519 2009-05-10 3 1629903 Mississippi River 1.369 07010101000002 1 22324893 6 22326685 2009-05-10 3 1629903 Mississippi River 1.238 07010101000005 1 22324943 FType FCode Shape_Length Shape 1 460 46003 0.05692686 MULTILINESTRING ZM ((-89.07... 2 460 46003 0.02263203 MULTILINESTRING ZM ((-85.92... 3 336 33600 0.01588836 MULTILINESTRING ZM ((-85.92... 4 336 33600 0.06417983 MULTILINESTRING ZM ((-85.92... 5 558 55800 0.01599238 MULTILINESTRING ZM ((-93.63... 6 558 55800 0.01508686 MULTILINESTRING ZM ((-93.71...
I asked ChatGPT about the 7z error I was getting and based on its reply, I tried unzipping the 7z to a new folder rather than excepting the default. That worked ................ I have spent all morning attempting to get passed this.
The error message "Error 0x80070003: The system cannot find the path specified" when unzipping a 7z file typically indicates that there is an issue with the file path, either because the path is too long, contains invalid characters, or the specified directory does not exist
library(nhdplusTools)
library(sf)
# Same download method
nhdplus_gdb <- download_nhdplusv2("./6_PredictClass/tmp_nhdplus/")
# I TRIED UNZIPPING TO A DIFFERENT LOCATION AND IT IS OK WITH THAT.
unzip('6_PredictClass/tmp_nhdplus/NHDPlusV21_NationalData_Seamless_Geodatabase_Lower48_07.7z',
exdir = '6_PredictClass/out_nhdplus')
# No errors so now I can load the flowlines!
nat_gdb <- '6_PredictClass/out_nhdplus/NHDPlusNationalData/NHDPlusV21_National_Seamless_Flattened_Lower48.gdb'
nat_flines_sf <- st_read(nat_gdb, layer = 'NHDFlowline_Network')
The new, faster way to get COMIDs for a large space (not querying web services each time)
# Find MN COMIDs in a faster way!
library(geos)
library(nhdplusTools)
library(sf)
library(tidyverse)
# 1. Download National flowlines layer
nhdplus_gdb <- download_nhdplusv2("./6_PredictClass/tmp_nhdplus/")
# 2. Unzip flowlines GDB *to a new folder*
unzip('6_PredictClass/tmp_nhdplus/NHDPlusV21_NationalData_Seamless_Geodatabase_Lower48_07.7z',
exdir = '6_PredictClass/out_nhdplus')
# 3. Load the national layer and an sf object of MN
nat_gdb <- '6_PredictClass/out_nhdplus/NHDPlusNationalData/NHDPlusV21_National_Seamless_Flattened_Lower48.gdb'
nat_flines_sf <- st_read(nat_gdb, layer = 'NHDFlowline_Network') %>% st_zm() # Drops vertical attrs
mn_sf <- usmap::us_map(include = 'MN') %>%
st_as_sf(coords = c('x', 'y'),
crs = usmap::usmap_crs()) %>%
st_transform(crs = st_crs(nat_flines_sf))
# 4. Use geos to intersect flowlines with the state
mn_flines <- nat_flines_sf[which(geos::geos_intersects(nat_flines_sf, mn_sf)),]
# Now just plot for visualization purposes
ggplot() +
geom_sf(data = mn_sf, fill = 'white', color = 'black') +
geom_sf(data = mn_flines, color = 'cornflowerblue', size=1) +
theme_void()
For MN, I need to query COMIDs upstream of each COMID individually. This takes ~ 1 second per COMID. Working on implementing this into our pipeline now in order to predict forward but for Minnesota alone this could mean 17 hrs of querying time ... (60,829 COMIDs / 3600 seconds/hr = 16.89 hrs)
Trying to see if there is a different way to do this that is faster, what I really need is a catchment shape per COMID that shows all upstream COMIDs. Checking out some hydroloom
features ... but thinking that just clicking "run" may be the best course of action.
Note that you need to run the code chunk from the previous comment to run this:
mn_upstream <- mn_comids %>%
rename(nhd_comid = comid) %>%
split(.$nhd_comid) %>%
map(~identify_upstream_comids(comid_in = .x$nhd_comid)) %>%
bind_rows()
mn_upstream_flowlines <- get_nhdplus(comid = unique(mn_upstream$nhd_comid_upstream), realization = "flowline")
plot(st_geometry(mn_upstream_flowlines), col='cornflowerblue')
plot(st_geometry(p6_state_sf[[1]]), border = 'black', lwd=3, add=TRUE)
Start with MN to test some of the issues with expanding beyond our state borders. Then, go up to the full region.