lindsayplatt / episodic-river-salinization-model

Modeling code in a targets pipeline for understanding characteristics of rivers that experience episodic salinization from winter road salting events.
1 stars 0 forks source link

Scale prediction of episodic or not to full region #21

Open lindsayplatt opened 3 months ago

lindsayplatt commented 3 months ago

Start with MN to test some of the issues with expanding beyond our state borders. Then, go up to the full region.

lindsayplatt commented 2 months ago

There are 60,829 COMIDs for the state of MN (this is not including upstream ones):

image

library(sf)
library(tidyverse)
library(usmap)

mn_sf <- usmap::us_map(include = 'MN') %>% 
               st_as_sf(coords = c('x', 'y'), 
                        crs = usmap::usmap_crs()) %>% 
               st_transform(crs = 4326)

# Takes a few minutes to run:
mn_comids <- get_nhdplus(AOI = mn_sf, realization = "flowline")

plot(st_geometry(mn_comids), col='cornflowerblue')
plot(st_geometry(mn_sf), border = 'black', lwd=3, add=TRUE)
lindsayplatt commented 2 months ago

I was following along with the suggestion in https://github.com/DOI-USGS/nhdplusTools/issues/354#issuecomment-1741231632 to do a spatial intersection between the national layer of flowlines to skip the web service in order to get flowlines when you have a large spatial region (states in my case). I have tried a number of different things to download the national layer of flowlines but I have landed on the conclusion that the file might be corrupted? All I need is a geospatial layer that has COMIDs but downloading other subsets from the EPA website (by state or by HUC) does not give a COMID column and I cannot figure out how to map back to COMIDs.

Capturing what I have done so far by describing the paths I've taken and the dead-ends I've hit.

  1. nhdplusTools::download_nhdplusv2("my_out_folder/"). This downloads the file but then I get the following error when I try to unzip (manually or via unzip(), and I have installed the 7zip software and can unzip the global 7z just fine)

image

  1. I downloaded NHDPlusV21_NationalData_Seamless_Geodatabase_Lower48_07.7z directly from the website but hit the same issues when trying to unzip.
  2. I downloaded a specific state but unfortunately the resulting file has flowlines but no COMIDs (and I have tried a bunch of different things to get COMIDs from the columns present). I manually downloaded this file, then ran the following (there is a column called nhdplusid but it doesn't seem to actually match any COMIDs when I try using it to get other attributes):
library(sf)

local_zip <- 'nhdplus_epasnapshot2022_mn_gpkg.zip'
unzip(local_zip, exdir = '.')
local_file <- 'nhdplus_epasnapshot2022_mn.gpkg'
mn_flines <- st_read(local_file, layer = 'nhdflowline_mn')
plot(st_zm(st_geometry(mn_flines)))
head(mn_flines)
>head(mn_flines)
Simple feature collection with 6 features and 21 fields
Geometry type: LINESTRING
Dimension:     XYM
Bounding box:  xmin: -92.82686 ymin: 45.9163 xmax: -92.65892 ymax: 45.947
m_range:       mmin: 0 mmax: 100
Geodetic CRS:  NAD83
  statecode permanent_identifier               fdate resolution  gnis_id         gnis_name  lengthkm      reachcode flowdir
1        MN             86443715 2012-03-18 03:51:17          2     <NA>              <NA> 0.1531888 07030003001588       1
2        MN             69540075 2012-03-03 13:27:50          2     <NA>              <NA> 0.1105886 07030001000965       1
3        MN             69539963 2012-03-03 13:28:10          2 00662324 Saint Croix River 0.1099746 07030001003342       1
4        MN             69539963 2012-03-03 13:28:10          2 00662324 Saint Croix River 0.0437405 07030001003342       1
5        MN             69540039 2012-03-03 13:26:49          2     <NA>              <NA> 1.4289885 07030001003355       1
6        MN             86443675 2012-03-18 03:51:54          2     <NA>              <NA> 0.3315046 07030003001586       1
  wbarea_permanent_identifier ftype fcode mainpath innetwork visibilityfilter enabled vpuid   nhdplusid fmeasure  tmeasure
1                        <NA>   460 46006        0         1                0       1  0703 2.20012e+13 31.58160 100.00000
2                   120008455   558 55800        0         1                0       1  0703 2.20012e+13 44.08867 100.00000
3                   120008455   558 55800        0         1                0       1  0703 2.20012e+13  0.00000  28.69629
4                   120008455   558 55800        0         1                0       1  0703 2.20012e+13 88.58632 100.00000
5                        <NA>   460 46003        0         1                0       1  0703 2.20012e+13  0.00000 100.00000
6                   120008454   558 55800        0         1                0       1  0703 2.20012e+13  5.20570  54.49420
                                globalid                          shape
1 {0d7e9a9f-e3c7-11ed-b63b-7b27fe1ffa82} LINESTRING M (-92.76149 45....
2 {0d7e9aa0-e3c7-11ed-b63b-7b27fe1ffa82} LINESTRING M (-92.6699 45.9...
3 {0d7e9aa1-e3c7-11ed-b63b-7b27fe1ffa82} LINESTRING M (-92.6614 45.9...
4 {0d7e9aa2-e3c7-11ed-b63b-7b27fe1ffa82} LINESTRING M (-92.65892 45....
5 {0d7e9aa3-e3c7-11ed-b63b-7b27fe1ffa82} LINESTRING M (-92.82686 45....
6 {0d7e9aa4-e3c7-11ed-b63b-7b27fe1ffa82} LINESTRING M (-92.79176 45....

image

  1. I also tried downloading flowlines through the HUC method by manually downloading and manually unzipping the file [NHDPlusV21_MS_07_NHDSnapshotFGDB_08.7z](https://dmap-data-commons-ow.s3.amazonaws.com/NHDPlusV21/Data/NHDPlusMS/NHDPlus07/NHDPlusV21_MS_07_NHDSnapshotFGDB_08.7z) and then ran the following (but had the same issue where there were no COMIDs):
library(sf)

local_file <- 'NHDPlusV21_MS_07_NHDSnapshotFGDB_08/NHDPlusMS/NHDPlus07/NHDSnapshot/NHDSnapshot.gdb'
huc07_flines <- st_read(local_file, layer = 'NHDFlowline')
plot(st_zm(st_geometry(huc07_flines)))
head(huc07_flines)
> head(huc07_flines)
Simple feature collection with 6 features and 12 fields Geometry type: MULTILINESTRING Dimension:     XYZM Bounding box:  xmin: -93.71928 ymin: 41.39813 xmax: -85.92239 ymax: 47.26174 z_range:       zmin: 0 zmax: 0 m_range:       mmin: 0 mmax: 100 Geodetic CRS:  NAD83   Permanent_Identifier      FDate Resolution GNIS_ID         GNIS_Name LengthKM      ReachCode FlowDir WBArea_Permanent_Identifier 1             13282276 1999-09-13          3                              5.305 04030201006220       0                           0 2             13432527 1999-10-23          3                              2.311 04050001016735       0                           0 3             13432529 1999-10-23          3                              1.651 04050001016736       0                           0 4             13432531 1999-10-23          3                              5.741 04050001016737       0                           0 5             22326519 2009-05-10          3 1629903 Mississippi River    1.369 07010101000002       1                    22324893 6             22326685 2009-05-10          3 1629903 Mississippi River    1.238 07010101000005       1                    22324943   FType FCode Shape_Length                          Shape 1   460 46003   0.05692686 MULTILINESTRING ZM ((-89.07... 2   460 46003   0.02263203 MULTILINESTRING ZM ((-85.92... 3   336 33600   0.01588836 MULTILINESTRING ZM ((-85.92... 4   336 33600   0.06417983 MULTILINESTRING ZM ((-85.92... 5   558 55800   0.01599238 MULTILINESTRING ZM ((-93.63... 6   558 55800   0.01508686 MULTILINESTRING ZM ((-93.71...

image

lindsayplatt commented 2 months ago

I asked ChatGPT about the 7z error I was getting and based on its reply, I tried unzipping the 7z to a new folder rather than excepting the default. That worked ................ I have spent all morning attempting to get passed this.

The error message "Error 0x80070003: The system cannot find the path specified" when unzipping a 7z file typically indicates that there is an issue with the file path, either because the path is too long, contains invalid characters, or the specified directory does not exist

library(nhdplusTools)
library(sf)

# Same download method
nhdplus_gdb <- download_nhdplusv2("./6_PredictClass/tmp_nhdplus/")

# I TRIED UNZIPPING TO A DIFFERENT LOCATION AND IT IS OK WITH THAT.
unzip('6_PredictClass/tmp_nhdplus/NHDPlusV21_NationalData_Seamless_Geodatabase_Lower48_07.7z',
      exdir = '6_PredictClass/out_nhdplus')

# No errors so now I can load the flowlines!
nat_gdb <- '6_PredictClass/out_nhdplus/NHDPlusNationalData/NHDPlusV21_National_Seamless_Flattened_Lower48.gdb'
nat_flines_sf <- st_read(nat_gdb, layer = 'NHDFlowline_Network')
lindsayplatt commented 2 months ago

The new, faster way to get COMIDs for a large space (not querying web services each time)

# Find MN COMIDs in a faster way!

library(geos)
library(nhdplusTools)
library(sf)
library(tidyverse)

# 1. Download National flowlines layer
nhdplus_gdb <- download_nhdplusv2("./6_PredictClass/tmp_nhdplus/")

# 2. Unzip flowlines GDB *to a new folder*
unzip('6_PredictClass/tmp_nhdplus/NHDPlusV21_NationalData_Seamless_Geodatabase_Lower48_07.7z',
      exdir = '6_PredictClass/out_nhdplus')

# 3. Load the national layer and an sf object of MN
nat_gdb <- '6_PredictClass/out_nhdplus/NHDPlusNationalData/NHDPlusV21_National_Seamless_Flattened_Lower48.gdb'
nat_flines_sf <- st_read(nat_gdb, layer = 'NHDFlowline_Network') %>% st_zm() # Drops vertical attrs

mn_sf <- usmap::us_map(include = 'MN') %>% 
  st_as_sf(coords = c('x', 'y'), 
           crs = usmap::usmap_crs()) %>% 
  st_transform(crs = st_crs(nat_flines_sf))

# 4. Use geos to intersect flowlines with the state
mn_flines <- nat_flines_sf[which(geos::geos_intersects(nat_flines_sf, mn_sf)),]

# Now just plot for visualization purposes
ggplot() +
  geom_sf(data = mn_sf, fill = 'white', color = 'black') +
  geom_sf(data = mn_flines, color = 'cornflowerblue', size=1) +
  theme_void()

image

lindsayplatt commented 2 months ago

For MN, I need to query COMIDs upstream of each COMID individually. This takes ~ 1 second per COMID. Working on implementing this into our pipeline now in order to predict forward but for Minnesota alone this could mean 17 hrs of querying time ... (60,829 COMIDs / 3600 seconds/hr = 16.89 hrs)

Trying to see if there is a different way to do this that is faster, what I really need is a catchment shape per COMID that shows all upstream COMIDs. Checking out some hydroloom features ... but thinking that just clicking "run" may be the best course of action.

Note that you need to run the code chunk from the previous comment to run this:

mn_upstream <- mn_comids %>% 
    rename(nhd_comid = comid) %>% 
    split(.$nhd_comid) %>% 
    map(~identify_upstream_comids(comid_in = .x$nhd_comid)) %>% 
    bind_rows()

 mn_upstream_flowlines <- get_nhdplus(comid = unique(mn_upstream$nhd_comid_upstream), realization = "flowline")

plot(st_geometry(mn_upstream_flowlines), col='cornflowerblue')
plot(st_geometry(p6_state_sf[[1]]), border = 'black', lwd=3, add=TRUE)