scworland / restore-2018

scripts for predicting streamflow characteristics in ungaged basins for RESTORE

Sites removed from official list given to Asquith circa fall 2016 #9

Open ghost opened 6 years ago

ghost commented 6 years ago

08025360 removed: it gages the tailwaters of a dam, not full river flow.

These 19 sites have at least one negative flow. RRK and WHA decided on 12/01/2017 to remove from the project those sites with >1% negative values. 08041780 is also removed because it is located at a saltwater barrier in Texas and is operated for special purposes. For all remaining stations, we substituted NA for the negative flows. This provides computational protection (for example, still allowing an FDC to be computed with up to 7 days missing in a year) and guards against spurious negatives that might have entered the database.

| Site | POR | NEG | PCT |
|----------|-------|------|-------|
| 02300021 | 3743 | 1263 | 33.74 |
| 02300082 | 3837 | 23 | 0.60 |
| 02301719 | 4741 | 582 | 12.28 |
| 02310663 | 4493 | 826 | 18.38 |
| 02312700 | 18982 | 31 | 0.16 |
| 02313700 | 15726 | 1590 | 10.11 |
| 02322800 | 5085 | 6 | 0.12 |
| 02323592 | 5512 | 3 | 0.05 |
| 02453500 | 7119 | 28 | 0.39 |
| 02462951 | 13631 | 13 | 0.10 |
| 07288955 | 6574 | 35 | 0.53 |
| 07348000 | 19357 | 1 | 0.01 |
| 07353000 | 8858 | 83 | 0.94 |
| 07369000 | 28308 | 133 | 0.47 |
| 07372200 | 19776 | 17 | 0.09 |
| 07380120 | 8180 | 103 | 1.26 |
| 07382500 | 21644 | 62 | 0.29 |
| 08012150 | 8413 | 2078 | 24.70 |
| 08041780 | 4814 | 36 | 0.75 |

Negatives permitted (all subsequently converted to NA):

| Site | POR | NEG | PCT |
|----------|-------|-----|------|
| 02300082 | 3837 | 23 | 0.60 |
| 02312700 | 18982 | 31 | 0.16 |
| 02322800 | 5085 | 6 | 0.12 |
| 02323592 | 5512 | 3 | 0.05 |
| 02453500 | 7119 | 28 | 0.39 |
| 02462951 | 13631 | 13 | 0.10 |
| 07288955 | 6574 | 35 | 0.53 |
| 07348000 | 19357 | 1 | 0.01 |
| 07353000 | 8858 | 83 | 0.94 |
| 07369000 | 28308 | 133 | 0.47 |
| 07372200 | 19776 | 17 | 0.09 |
| 07382500 | 21644 | 62 | 0.29 |
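The screening and substitution rules above can be sketched in R. The vector `flow` below is hypothetical illustration data, not project data; the 1% threshold is the RRK/WHA rule stated in this comment.

```r
# Hypothetical daily flows for one site (illustration only)
flow <- c(12.3, 8.1, -0.4, 15.0, 9.9, 11.2, -1.6, 14.8, 13.0, 9.1)

# Percent of the period of record with negative values
pct_negative <- 100 * sum(flow < 0) / length(flow)

# RRK/WHA rule of 12/01/2017: drop the site if >1% negative
drop_site <- pct_negative > 1

# Otherwise, mask spurious negatives as NA so an FDC can still be
# computed (up to 7 missing days per year are tolerated downstream)
flow[flow < 0] <- NA
```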

From ASQUITH's DV.RData file:

```r
write.table(ls(DV), file="RESTOREsites.txt", row.names=FALSE, quote=FALSE)
length(ls(DV))
#> [1] 1379
```

This will be the official list of sites for DV processing and beyond: 1,379 sites.

scworland commented 6 years ago

Will, thanks! GitHub is markdown-enabled, so a leading "#" indicates a header. I edited your issue to remove the "#"s in front of your site numbers and added syntax highlighting for the R code.

Could you replace the CSV site list with the new list and push everything to GitHub? Make sure it is in the same format. Don't forget to run git pull prior to pushing the change!

ghost commented 6 years ago

Thanks regarding the pound signs; they are there to allow simple embedding of notes in my building script. I have updated the CSV site list, and the leading zeros are preserved; remember to read the site numbers in as character in the future.
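Reading the site list back in while preserving the leading zeros can be sketched as follows; the column name `site_no` follows other code in this thread, and the temp-file setup is only to make the example self-contained.

```r
# Write a tiny stand-in for the CSV site list (illustration only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("site_no", "02300021", "08041780"), tmp)

# colClasses forces site_no to character so leading zeros survive
sites <- read.csv(tmp, colClasses = c(site_no = "character"))

# With readr instead:
# sites <- readr::read_csv(tmp, col_types = readr::cols(site_no = readr::col_character()))
```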

scworland commented 6 years ago

@wasquith-usgs The fdc_lmr_pplo2010-15.feather file has 133 sites that are not in the decade1950plus_site_list.csv file. Here are several examples: 02294161, 02294655, and 02298488. This code should give you the sites:

```r
library(feather)
library(readr)

fdc10 <- read_feather("data/gage/fdc_lmr_pplo2010-15.feather")
sites <- read_csv("data/decade1950plus_site_list.csv")
missing_sites <- fdc10$site[!fdc10$site %in% sites$site_no]
```

Any ideas?

ghost commented 6 years ago

My one-off changes to akqdecay to get 2010-15 as a decade were pretty clear, and I did spot checking.

Consider that permitting 5 years will catch many short-record sites and, at least in Texas, the very real gage expansion in modern times. I spot checked: your 02294161 began its record only in 2005.

Can you do some random spot checks throughout the list of 133? I am home today working through a kid's dissertation here at TTU.

A loop over DV.RData for these gages, grabbing the first record and consulting YEAR, would quickly give an overall feeling.

-wha (iPhone)

On Jan 3, 2018, at 10:39 AM, Scott Worland notifications@github.com wrote:

Reopened #9 https://github.com/scworland-usgs/restore-2018/issues/9.


scworland commented 6 years ago

Here is the histogram:

```r
library(feather)
library(readr)
library(ggplot2)

fdc10 <- read_feather("data/gage/fdc_lmr_pplo2010-15.feather")
sites <- read_csv("data/decade1950plus_site_list.csv")
missing_sites <- fdc10$site[!fdc10$site %in% sites$site_no]

load("data/dvs/DV.RData")
dv_list <- as.list(DV)

# First year of record for each missing site
years <- data.frame(sites=missing_sites)
for (i in seq_along(missing_sites)) {
  years$first_year[i] <- dv_list[[missing_sites[i]]]$year[1]
}

ggplot(years) + geom_histogram(aes(first_year), color="white")
```

*[Histogram of the first year of record for the 133 missing sites]*

ghost commented 6 years ago

If I remove all sites with a starting year of 2001 or later, I get 53 sites with "old data" that nevertheless only have a complete decade in "2010-15". I checked 02323000, 07035000, and 08073700, which have 10k+ DVs each, by inspecting a text file of all DVs that I have. They have discontinuous record gaps and hence do not pass my 7-days-missing-per-year rule (translation: 70 days total in 10 years).
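The 7-days-missing-per-year screen described above might be sketched like this; `dv` is a hypothetical daily-value table with `year` and `flow` columns, and the actual building script may differ.

```r
# Hypothetical daily-value record: NA marks a missing day
dv <- data.frame(
  year = rep(c(2010, 2011), each = 365),
  flow = c(rep(1.0, 360), rep(NA, 5),   # 2010: 5 missing days -> passes
           rep(1.0, 355), rep(NA, 10))  # 2011: 10 missing days -> fails
)

# Count missing days per year and keep only years with <= 7 missing
missing_by_year <- tapply(is.na(dv$flow), dv$year, sum)
complete_years <- names(missing_by_year)[missing_by_year <= 7]
```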

I am not concerned that we have a unique set of sites showing up only in fdc_lmr_pplo2010-15.feather. It can be explained, and my building code looks to have made things okay.

```r
fdc10 <- read_feather("fdc_lmr_pplo2010-15.feather")
fdc00 <- read_feather("fdc_lmr_pplo.feather")
missing_sites <- fdc10$site[!fdc10$site %in% fdc00$site]

i <- 0  # counter must be initialized before the loop
for (site in missing_sites) {
  tmp <- get(site, envir=DV)
  n <- length(tmp$year)
  if (tmp$year[1] >= 2001) next
  print(c(site, tmp$year[1], tmp$year[n], n))
  i <- i + 1
}
```

The final i is 53 sites.

William H. Asquith, Research Hydrologist U.S. Geological Survey, Science Building MS-1053 Texas Tech University, Lubbock, Texas 79409 806-742-3129 work; 806-392-4148 cell


scworland commented 6 years ago

I am also 100% okay with having sites with fewer than 7 years in the 2005-2010 "decade".

scworland commented 6 years ago

@wasquith we started with 1,379 sites and have culled that down to 1,030. Were the 349 sites dropped due to provisional data, data not within 1950-2010, and negative flows?

ghost commented 6 years ago

Do we need to have a conference call on this topic?


scworland commented 6 years ago

Maybe at some point. Just trying to get our methods into manuscript format.