wdwatkins opened this issue 5 years ago
@mhines-usgs Note there might be some holes in the documentation similar to the WQP report. Also, this PR should be merged before running the report again: https://code.usgs.gov/water/NWIS_Analytics/merge_requests/22 . You can merge it, but it might be good to take a look at it first to see what's happening in the code base. The database currently has data through September 26th, so there isn't too much new data to add.
Is this something I would need to update to reflect the data I see on the server? https://code.usgs.gov/water/NWIS_Analytics/blob/3c79a16813a72312b90cdbe6931de4b77cb16bc7/lib/cfg/general_config.yaml
[mhines@yeti-login20 analytics] ls -alhrt
total 12K
drwxr-xr-x. 4 wwatkins users 47 Feb 21 2019 monetdb_fy17
drwxr-xr-x. 3 wwatkins users 17 Jul 1 12:44 fy19_dl
drwxr-xr-x. 2 wwatkins users 4.0K Aug 15 19:06 wqp
drwxrwsrwx. 9 jzwart iidd 4.0K Sep 11 15:58 ..
drwxr-xr-x. 4 wwatkins users 28 Sep 28 19:37 fy19
-rw-r--r--. 1 wwatkins users 54 Sep 30 03:44 .gdk_lock
drwxrwxrwx. 7 wwatkins iidd 95 Sep 30 16:50 .
drwxr-xr-x. 4 wwatkins users 47 Sep 30 16:50 monetdb_fy18
[mhines@yeti-login20 analytics]
So are you using this file for two different steps and commenting out the part you don't need it to do? Just confused about how much commenting is going on in here: https://code.usgs.gov/water/NWIS_Analytics/blob/3c79a16813a72312b90cdbe6931de4b77cb16bc7/src/add_to_db.slurm
same question for https://code.usgs.gov/water/NWIS_Analytics/blob/3c79a16813a72312b90cdbe6931de4b77cb16bc7/remake.yml#L54
Never mind, the readme explains it ✔️
> so are you using this file for two different steps and commenting out what part you don't need it to do? just confused about how much commenting is going on in here https://code.usgs.gov/water/NWIS_Analytics/blob/3c79a16813a72312b90cdbe6931de4b77cb16bc7/src/add_to_db.slurm
Sorry, those commented-out parts shouldn't be checked in. I had to partially rerun that job since it ran out of time once. Will fix.
The general_config file can be left as-is. It's a vestige of when we were using separate databases for each fiscal year so they would fit on a laptop. The fy18 database contains all of FY19 as well, and you can add new data to it. That is something we will deal with when we eventually move away from MonetDB (which is now unsupported).
Getting a lot of these:
rm: cannot remove `/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/no_ip/nwis.waterservices.usgs.gov-access_log-2019-09-21.gz': Permission denied
rm: cannot remove `/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/no_ip/nwis.waterservices.usgs.gov-access_log-2019-09-22.gz': Permission denied
rm: cannot remove `/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/no_ip/nwis.waterservices.usgs.gov-access_log-2019-09-23.gz': Permission denied
rm: cannot remove `/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/no_ip/nwis.waterservices.usgs.gov-access_log-2019-09-24.gz': Permission denied
rm: cannot remove `/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/no_ip/nwis.waterservices.usgs.gov-access_log-2019-09-25.gz': Permission denied
rm: cannot remove `/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/no_ip/nwis.waterservices.usgs.gov-access_log-2019-09-26.gz': Permission denied
rm: cannot remove `/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/no_ip/waterservices.usgs.gov-access_log-2019-09-20.gz': Permission denied
rm: cannot remove `/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/no_ip/waterservices.usgs.gov-access_log-2019-09-21.gz': Permission denied
rm: cannot remove `/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/no_ip/waterservices.usgs.g
then lots of these
47: In download.file(url = waterservices_url, destfile = waterservices_path) :
URL ftp://natalog.er.usgs.gov/pub/httpd/2019/waterservices.usgs.gov-access_log-2019-10-08.gz: cannot open destfile '/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/waterservices.usgs.gov-access_log-2019-10-08.gz', reason 'Permission denied'
48: In download.file(url = waterservices_url, destfile = waterservices_path) :
download had nonzero exit status
49: In download.file(url = nwis.waterservices_url, destfile = nwis.waterservices_path) :
URL ftp://natalog.er.usgs.gov/pub/httpd/2019/nwis.waterservices.usgs.gov-access_log-2019-10-09.gz: cannot open destfile '/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/nwis.waterservices.usgs.gov-access_log-2019-10-09.gz', reason 'Permission denied'
50: In download.file(url = nwis.waterservices_url, destfile = nwis.waterservices_path) :
download had nonzero exit status
then a bunch of
+ for file in '"$search_path"$search_pattern'
++ basename /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/nwis.waterservices.usgs.gov-access_log-2019-07-01.gz
+ file_only=nwis.waterservices.usgs.gov-access_log-2019-07-01.gz
+ gzip
src/remove_ips.sh: line 11: /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data//no_ip/nwis.waterservices.usgs.gov-access_log-2019-07-01.gz: Permission denied
+ awk -F '[' '{first=$1; $1=""; print $0}'
+ zcat /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/nwis.waterservices.usgs.gov-access_log-2019-07-01.gz
+ [[ 1 -eq 0 ]]
Which all seems like I may not have write access to that /cxfs/... dir?
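One thing worth noting about these `rm` failures: deleting a file requires write permission on the containing directory, not on the file itself, which matches the `drwxr-xr-x` modes in the listing above. A minimal sketch (scratch paths, not the real /cxfs ones; GNU `stat -c` assumed) that checks whether a directory's mode lets group members delete from it:

```shell
#!/bin/sh
# Deleting a file needs write permission on its parent directory, not the file.
# Inspect the parent's octal mode to see who can unlink files inside it.
set -u
tmp=$(mktemp -d)
mkdir "$tmp/no_ip"
chmod 755 "$tmp/no_ip"                 # rwxr-xr-x: only the owner may delete here
perms=$(stat -c '%a' "$tmp/no_ip")     # e.g. "755"
group_digit=$(printf '%s' "$perms" | cut -c2)
case $group_digit in
  2|3|6|7) echo "group can delete files in no_ip" ;;
  *)       echo "group cannot delete files in no_ip" ;;
esac
rm -rf "$tmp"
```

With `drwxr-xr-x` and owner wwatkins, anyone else can list the directory but not unlink from it, which is consistent with the Permission denied lines above.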
Yeah, file permission issues for sure. I'll take a look
Ok, try now, I changed everything to 777 permissions
perfect, thanks!
That seems to sort of work, but it's still throwing some permission errors, as if it's going further back in time and handling files I don't have permissions on from before the date range I requested (maybe every file in that fy19/data/ folder?). See the sudden change from 9/19 (permission denied) to 9/20 (done successfully):
++ basename /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/waterservices.usgs.gov-access_log-2019-09-18.gz
+ file_only=waterservices.usgs.gov-access_log-2019-09-18.gz
+ gzip
+ awk -F '[' '{first=$1; $1=""; print $0}'
+ zcat /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/waterservices.usgs.gov-access_log-2019-09-18.gz
src/remove_ips.sh: line 11: /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data//no_ip/waterservices.usgs.gov-access_log-2019-09-18.gz: Permission denied
+ [[ 1 -eq 0 ]]
+ for file in '"$search_path"$search_pattern'
++ basename /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/waterservices.usgs.gov-access_log-2019-09-19.gz
+ file_only=waterservices.usgs.gov-access_log-2019-09-19.gz
+ gzip
+ awk -F '[' '{first=$1; $1=""; print $0}'
+ zcat /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/waterservices.usgs.gov-access_log-2019-09-19.gz
src/remove_ips.sh: line 11: /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data//no_ip/waterservices.usgs.gov-access_log-2019-09-19.gz: Permission denied
+ [[ 1 -eq 0 ]]
+ for file in '"$search_path"$search_pattern'
++ basename /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/waterservices.usgs.gov-access_log-2019-09-20.gz
+ file_only=waterservices.usgs.gov-access_log-2019-09-20.gz
+ gzip
+ awk -F '[' '{first=$1; $1=""; print $0}'
+ zcat /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/waterservices.usgs.gov-access_log-2019-09-20.gz
+ [[ 0 -eq 0 ]]
+ echo /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/waterservices.usgs.gov-access_log-2019-09-20.gz done successfully
/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/waterservices.usgs.gov-access_log-2019-09-20.gz done successfully
+ for file in '"$search_path"$search_pattern'
++ basename /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/waterservices.usgs.gov-access_log-2019-09-21.gz
+ file_only=waterservices.usgs.gov-access_log-2019-09-21.gz
+ gzip
+ awk -F '[' '{first=$1; $1=""; print $0}'
+ zcat /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/waterservices.usgs.gov-access_log-2019-09-21.gz
+ [[ 0 -eq 0 ]]
+ echo /cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/waterservices.usgs.gov-access_log-2019-09-21.gz done successfully
/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data/waterservices.usgs.gov-access_log-2019-09-21.gz done successfully
+ for file in '"$search_path"$search_pattern'
Later in the job it ultimately fails with another error, about a lock on the MonetDB database:
Error in monetdb_embedded_startup(embedded, !getOption("monetdb.debug.embedded", :
Failed to initialize embedded MonetDB !FATAL: GDKlockHome: Database lock '/cxfs/projects/usgs/water/iidd/data-sci/analytics/monetdb_fy18/.gdk_lock' denied
Calls: add_files_to_monet ... dbConnect -> dbConnect -> .local -> monetdb_embedded_startup
Execution halted
[mhines@yeti-login20 shellLog]
Is there a way to remove that lock? I've only modified the dates in the slurm script; that's the only place I need to adjust dates for this step, correct? Mine looks like:
Rscript -e 'source("src/download.R"); download_natalog_date_range(start="2019-09-27", end="2019-10-20", local_data_folder = "/cxfs/projects/usgs/water/iidd/data-sci/analytics/fy19/data"); warnings()'
let me know if you want the whole file.
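On the lock question: if the job that held `.gdk_lock` is gone, the lock is just a leftover file. A hedged sketch for removing it only when nothing still holds it open (the `fuser` check is an assumption about what's available on the login node; the default path is a placeholder, the real one being the `.gdk_lock` under `monetdb_fy18` from the error above):

```shell
#!/bin/sh
# Remove a database lock file only when it looks stale: it exists and no
# running process has it open. Defaults to a scratch path for illustration.
set -u
LOCK="${1:-/tmp/example.gdk_lock}"
if [ ! -e "$LOCK" ]; then
  echo "no lock present"
elif command -v fuser >/dev/null 2>&1 && fuser "$LOCK" >/dev/null 2>&1; then
  echo "lock is held by a running process; leaving it alone"
else
  rm -f "$LOCK" && echo "stale lock removed"
fi
```

Checking with `fuser` first avoids yanking the lock out from under a job that is still running.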
Hmm, maybe this data dir is still unwriteable? The .gdk_lock definitely is:
[mhines@yeti-login20 analytics] ls -alhrt
total 12K
drwxrwxrwx. 4 wwatkins users 47 Feb 21 2019 monetdb_fy17
drwxrwxrwx. 3 wwatkins users 17 Jul 1 12:44 fy19_dl
drwxrwxrwx. 2 wwatkins users 4.0K Aug 15 19:06 wqp
drwxrwsrwx. 9 jzwart iidd 4.0K Sep 11 15:58 ..
drwxrwxrwx. 4 wwatkins users 28 Sep 28 19:37 fy19
-rw-r--r--. 1 wwatkins users 54 Sep 30 03:44 .gdk_lock
drwxrwxrwx. 7 wwatkins iidd 95 Sep 30 16:50 .
drwxrwxrwx. 4 wwatkins users 47 Sep 30 16:50 monetdb_fy18
[mhines@yeti-login20 analytics] cd fy19
[mhines@yeti-login20 fy19] ls -alhrt
total 32K
drwxrwxrwx. 4 wwatkins users 28 Sep 28 19:37 .
drwxr-xr-x. 2 wwatkins users 12K Sep 28 19:38 save
drwxr-xr-x. 3 wwatkins users 12K Sep 30 15:22 data
drwxrwxrwx. 7 wwatkins iidd 95 Sep 30 16:50 ..
[mhines@yeti-login20 fy19]
Yeah, looks like it; not sure why it didn't take last time. Those files from before 9-26 should have been deleted by the initial part of that script, but I guess they hung around because of the permissions. I ran chmod more explicitly this time, try again? I think I made some wrong assumptions about how */*/* would behave.
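That guess about `*/*/*` is plausible: a glob like that matches only paths exactly three levels deep, and globs skip dotfiles such as `.gdk_lock`, whereas `chmod -R` walks the whole tree. A small demonstration in a scratch directory:

```shell
#!/bin/sh
# A glob like */*/* matches only paths exactly three levels deep and skips
# dotfiles, so `chmod 777 */*/*` misses files like .gdk_lock; chmod -R does not.
set -u
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b/c"
touch "$tmp/a/b/c/file" "$tmp/.gdk_lock"
cd "$tmp"
echo "glob matched: $(ls -d */*/*)"   # only a/b/c: no deeper file, no dotfile
chmod -R u+w .                        # -R reaches everything, dotfiles included
cd / && rm -rf "$tmp"
```

So a single `chmod -R` (or `find ... -exec chmod`) is the safer way to fix a whole tree than stacked globs.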
still running at 6+ minutes so that's good :) thanks.
Just catching this thread up: it ended up failing last night with more file permission issues. We're working with the Yeti folks to get us all in the same group, and I reworked the slurm script some so it only moves on to the next step if each preceding step succeeds. https://code.usgs.gov/water/NWIS_Analytics/merge_requests/23/diffs
Finished loading data into monetdb through 10/21
Files to add to monet:
nwis.waterservices.usgs.gov-access_log-2019-09-30.gz
nwis.waterservices.usgs.gov-access_log-2019-10-01.gz
nwis.waterservices.usgs.gov-access_log-2019-10-02.gz
nwis.waterservices.usgs.gov-access_log-2019-10-03.gz
nwis.waterservices.usgs.gov-access_log-2019-10-04.gz
nwis.waterservices.usgs.gov-access_log-2019-10-05.gz
nwis.waterservices.usgs.gov-access_log-2019-10-06.gz
nwis.waterservices.usgs.gov-access_log-2019-10-07.gz
nwis.waterservices.usgs.gov-access_log-2019-10-08.gz
nwis.waterservices.usgs.gov-access_log-2019-10-09.gz
nwis.waterservices.usgs.gov-access_log-2019-10-10.gz
nwis.waterservices.usgs.gov-access_log-2019-10-11.gz
nwis.waterservices.usgs.gov-access_log-2019-10-12.gz
nwis.waterservices.usgs.gov-access_log-2019-10-13.gz
nwis.waterservices.usgs.gov-access_log-2019-10-14.gz
nwis.waterservices.usgs.gov-access_log-2019-10-15.gz
nwis.waterservices.usgs.gov-access_log-2019-10-16.gz
nwis.waterservices.usgs.gov-access_log-2019-10-17.gz
nwis.waterservices.usgs.gov-access_log-2019-10-18.gz
nwis.waterservices.usgs.gov-access_log-2019-10-19.gz
nwis.waterservices.usgs.gov-access_log-2019-10-20.gz
nwis.waterservices.usgs.gov-access_log-2019-10-21.gz
waterservices.usgs.gov-access_log-2019-09-30.gz
waterservices.usgs.gov-access_log-2019-10-01.gz
waterservices.usgs.gov-access_log-2019-10-02.gz
waterservices.usgs.gov-access_log-2019-10-03.gz
waterservices.usgs.gov-access_log-2019-10-04.gz
waterservices.usgs.gov-access_log-2019-10-05.gz
waterservices.usgs.gov-access_log-2019-10-06.gz
waterservices.usgs.gov-access_log-2019-10-07.gz
waterservices.usgs.gov-access_log-2019-10-08.gz
waterservices.usgs.gov-access_log-2019-10-09.gz
waterservices.usgs.gov-access_log-2019-10-10.gz
waterservices.usgs.gov-access_log-2019-10-11.gz
waterservices.usgs.gov-access_log-2019-10-12.gz
waterservices.usgs.gov-access_log-2019-10-13.gz
waterservices.usgs.gov-access_log-2019-10-14.gz
waterservices.usgs.gov-access_log-2019-10-15.gz
waterservices.usgs.gov-access_log-2019-10-16.gz
waterservices.usgs.gov-access_log-2019-10-17.gz
waterservices.usgs.gov-access_log-2019-10-18.gz
waterservices.usgs.gov-access_log-2019-10-19.gz
waterservices.usgs.gov-access_log-2019-10-20.gz
waterservices.usgs.gov-access_log-2019-10-21.gz
There were 50 or more warnings (use warnings() to see the first 50)
SUCCESS: parsed and added to monetdb
now moving on to report building part...
Not sure if this is helping.. :confused: I ran the job with the function looking like this, with comments and memory outputs:
#' @param fiscal_year character passed to get_fiscal_year_dates, either
#'   "current" or "previous"
get_gwsip_nwisweb_stats <- function(outind, fiscal_year) {
  # browser()
  c(con, dplyr_con) %<-% connect_to_monet()
  message("connected to monet")
  fy_range <- get_fiscal_year_dates(fy = fiscal_year)
  message("FY date range: ", fy_range)
  message("memory: ", pryr::mem_used())
  total_requests <- get_total_requests_data(dplyr_con, fy_range)
  message("total_requests: ", length(total_requests))
  message("memory: ", pryr::mem_used())
  flow_requests <- get_flow_requests_data(dplyr_con, fy_range)
  message("flow_requests: ", length(flow_requests))
  message("memory: ", pryr::mem_used())
  gw_requests <- get_groundwater_requests_data(dplyr_con, fy_range)
  message("gw_requests: ", length(gw_requests))
  message("memory: ", pryr::mem_used())
  all_data <- list(total_requests = total_requests,
                   flow_requests = flow_requests,
                   gw_requests = gw_requests,
                   timestamp = Sys.Date())
  message("all_data: ", length(all_data))
  message("memory: ", pryr::mem_used())
  saveRDS(all_data, as_data_file(outind))
  message("Saved to RDS file")
  message("memory: ", pryr::mem_used())
  s3_put(remote_ind = outind)
  message("put into s3")
  message("memory: ", pryr::mem_used())
  dbDisconnect(con, shutdown = TRUE)
}
output:
[mhines@yeti-login20 NWIS_Analytics] tail -f shellLog/slurm-4776929_1.out -n 10000
USGS Support Package: https://owi.usgs.gov/R/packages.html#support
Loading required package: sp
Checking rgeos availability: TRUE
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Starting build at 2019-10-25 08:46:20
< MAKE > process/gwsip_nwisweb_metrics.rds.ind
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ BUILD ] process/gwsip_nwisweb_metrics.rds.ind | get_gwsip_nwisweb_stats(...
[ READ ] | # loading packages
connected to monet
FY date range: 2018-10-012019-09-30
memory: 103636792
total_requests: 2
memory: 105184832
flow_requests: 2
memory: 105766560
gw_requests: 2
memory: 105765904
all_data: 4
memory: 105766376
Saved to RDS file
memory: 105777744
put into s3
memory: 108183288
Finished build at 2019-10-25 10:01:07
Build completed in 74.79 minutes
Warning messages:
1: `new_overscope()` is deprecated as of rlang 0.2.0.
Please use `new_data_mask()` instead.
This warning is displayed once per session.
2: `overscope_eval_next()` is deprecated as of rlang 0.2.0.
Please use `eval_tidy()` with a data mask instead.
This warning is displayed once per session.
3: `overscope_clean()` is deprecated as of rlang 0.2.0.
This warning is displayed once per session.
USGS Support Package: https://owi.usgs.gov/R/packages.html#support
Loading required package: sp
Checking rgeos availability: TRUE
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Starting build at 2019-10-25 10:01:14
< MAKE > all
[ OK ] fiscal_year
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] date_range_caption
[ OK ] missing_days
[ OK ] water_events
[ OK ] process/service_summary_data.rds.ind
[ OK ] process/service_summary_data.rds
[ OK ] service_summary_plot
[ BUILD ] process/service_timeseries.rds.ind | get_service_timeseries_d...
[ READ ] | # loading packages
Restoring previous version of process/service_timeseries.rds.ind
Error in if (res@env$delivered >= res@env$info$rows) { :
missing value where TRUE/FALSE needed
Calls: scmake ... tryCatchList -> dbFetch -> dbFetch -> .handleSimpleError -> h
In addition: Warning message:
In monetdb_embedded_query(conn@connenv$conn, statement, execute, :
NAs introduced by coercion to integer range
Execution halted
srun: error: n3-97: task 0: Exited with exit code 1
Running interactively, it zooms past the first scmake line and then doesn't give much useful output. Are there more places I can add comments to get a sense of where that memory issue is happening?
> library(scipiper); scmake("process/gwsip_nwisweb_metrics.rds.ind")
USGS Support Package: https://owi.usgs.gov/R/packages.html#support
Loading required package: sp
Checking rgeos availability: TRUE
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Starting build at 2019-10-25 10:42:36
< MAKE > process/gwsip_nwisweb_metrics.rds.ind
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] process/gwsip_nwisweb_metrics.rds.ind
Finished build at 2019-10-25 10:42:36
Build completed in 0.00 minutes
> scmake("process/gwsip_nwisweb_metrics.rds.ind")
Starting build at 2019-10-25 10:44:05
< MAKE > process/gwsip_nwisweb_metrics.rds.ind
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] process/gwsip_nwisweb_metrics.rds.ind
Finished build at 2019-10-25 10:44:05
Build completed in 0.00 minutes
> scmake("process/gwsip_nwisweb_metrics.rds.ind")
Starting build at 2019-10-25 10:45:38
< MAKE > process/gwsip_nwisweb_metrics.rds.ind
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] process/gwsip_nwisweb_metrics.rds.ind
Finished build at 2019-10-25 10:45:38
Build completed in 0.00 minutes
> library(scipiper); scmake()
Starting build at 2019-10-25 10:45:52
< MAKE > all
[ OK ] fiscal_year
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] date_range_caption
[ OK ] missing_days
[ OK ] water_events
[ OK ] process/service_summary_data.rds.ind
[ OK ] process/service_summary_data.rds
[ OK ] service_summary_plot
[ BUILD ] process/service_timeseries.rds.ind | get_service_timeseries_d...
[ READ ] | # loading packages
Restoring previous version of process/service_timeseries.rds.ind
Error
no problem connecting to the fy17 db
> source('src/functions.R')
> library(MonetDBLite)
> library(dplyr)
> library(DBI)
> library(zeallot)
> library(yaml)
> library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
> library(DBI)
> library(zeallot)
> c(con, dplyr_con) %<-% connect_to_monet('fy17')
> dbDisconnect(con, shutdown = TRUE)
>
Narrowing in..
I added a bunch of comments into the service_stats.R function it is failing in (get_service_timeseries_df), and I'm seeing that it gets through fy17 just fine, but on the second pass of the for loop, for the fy18 data, it doesn't seem like lines 47 & 48 succeed.
updated src w/ comments like this
get_service_timeseries_df <- function(outind, fiscal_year_1, fiscal_year_2) {
  message("in get_service_timeseries_df function")
  both_years <- tibble()
  message("both years type: ", typeof(both_years))
  message("memory: ", pryr::mem_used())
  for(fy in c(fiscal_year_1, fiscal_year_2)) {
    c(con, dplyr_con) %<-% connect_to_monet(fiscal_year = fy)
    message("connected to monet in the for loop: ", fy)
    message("memory: ", pryr::mem_used())
    raw_timestamps <- dplyr_con %>%
      select(request_date, service, bytes) %>% collect()
    message("raw_timestamps ", length(raw_timestamps))
    dates <- raw_timestamps %>% mutate(request_date = as.Date(request_date),
                                       bytes = as.numeric(bytes)) %>%
      group_by(request_date, service) %>%
      summarize(n = n(), bytes = sum(bytes, na.rm = TRUE)) %>%
      rename(date = request_date)
    message("got dates")
    message("memory: ", pryr::mem_used())
    # Oct 1st of the next FY snuck into databases
    if(month(max(dates$date)) == 10) {
      dates <- dates %>% filter(date != max(dates$date))
    }
    both_years <- bind_rows(both_years, dates)
    message("both years data gathered")
    message("memory: ", pryr::mem_used())
    dbDisconnect(con, shutdown = TRUE)
    message("disconnected from monet")
  }
  saveRDS(both_years, file = as_data_file(outind))
  message("saved to rds file")
  s3_put(outind)
  message("put in s3")
}
tailing the log
[mhines@yeti-login20 NWIS_Analytics] tail -f shellLog/slurm-4776993_1.out
[ OK ] process/service_summary_data.rds
[ OK ] service_summary_plot
[ BUILD ] process/service_timeseries.rds.ind | get_service_timeseries_d...
[ READ ] | # loading packages
in get_service_timeseries_df function
both years type: list
memory: 101954360
connected to monet in the for loop: fy17
memory: 103647376
raw_timestamps3
got dates
memory: 11182274928
both years data gathered
memory: 11182357200
disconnected from monet
connected to monet in the for loop: fy18
memory: 11182490696
Restoring previous version of process/service_timeseries.rds.ind
Error in if (res@env$delivered >= res@env$info$rows) { :
missing value where TRUE/FALSE needed
Calls: scmake ... tryCatchList -> dbFetch -> dbFetch -> .handleSimpleError -> h
In addition: Warning message:
In monetdb_embedded_query(conn@connenv$conn, statement, execute, :
NAs introduced by coercion to integer range
Execution halted
srun: error: n3-87: task 0: Exited with exit code 1
So, now I am attempting to just run those failing lines in the R session after connecting to the fy18 db... and can confirm that is where the error is coming from:
> c(con, dplyr_con) %<-% connect_to_monet(fiscal_year = 'fy18')
> raw_timestamps <- dplyr_con %>%
+ select(request_date, service, bytes) %>% collect()
Error in if (res@env$delivered >= res@env$info$rows) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In monetdb_embedded_query(conn@connenv$conn, statement, execute, :
NAs introduced by coercion to integer range
>
I'm also getting this error; I just had to delete the .gdk_lock file to connect. However, the database is not completely borked: I can at least take heads of a certain number of rows, starting with the October 2017 data. I wonder if something went wrong with the new data you added? Can you go back and check the logs from that job and make sure there aren't any errors?
I will try to see if I can isolate the rows that cause errors. I think there is a way to roll back the database also, we could look into that.
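One way to isolate the offending rows, sketched in shell rather than against the real database: bisect the date range with a pass/fail probe, the way the manual `filter(request_date < ...)` / `> ...` probes in this thread do by hand. Here `is_bad` is a hypothetical stand-in; the real probe would be a `collect()` over that slice of `request_date`, and the bisection assumes the bad rows form a contiguous tail.

```shell
#!/bin/sh
# Binary-search a sorted list of dates for the first one that trips the error.
# is_bad is a hypothetical predicate standing in for "collect() over this
# slice fails"; here we pretend rows after 2019-09-25 are bad.
set -u
tmp=$(mktemp)
printf '%s\n' 2019-09-23 2019-09-24 2019-09-25 2019-09-26 2019-09-27 > "$tmp"
is_bad() { [ "$(expr "$1" \> 2019-09-25)" -eq 1 ]; }
lo=1
hi=$(( $(wc -l < "$tmp") ))
while [ "$lo" -lt "$hi" ]; do
  mid=$(( (lo + hi) / 2 ))
  if is_bad "$(sed -n "${mid}p" "$tmp")"; then hi=$mid; else lo=$((mid + 1)); fi
done
echo "first bad date: $(sed -n "${lo}p" "$tmp")"   # first bad date: 2019-09-26
rm -f "$tmp"
```

A handful of probes narrows a year of data down to a single day, which is much faster than rerunning full collects.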
Here was the successful run: there are errors, but they're the permission errors we've been seeing; it doesn't seem like anything went wrong with the load? slurm-4774952_1.txt
Yeah, those are just file permissions, and the warnings I think are expected, based on times I've run the add_to_db job
# worked
raw_timestamps <- dplyr_con %>%
  select(request_date) %>% filter(request_date < '2019-09-25') %>% collect()

# worked
raw_timestamps <- dplyr_con %>% select(request_date) %>% filter(request_date > '2019-09-25') %>% collect()

# doesn't work
raw_timestamps <- dplyr_con %>% select(request_date) %>% collect()
Error in if (res@env$delivered >= res@env$info$rows) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In monetdb_embedded_query(conn@connenv$conn, statement, execute, :
NAs introduced by coercion to integer range
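One guess at what "NAs introduced by coercion to integer range" means here: R integers are 32-bit signed, so any value above 2^31 - 1 (about 2.1 billion, plausible for a bytes column or a very large row count) becomes NA when coerced to integer, which would turn the `delivered >= rows` comparison in dbFetch into NA and trigger exactly the "missing value where TRUE/FALSE needed" error. The limit itself:

```shell
#!/bin/sh
# R's integer type is 32-bit signed; values above this become NA when coerced
# ("NAs introduced by coercion to integer range"), which can poison a
# row-count comparison like delivered >= rows.
echo $(( (1 << 31) - 1 ))   # 2147483647
```

This is speculative; it would fit the symptom that small filtered slices work while a full `collect()` fails.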
🤔 This seems to suggest it's not an issue of bad rows or something. I'm gonna try to dig into the database call to see where this actually happens. In the meantime, we could try to run the rest of the report with this section commented out, and see if this error happens elsewhere too. Maybe it is just a flukey thing with this particular command? 🤞
I adjusted the get_service_timeseries_df function in the service_stats.R file to read similar to one of your working adjusted lines above
raw_timestamps <- dplyr_con %>%
  select(request_date, service, bytes) %>% filter(request_date > '2017-10-25') %>% collect()
and it gets past that part, yay!
But then it errors again on a later step, and I'm wondering if it's because I've been using your R library :( (in my user's home dir, I have a .Rprofile file that points at your workspace)
[mhines@yeti-login20 NWIS_Analytics] tail -f shellLog/slurm-4788626_1.out
intersect, setdiff, setequal, union
Starting build at 2019-11-05 15:56:40
< MAKE > process/gwsip_nwisweb_metrics.rds.ind
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] process/gwsip_nwisweb_metrics.rds.ind
Finished build at 2019-11-05 15:56:40
Build completed in 0.00 minutes
USGS Support Package: https://owi.usgs.gov/R/packages.html#support
Loading required package: sp
Checking rgeos availability: TRUE
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Starting build at 2019-11-05 15:56:49
< MAKE > all
[ OK ] fiscal_year
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] date_range_caption
[ OK ] missing_days
[ OK ] water_events
[ OK ] process/service_summary_data.rds.ind
[ OK ] process/service_summary_data.rds
[ OK ] service_summary_plot
[ BUILD ] process/service_timeseries.rds.ind | get_service_timeseries_d...
[ READ ] | # loading packages
Error: package or namespace load failed for ‘tidyverse’:
.onAttach failed in attachNamespace() for 'tidyverse', details:
call: rbind(info, getNamespaceInfo(env, "S3methods"))
error: number of columns of matrices must match (see arg 2)
Execution halted
srun: error: UV00000437-P001: task 0: Exited with exit code 1
Ugh, yeah, I goofed up my library when I installed some R 3.6 packages without having my library path set up to change with different versions 😞 I can try reinstalling some libraries, though I could end up being blocked by the library problems I've been having on 3.6.
Yeah, :( darn libraries and versions and workspaces
@mhines-usgs I think I've repaired the library 🤞
Thanks, I still see
[mhines@yeti-login20 NWIS_Analytics] tail -f shellLog/slurm-4814771_1.out
[ OK ] service_summary_plot
[ BUILD ] process/service_timeseries.rds.ind | get_service_timeseries_d...
[ READ ] | # loading packages
Error: Some packages are missing: tidyverse
Install with:
remake::install_missing_packages("remake.yml")
or:
install.packages("tidyverse")
Execution halted
srun: error: n3-97: task 0: Exited with exit code 1
Do I need to change the path to your library, or is there another path I should be using in my .Rprofile now?
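A sanity check worth running before submitting a job: confirm that the library directory your .Rprofile points at exists and actually contains the package you need. The path and env-var convention below are placeholders for illustration, not the real setup:

```shell
#!/bin/sh
# Check that a shared R library directory exists and contains a needed package
# before launching a job. RLIB and PKG defaults are hypothetical.
set -u
RLIB="${RLIB:-/path/to/shared/R/library}"
PKG="${PKG:-tidyverse}"
if [ ! -d "$RLIB" ]; then
  echo "library path missing: $RLIB"
elif [ ! -d "$RLIB/$PKG" ]; then
  echo "package not installed: $PKG"
else
  echo "ok: $PKG found in $RLIB"
fi
```

Catching a missing package this way takes seconds, versus waiting for a slurm job to die at the `[ READ ]` step.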
Sorry, I apparently fixed all the other packages that were corrupted, but forgot I had deleted tidyverse and didn't reinstall it. I have installed it now. That path is still good.
Got a bit further, but still hit this error.. not sure why yet.
[mhines@yeti-login20 NWIS_Analytics] tail -f shellLog/slurm-4814776_1.out
intersect, setdiff, setequal, union
Starting build at 2019-11-12 15:40:56
< MAKE > process/gwsip_nwisweb_metrics.rds.ind
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] process/gwsip_nwisweb_metrics.rds.ind
Finished build at 2019-11-12 15:40:56
Build completed in 0.00 minutes
USGS Support Package: https://owi.usgs.gov/R/packages.html#support
Loading required package: sp
Checking rgeos availability: TRUE
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Starting build at 2019-11-12 15:41:00
< MAKE > all
[ OK ] fiscal_year
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] date_range_caption
[ OK ] missing_days
[ OK ] water_events
[ OK ] process/service_summary_data.rds.ind
[ OK ] process/service_summary_data.rds
[ OK ] service_summary_plot
[ BUILD ] process/service_timeseries.rds.ind | get_service_timeseries_d...
[ READ ] | # loading packages
in get_service_timeseries_df function
both years type: list
memory: 102107296
connected to monet in the for loop: fy17
memory: 103785352
raw_timestamps3
got dates
memory: 105476752
Restoring previous version of process/service_timeseries.rds.ind
Error in if (month(max(dates$date)) == 10) { :
missing value where TRUE/FALSE needed
Calls: scmake ... get_service_timeseries_df -> .handleSimpleError -> h
In addition: Warning messages:
1: `new_overscope()` is deprecated as of rlang 0.2.0.
Please use `new_data_mask()` instead.
This warning is displayed once per session.
2: `overscope_eval_next()` is deprecated as of rlang 0.2.0.
Please use `eval_tidy()` with a data mask instead.
This warning is displayed once per session.
3: `overscope_clean()` is deprecated as of rlang 0.2.0.
This warning is displayed once per session.
4: In max.default(numeric(0), na.rm = FALSE) :
no non-missing arguments to max; returning -Inf
Execution halted
srun: error: n3-97: task 0: Exited with exit code 1
^C
So month(max(dates$date)) is NA.. and dates$date seems to be empty..
Looking back at raw_timestamps, those also seem to be a problem after I modified the code here: https://github.com/usgs-makerspace/makerspace-sandbox/issues/153#issuecomment-550062603
Now my code, with all these gross messages/logs, looks like this:
# service timeseries
# for two fiscal years
get_service_timeseries_df <- function(outind, fiscal_year_1, fiscal_year_2) {
  message("in get_service_timeseries_df function")
  both_years <- tibble()
  message("both years type: ", typeof(both_years))
  message("memory: ", pryr::mem_used())
  for(fy in c(fiscal_year_1, fiscal_year_2)) {
    c(con, dplyr_con) %<-% connect_to_monet(fiscal_year = fy)
    message("connected to monet in the for loop: ", fy)
    message("memory: ", pryr::mem_used())
    raw_timestamps <- dplyr_con %>%
      select(request_date, service, bytes) %>% filter(request_date > '2017-10-25') %>% collect()
    message("raw_timestamps", length(raw_timestamps))
    message("raw_timestamps: ", raw_timestamps)
    dates <- raw_timestamps %>% mutate(request_date = as.Date(request_date),
                                       bytes = as.numeric(bytes)) %>%
      group_by(request_date, service) %>%
      summarize(n = n(), bytes = sum(bytes, na.rm = TRUE)) %>%
      rename(date = request_date)
    message("got dates")
    message("memory: ", pryr::mem_used())
    # Oct 1st of the next FY snuck into databases
    message("length(dates): ", length(dates))
    message("month(max(dates$date)): ", month(max(dates$date)))
    message("dates$date: ", dates$date)
    message("dates: ", dates)
    if(month(max(dates$date)) == 10) {
      dates <- dates %>% filter(date != max(dates$date))
    }
    both_years <- bind_rows(both_years, dates)
    message("both years data gathered")
    message("memory: ", pryr::mem_used())
    dbDisconnect(con, shutdown = TRUE)
    message("disconnected from monet")
  }
  saveRDS(both_years, file = as_data_file(outind))
  message("saved to rds file")
  s3_put(outind)
  message("put in s3")
}
and logs like:
[mhines@yeti-login20 NWIS_Analytics] tail -f shellLog/slurm-4814783_1.out
intersect, setdiff, setequal, union
Starting build at 2019-11-12 16:36:34
< MAKE > process/gwsip_nwisweb_metrics.rds.ind
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] process/gwsip_nwisweb_metrics.rds.ind
Finished build at 2019-11-12 16:36:34
Build completed in 0.00 minutes
USGS Support Package: https://owi.usgs.gov/R/packages.html#support
Loading required package: sp
Checking rgeos availability: TRUE
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Starting build at 2019-11-12 16:36:37
< MAKE > all
[ OK ] fiscal_year
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] date_range_caption
[ OK ] missing_days
[ OK ] water_events
[ OK ] process/service_summary_data.rds.ind
[ OK ] process/service_summary_data.rds
[ OK ] service_summary_plot
[ BUILD ] process/service_timeseries.rds.ind | get_service_timeseries_d...
[ READ ] | # loading packages
in get_service_timeseries_df function
both years type: list
memory: 102115568
connected to monet in the for loop: fy17
memory: 103793624
raw_timestamps3
raw_timestamps: numeric(0)character(0)integer(0)
got dates
memory: 105486208
length(dates): 4
month(max(dates$date)): NA
dates$date:
dates: numeric(0)character(0)integer(0)numeric(0)
Restoring previous version of process/service_timeseries.rds.ind
Error in if (month(max(dates$date)) == 10) { :
missing value where TRUE/FALSE needed
Calls: scmake ... get_service_timeseries_df -> .handleSimpleError -> h
In addition: Warning messages:
1: `new_overscope()` is deprecated as of rlang 0.2.0.
Please use `new_data_mask()` instead.
This warning is displayed once per session.
2: `overscope_eval_next()` is deprecated as of rlang 0.2.0.
Please use `eval_tidy()` with a data mask instead.
This warning is displayed once per session.
3: `overscope_clean()` is deprecated as of rlang 0.2.0.
This warning is displayed once per session.
4: In max.default(numeric(0), na.rm = FALSE) :
no non-missing arguments to max; returning -Inf
5: In max.default(numeric(0), na.rm = FALSE) :
no non-missing arguments to max; returning -Inf
Execution halted
srun: error: n3-97: task 0: Exited with exit code 1
so, i guess that raw_timestamps change keeps things from erroring, but is perhaps not giving us the data we need?
Oh, that data filter will exclude any FY17 data... I had forgotten that there was pre-FY18 data in the other DB. Let's talk about this a bit tomorrow and decide how to proceed. The immediate thing would be to add logic so that filter only applies against the FY18 database, but we're kind of getting into really clunky solutions here 😕
Ah, duh. Ok, I could probably do that to make it work for now.
Ok — would be good to know if any other problems turn up.
I forgot to follow up on this, but I tried this simple if/else to handle the different selection needs, and it timed out after 3 hours. Not sure if this is just incorrect and caused a problem, or if it really needs more time? It seemed to get stuck in the statement dealing with FY17:
for(fy in c(fiscal_year_1, fiscal_year_2)) {
c(con, dplyr_con) %<-% connect_to_monet(fiscal_year = fy)
message("connected to monet in the for loop: ", fy)
message("memory: ", pryr::mem_used())
if (fy=="fy18") {
raw_timestamps <- dplyr_con %>%
select(request_date, service, bytes) %>% filter(request_date > '2017-10-25') %>% collect()
}
else {
raw_timestamps <- dplyr_con %>%
select(request_date, service, bytes) %>% collect()
}
message("raw_timestamps", length(raw_timestamps))
message("raw_timestamps: ", raw_timestamps)
dates <- raw_timestamps %>% mutate(request_date = as.Date(request_date),
bytes = as.numeric(bytes)) %>%
[mhines@yeti-login20 NWIS_Analytics] cat shellLog/slurm-4815018_1.out
USGS Support Package: https://owi.usgs.gov/R/packages.html#support
Loading required package: sp
Checking rgeos availability: TRUE
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Starting build at 2019-11-13 09:27:58
< MAKE > process/gwsip_nwisweb_metrics.rds.ind
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] process/gwsip_nwisweb_metrics.rds.ind
Finished build at 2019-11-13 09:27:58
Build completed in 0.00 minutes
USGS Support Package: https://owi.usgs.gov/R/packages.html#support
Loading required package: sp
Checking rgeos availability: TRUE
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Starting build at 2019-11-13 09:28:02
< MAKE > all
[ OK ] fiscal_year
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] date_range_caption
[ OK ] missing_days
[ OK ] water_events
[ OK ] process/service_summary_data.rds.ind
[ OK ] process/service_summary_data.rds
[ OK ] service_summary_plot
[ BUILD ] process/service_timeseries.rds.ind | get_service_timeseries_d...
[ READ ] | # loading packages
in get_service_timeseries_df function
both years type: list
memory: 102121648
connected to monet in the for loop: fy17
memory: 103799704
raw_timestamps3
slurmstepd: error: *** JOB 4815018 ON n3-97 CANCELLED AT 2019-11-13T12:28:14 DUE TO TIME LIMIT ***
slurmstepd: error: *** STEP 4815018.1 ON n3-97 CANCELLED AT 2019-11-13T12:28:14 DUE TO TIME LIMIT ***
srun: got SIGCONT
srun: forcing job termination
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: n3-97: task 0: Terminated
srun: Force Terminated job step 4815018.1
[mhines@yeti-login20 NWIS_Analytics]
Hmm, so it got to the first print statement `message("raw_timestamps", length(raw_timestamps))`, but not the second `message("raw_timestamps: ", raw_timestamps)`. I wonder if it was just taking forever trying to write an entire year of timestamps into the message log? Maybe change the second one to just `head(raw_timestamps)` so it doesn't try to print the whole thing. Also, you probably want `nrow` instead of `length` for the first one; `length` actually gives the number of columns when used on a data.frame.
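To illustrate the `length` vs `nrow` distinction with made-up data (this is a standalone sketch, not code from the workflow):

```r
# A data.frame is internally a list of column vectors, so length()
# counts columns; nrow() counts rows.
raw_timestamps <- data.frame(
  request_date = c(1475294399, 1475294400, 1475294401, 1475294402),
  service      = c("iv", "iv", "iv", "iv"),
  bytes        = c(219807, 4459, 10012, 1172157)
)

length(raw_timestamps)  # 3  -- number of columns
nrow(raw_timestamps)    # 4  -- number of rows

# Logging only a preview avoids dumping millions of rows into the log:
message("raw_timestamps rows: ", nrow(raw_timestamps))
print(head(raw_timestamps))
```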
hit a memory issue, bumped up to 300gb from 120gb... rerunning
argh, and then it hit another lock:
[mhines@yeti-login20 NWIS_Analytics] cat shellLog/slurm-4829258_1.out
USGS Support Package: https://owi.usgs.gov/R/packages.html#support
Loading required package: sp
Checking rgeos availability: TRUE
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Starting build at 2019-11-18 15:44:59
< MAKE > process/gwsip_nwisweb_metrics.rds.ind
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] process/gwsip_nwisweb_metrics.rds.ind
Finished build at 2019-11-18 15:44:59
Build completed in 0.00 minutes
USGS Support Package: https://owi.usgs.gov/R/packages.html#support
Loading required package: sp
Checking rgeos availability: TRUE
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Starting build at 2019-11-18 15:45:06
< MAKE > all
[ OK ] fiscal_year
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] date_range_caption
[ OK ] missing_days
[ OK ] water_events
[ OK ] process/service_summary_data.rds.ind
[ OK ] process/service_summary_data.rds
[ OK ] service_summary_plot
[ BUILD ] process/service_timeseries.rds.ind | get_service_timeseries_d...
[ READ ] | # loading packages
in get_service_timeseries_df function
both years type: list
memory: 102178616
Restoring previous version of process/service_timeseries.rds.ind
Error in monetdb_embedded_startup(embedded, !getOption("monetdb.debug.embedded", :
Failed to initialize embedded MonetDB !FATAL: GDKlockHome: Database lock '/cxfs/projects/usgs/water/iidd/data-sci/analytics/monetdb_fy17/.gdk_lock' denied
Calls: scmake ... dbConnect -> dbConnect -> .local -> monetdb_embedded_startup
Execution halted
srun: error: UV00000395-P002: task 0: Exited with exit code 1
[mhines@yeti-login20 NWIS_Analytics]
Oh, that is probably because the job didn't close the connection nicely last time since it ran out of memory. If you just delete `/cxfs/projects/usgs/water/iidd/data-sci/analytics/monetdb_fy17/.gdk_lock` it should work. Might want to do that for the fy18 database too; it is probably in the same state. The `.gdk_lock` files are just text files that record a log on/off time; you can `cat` them.
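Inspecting and clearing a stale lock can be scripted from R as well. A minimal sketch — `clear_stale_lock` is a hypothetical helper, and it is only safe to run when no other process actually has the database open:

```r
# Hypothetical helper: show and remove a leftover MonetDB lock file.
# Only use this when you are sure no live process holds the database.
clear_stale_lock <- function(db_dir) {
  lock_file <- file.path(db_dir, ".gdk_lock")
  if (file.exists(lock_file)) {
    # The lock file is plain text, so print its contents before removal
    message(paste(readLines(lock_file, warn = FALSE), collapse = "\n"))
    file.remove(lock_file)
  } else {
    message("no lock file found in ", db_dir)
  }
}

# e.g.:
# clear_stale_lock("/cxfs/projects/usgs/water/iidd/data-sci/analytics/monetdb_fy17")
```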
went pretty far but failed further in the process, will try to figure out why in a little while
[mhines@yeti-login20 NWIS_Analytics] tail -f shellLog/slurm-4829259_1.out -n 10000
USGS Support Package: https://owi.usgs.gov/R/packages.html#support
Loading required package: sp
Checking rgeos availability: TRUE
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Starting build at 2019-11-18 16:01:00
< MAKE > process/gwsip_nwisweb_metrics.rds.ind
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] process/gwsip_nwisweb_metrics.rds.ind
Finished build at 2019-11-18 16:01:00
Build completed in 0.00 minutes
USGS Support Package: https://owi.usgs.gov/R/packages.html#support
Loading required package: sp
Checking rgeos availability: TRUE
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Starting build at 2019-11-18 16:01:07
< MAKE > all
[ OK ] fiscal_year
[ OK ] current_or_previous_fy
[ OK ] database_date_range
[ OK ] date_range_caption
[ OK ] missing_days
[ OK ] water_events
[ OK ] process/service_summary_data.rds.ind
[ OK ] process/service_summary_data.rds
[ OK ] service_summary_plot
[ BUILD ] process/service_timeseries.rds.ind | get_service_timeseries_d...
[ READ ] | # loading packages
in get_service_timeseries_df function
both years type: list
memory: 102178616
connected to monet in the for loop: fy17
memory: 103856672
raw_timestamps692368015
raw_timestamps: c(1475294399, 1475294400, 1475294401, 1475294402, 1475294407, 1475294409)c("iv", "iv", "iv", "iv", "iv", "iv")c(219807, 4459, 10012, 1172157, 3869909, 133503)
got dates
memory: 11182485488
both years data gathered
memory: 11182567648
disconnected from monet
connected to monet in the for loop: fy18
memory: 11182701144
raw_timestamps2123498258
raw_timestamps: c(1508889601, 1508889606, 1508889606, 1508889607, 1508889607, 1508889607)c("iv", "iv", "iv", "iv", "iv", "iv")c(4682, 1266, 3292, 1286, 10218, 1285)
got dates
memory: 34081914048
both years data gathered
memory: 34082015464
disconnected from monet
saved to rds file
put in s3
[ BUILD ] process/service_timeseries.rds | s3_get("process/service_...
[ BUILD ] service_timeseries_plot | service_timeseries_plot ...
[ BUILD ] service_timeseries_plot_bytes | service_timeseries_plot_...
[ BUILD ] service_timeseries_plot_bytes_reqs | service_timeseries_plot_...
[ BUILD ] process/file_format_timeseries.rds.ind | get_file_format_daily_pe...
Restoring previous version of process/file_format_timeseries.rds.ind
Error in if (res@env$delivered >= res@env$info$rows) { :
missing value where TRUE/FALSE needed
Calls: scmake ... tryCatchList -> dbFetch -> dbFetch -> .handleSimpleError -> h
In addition: Warning messages:
1: `new_overscope()` is deprecated as of rlang 0.2.0.
Please use `new_data_mask()` instead.
This warning is displayed once per session.
2: `overscope_eval_next()` is deprecated as of rlang 0.2.0.
Please use `eval_tidy()` with a data mask instead.
This warning is displayed once per session.
3: `overscope_clean()` is deprecated as of rlang 0.2.0.
This warning is displayed once per session.
4: In monetdb_embedded_query(conn@connenv$conn, statement, execute, :
NAs introduced by coercion to integer range
Execution halted
srun: error: UV00000437-P001: task 0: Exited with exit code 1
Ugh it's that same mysterious error again 😞. Since it's happening here too, I bet that will happen every time we try to query the whole DB with no date filter.
sooo, i am not sure where to go with this. should we just table this? it seems not very fixable without more hacks at this point?
We do need to at least be able to run the GWSIP section to get those statistics, which I think we may be able to do on UV.
As far as actually fixing the mystery error here — I'd like to try first adding more data to the database, and if that doesn't work deleting the data you added. If that doesn't work, we may need to consider pivoting away from MonetDB (we have to do this eventually anyways, since the R MonetDBLite package is no longer supported).
I've established that this error:
Restoring previous version of process/file_format_timeseries.rds.ind
Error in if (res@env$delivered >= res@env$info$rows) { :
missing value where TRUE/FALSE needed
happens when trying to pull in a db query result that has too many rows for R to store. R is currently limited to 32-bit indexing here: vectors (and therefore data frames) can't be longer than about 2^31 elements, which is approximately 2.1 billion. We likely crossed that threshold when Megan added the October 2019 data. So from here, we'll need to either use separate tables for each fiscal year, or be ok with only requesting summarized output from the database.
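A possible direction for the "summarized output only" option: let dplyr translate the aggregation to SQL so MonetDB returns one row per group instead of billions of raw rows. This is only a sketch; it assumes the same `dplyr_con` table handle used above, and that `request_date` is a raw epoch that the backend can bucket to a day with integer division (whether dbplyr translates `%/%` for this backend would need checking):

```r
library(dplyr)

# Sketch: aggregate inside MonetDB instead of collect()-ing raw rows.
# group_by()/summarize() on a database-backed tbl are translated to SQL,
# so only the per-day totals cross the DB boundary into R.
daily <- dplyr_con %>%
  select(request_date, service, bytes) %>%
  mutate(day = request_date %/% 86400L) %>%   # epoch seconds -> day bucket
  group_by(day, service) %>%
  summarize(n = n(), bytes = sum(bytes, na.rm = TRUE)) %>%
  collect() %>%                               # small result: one row per day/service
  mutate(date = as.Date(day * 86400, origin = "1970-01-01")) %>%
  select(date, service, n, bytes)
```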
I'm still getting some errors where Yeti hits the memory limit without any error from R. I think this is happening within Monet, but I'm not sure yet. It is happening even when filtering to FY20 data, so if there isn't a workaround we will need to go to multiple tables for sure.
@wdwatkins i was under the impression running this is still blocked at this point? should this just go back into the backlog until we can do the work?
I have created separate tables for each fiscal year in Monet, and moved the data between them accordingly. The workflow needs to be adjusted to use the multiple tables.
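Adjusting the loop to the new per-fiscal-year tables might look something like this sketch (the `logs_fy17`-style table names and the `tbl()` connection pattern are assumptions for illustration, not the actual schema):

```r
library(dplyr)

# Assumed naming convention: one MonetDB table per fiscal year,
# e.g. logs_fy17, logs_fy18, ...
fiscal_years <- c("fy17", "fy18", "fy19", "fy20")

both_years <- list()
for (fy in fiscal_years) {
  fy_tbl <- tbl(dplyr_con, paste0("logs_", fy))
  # Aggregate inside the database so each table contributes only
  # summary rows, never the raw (potentially >2^31-row) log.
  both_years[[fy]] <- fy_tbl %>%
    select(request_date, service, bytes) %>%
    group_by(request_date, service) %>%
    summarize(n = n(), bytes = sum(bytes, na.rm = TRUE)) %>%
    collect()
}
both_years <- bind_rows(both_years)
```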
I created a branch in the repo (gwsip_stats_only) that only builds the page of statistics that OPP needs, since that only needs to pull from the FY20 table. We can continue to use it until the rest of the workflow is moved over to the new table structure.
Documentation in readme: https://code.usgs.gov/water/NWIS_Analytics
@wdwatkins can assist with any issues
If there are glaring holes in the documentation, fill those as well