Closed. scantle closed this issue 2 years ago.
Should be easy to:
1) re-export readr::problems() so that you can review that output without loading readr
2) write the results to a file on error so that you can manually review the issue
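A minimal sketch of both ideas, assuming readr 2.x is installed. The helper name read_with_problem_log() and its two-column layout are hypothetical and only for illustration; this is not how cder implements it. The commented roxygen lines show the usual pattern for re-exporting a function from another package.

```r
library(readr)

# Re-export pattern for a package source file (shown as comments here):
#   #' @importFrom readr problems
#   #' @export
#   readr::problems

# Hypothetical helper: parse CSV text and, if readr reports parsing
# problems, write them to a file so they can be reviewed manually.
read_with_problem_log <- function(csv_text,
                                  log_file = tempfile(fileext = ".csv")) {
  parsed <- suppressWarnings(
    read_csv(
      I(csv_text),
      col_names = c("station", "value"),  # assumed layout, for illustration
      col_types = cols(station = col_character(), value = col_double())
    )
  )
  probs <- problems(parsed)
  if (nrow(probs) > 0) {
    write_csv(probs, log_file)
    warning(nrow(probs), " parsing problem(s) written to ", log_file,
            call. = FALSE)
  }
  parsed
}
```

Calling read_with_problem_log("SGN,2\nSGN,BRT\nSGN,3\n") should return three rows with NA in the second value and leave a small CSV describing the failed field.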
Your specific problem is actually an issue with how whoever maintains the SGN station is writing output. Lines 6230:6240 are below; note that problems() signaled an issue with line 6237:
SGN,E,20,FLOW,20211009 0315,20211009 0315,2, ,CFS
SGN,E,20,FLOW,20211009 0330,20211009 0330,2, ,CFS
SGN,E,20,FLOW,20211009 0345,20211009 0345,BRT, ,CFS # <-- problem
SGN,E,20,FLOW,20211009 0400,20211009 0400,2, ,CFS
SGN,E,20,FLOW,20211009 0415,20211009 0415,2, ,CFS
SGN,E,20,FLOW,20211009 0430,20211009 0430,2, ,CFS
The other line is the same issue. Lines 31685:31690 are below; note that "---" is recognized by the parser as missing data and is replaced with NA:
SGN,E,20,FLOW,20220702 0530,20220702 0530,6, ,CFS
SGN,E,20,FLOW,20220702 0545,20220702 0545,---, ,CFS
SGN,E,20,FLOW,20220702 0600,20220702 0600,BRT, ,CFS # <-- problem
SGN,E,20,FLOW,20220702 0615,20220702 0615,6, ,CFS
SGN,E,20,FLOW,20220702 0630,20220702 0630,6, ,CFS
SGN,E,20,FLOW,20220702 0645,20220702 0645,6, ,CFS
As you can see, they wrote "BRT" to the numeric column, which obviously will cause issues for the parser (and dataframes in general). If you look at the query page you'll see their note at the bottom:
BRT and ART signify discharge at stage below or above available rating table
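To reproduce the behaviour on a small scale, here is a sketch that parses four of the rows quoted above with readr 2.x. The column names are assumptions made for this example, and the column types and na setting are a guess at a reasonable configuration, not necessarily what cder passes to readr.

```r
library(readr)

rows <- c(
  "SGN,E,20,FLOW,20220702 0530,20220702 0530,6, ,CFS",
  "SGN,E,20,FLOW,20220702 0545,20220702 0545,---, ,CFS",
  "SGN,E,20,FLOW,20220702 0600,20220702 0600,BRT, ,CFS",
  "SGN,E,20,FLOW,20220702 0615,20220702 0615,6, ,CFS"
)

# Assumed column names, for illustration only
col_names <- c("station_id", "duration", "sensor_number", "sensor_type",
               "date_time", "obs_date", "value", "data_flag", "units")

flow <- suppressWarnings(
  read_csv(
    I(paste(rows, collapse = "\n")),
    col_names = col_names,
    col_types = cols(.default = col_character(), value = col_double()),
    na = c("", "NA", "---")  # "---" is declared as a missing-value marker
  )
)

flow$value      # the "---" and "BRT" rows both come back as NA
problems(flow)  # only the "BRT" field is reported as a parsing problem
```

The "---" row is silently treated as missing because it is listed in na, while "BRT" is not a valid double and so ends up in problems().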
They should be putting those flags in the "data flag" field instead of the "value" field (at least, that's where I would put it). I don't think cder should try to handle this problem directly or try to guess what the data producer "meant" since these codes are station-dependent. I will look into readr::read_csv() a bit more and see if I can capture those bad lines in some way rather than dropping them altogether (which is what currently happens).
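One possible way to keep the bad values instead of losing them, sketched under the assumption that readr 2.x is available: read every column as text, convert the value column afterwards, and keep whatever failed to convert in a separate column. The column names and the value_text column are hypothetical, and this is not necessarily the approach cder ended up using.

```r
library(readr)

rows <- c(
  "SGN,E,20,FLOW,20220702 0530,20220702 0530,6, ,CFS",
  "SGN,E,20,FLOW,20220702 0545,20220702 0545,---, ,CFS",
  "SGN,E,20,FLOW,20220702 0600,20220702 0600,BRT, ,CFS",
  "SGN,E,20,FLOW,20220702 0615,20220702 0615,6, ,CFS"
)

# Assumed column names, for illustration only
col_names <- c("station_id", "duration", "sensor_number", "sensor_type",
               "date_time", "obs_date", "value", "data_flag", "units")

# Read everything as character so no field can fail to parse
raw <- read_csv(
  I(paste(rows, collapse = "\n")),
  col_names = col_names,
  col_types = cols(.default = col_character()),
  na = character()  # keep "---" as text rather than converting it to NA
)

# Convert to numeric; anything non-numeric becomes NA
num <- suppressWarnings(as.numeric(raw$value))

# Preserve the original text wherever the conversion failed
raw$value_text <- ifelse(is.na(num), raw$value, NA_character_)
raw$value <- num
```

This keeps the value column numeric while leaving the station-specific codes untouched for the user to interpret.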
I also recommend reaching out to the SGN data managers and letting them know of the issue. Some data managers on CDEC are very responsive, others... less so.
FYI, I followed up with CDEC and they confirmed the various flags are intended to appear in the value field. From their FAQ:
Some of the data is missing, and in the place of a numerical value there is either "ART","BRT", or "--". What do they stand for? "ART" stands for Above Rating Table, "BRT" stands for Below Rating Table, and "--" stands for missing value...
Which, again, implies the value column is to be used for flags. I guess the DATA_FLAG column is ornamental? I agree it would be much easier (for those of us on this end) if the value column was purely numeric.
Your latest changes, however, have made it so my code works as intended. Thanks!
Very useful package. First issue encountered today: cdec_query() gave me a warning about a parsing issue:
I had some trouble following up on the warning, too:
Loading readr prior to calling cdec_query allowed me to see the "problem":
However, that doesn't help me know what data I'm missing when readr issues a warning. Thanks!