Closed mpartio closed 3 years ago
I cannot reproduce the problem. This works for me:
read_forecast(
2021032200,
2021032200,
"MEPS_prod",
NULL,
lead_time = 0,
members = 0,
file_path = file.path(Sys.getenv("HOME"), "data/harp/vfld"),
file_template = "vfld_eps",
output_file_opts = sqlite_opts(path = tempdir())
)
(Note for file_path I'm just getting my $HOME/data/harp/vfld directory and the "vfld_eps"
template is shorthand for "{file_path}/{fcst_model}/vfld{fcst_model}mbr{MBR3}{YYYY}{MM}{DD}{HH}{LDT2}"
Do you get the same problem if you remove the sqlite file and start again? It could be that repeated attempts to get it working messed something up in the file.
To double check it worked I tested:
read_point_forecast(
2021032200,
2021032200,
"MEPS_prod",
"EPS",
"T2m",
lead_time = 0,
members = 0,
file_path = tempdir()
)
which also worked fine.
Thank you for conforming that it works on your environment. I have a suspicion that this could be something metcoop-related but the problem is driving me nuts.
I rebooted the server in question and now I can also run the example without problems. However the same problem appears systemically If I try to extract simultaneously from 3 members, whereas extracting from 1 or 2 members works. Then trying the same exact script on another server works just fine (with same source files).
I have to then dig deeper into the environment and see if something there is somehow restricting the write.
You could check the data by putting a breakpoint in the code:
In write_fctabe_to_sqlite.R, just before line 90, you could add:
browser()
And rebuild the package with devtools::load_all()
Then when you run (interactively) you will enter the debugger and you can check if there are in fact duplicates in the data - which is what the error message implies.
library(dplyr)
data %>% group_by(!!!syms(primary_key)) %>% summarise(count = n()) %>% filter(count > 1)
This will give you any duplicated rows.
Q
exits debug mode.
You could also try changing index_constraint
to "none"
in the call to db_clean_and_write()
and see what happens.
Thanks for the debugging tips. I will try those.
I installed Harp to a "vanilla" centos7 container and I still get the same error :thinking:
Alright, after some debugging I think I may have found the problem, and it's not in the code nor environment.
I traced the problem down to this SQL clause in sqlite_utils.R
:
SQL_add_index <- paste(
"CREATE",
index_constraint,
"INDEX IF NOT EXISTS",
index_name,
"ON",
index_spec
)
Which translates to:
CREATE unique INDEX IF NOT EXISTS index_fcdate_leadtime_SID ON FC(fcdate,leadtime,SID)
So it seems there are duplicate entries in the sqlite file because the command fails. I commented out the index creation and took a look of the sqlite file that had no index constraint:
sqlite> .schema fc
CREATE TABLE FC(fcdate INT, leadtime INT, parameter TEXT, SID INT, lat DOUBLE, lon DOUBLE, model_elevation DOUBLE, units TEXT, validdate INT, MEPS_prod_mbr000 DOUBLE, MEPS_prod_mbr001 DOUBLE, MEPS_prod_mbr002 DOUBLE);
sqlite> select fcdate, leadtime, sid, count(*) from fc group by 1,2,3 having count(*) > 1 limit 1;
1616544000|0|1045|2
sqlite> select * from fc where fcdate=1616544000 and leadtime=0 and sid=1045;
1616544000|0|CCtot|1045|69.834|21.884|61.8|oktas|1616544000|8.0|8.0|
1616544000|0|CCtot|1045|69.834|21.884|61.7|oktas|1616544000|||8.0
So why does station 1045 have two entries, one row for member 0&1 and another row for member 2?
Tracing the R code at write_fctable_to_sqlite.R
I can see that the data table is "pivoted" so that the values of column "member" become columns themselves, this is done with tidyr::spread
. Adding a print statement before and after:
data %>% dplyr::filter(SID == 1045) %>% print()
data <- data %>%
tidyr::spread(.data$member, .data$forecast)
data %>% dplyr::filter(SID == 1045) %>% print()
Output:
# A tibble: 3 x 11
fcdate leadtime parameter member SID lat lon model_elevation forecast units validdate
<int> <int> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <int>
1 1616544000 0 CCtot MEPS_prod_mbr000 1045 69.8 21.9 61.8 8 oktas 1616544000
2 1616544000 0 CCtot MEPS_prod_mbr001 1045 69.8 21.9 61.8 8 oktas 1616544000
3 1616544000 0 CCtot MEPS_prod_mbr002 1045 69.8 21.9 61.7 8 oktas 1616544000
< PIVOT HAPPENS HERE >
# A tibble: 2 x 12
fcdate leadtime parameter SID lat lon model_elevation units validdate MEPS_prod_mbr000 MEPS_prod_mbr001 MEPS_prod_mbr002
<int> <int> <chr> <int> <dbl> <dbl> <dbl> <chr> <int> <dbl> <dbl> <dbl>
1 1616544000 0 CCtot 1045 69.8 21.9 61.8 oktas 1616544000 8 8 NA
2 1616544000 0 CCtot 1045 69.8 21.9 61.7 oktas 1616544000 NA NA 8
So it's just confirming what we see in sqlite file.
I noticed that model_elevation
column has 61.8
for mbr 0&1 and 61.7
for mbr2. The difference is in the vfld-files:
$ grep "01045 " vfldMEPS_prodmbr00*202103240000 | awk '{print $1" "$2" "$3" "$4" "$5}'
vfldMEPS_prodmbr000202103240000: 01045 69.834 21.884 61.8
vfldMEPS_prodmbr001202103240000: 01045 69.834 21.884 61.8
vfldMEPS_prodmbr002202103240000: 01045 69.834 21.884 61.7
Andrew: Could this tiny difference in model_elevation (61.8
vs 61.7
) result in that spread()
doesn't think that the rows are equivalent and doesn't merge the rows correctly?
That's exactly what will be causing the problem. It's expected that different members will have a different elevation for multi model ensembles, so we have a remove_model_elev
switch in sqlite_opts()
to not include the model elevation in the SQLite output - in general it's probably not needed anyway, but by default it's included - because it was requested.
I should probably add a more explicit check on model elevation so that it is clear when this causes a break of the unique constraint.
How this ended up happening with MEPS data in vfld files is another question... are these single precision members perhaps?
These are all double precision members. Member 2 is compiled&run in a different HPC, that could mean something here. Looks to me the change was introduced with cycle 43:
vfld-files for 23.3 00 cycle (cy40):
$ pwd
/nobackup/opdata/verif/ecfadm_vfldMEPS_prod/2021/03
$ grep "01045 " 23/vfldMEPS_prodmbr00{0,1,2}202103230000 | awk '{print $1" "$2" "$3" "$4" "$5}'
23/vfldMEPS_prodmbr000202103230000: 01045 69.834 21.884 63.0
23/vfldMEPS_prodmbr001202103230000: 01045 69.834 21.884 63.0
23/vfldMEPS_prodmbr002202103230000: 01045 69.834 21.884 63.0
vfld-files for 24.3 00 cycle (cy43):
$ grep "01045 " 24/vfldMEPS_prodmbr00{0,1,2}202103240000 | awk '{print $1" "$2" "$3" "$4" "$5}'
24/vfldMEPS_prodmbr000202103240000: 01045 69.834 21.884 61.8
24/vfldMEPS_prodmbr001202103240000: 01045 69.834 21.884 61.8
24/vfldMEPS_prodmbr002202103240000: 01045 69.834 21.884 61.7
@uandrae: any idea why voima-member has different value for model elevation compared to cirrus members?
No, but will investigate.
So to prevent the problem occurring set
output_file_opts = sqlite_opts(path = "/patht/to/FCTABLE", remove_model_elev = TRUE)
This means that model_elevation won't be a column in the SQLite files, in general this shouldn't be a problem as T2m height corrections have already been done.
I've opened #76 to more explicitly describe the issue and am closing this one as the rest is outside harp's scope.
The reason for the different is if the input is in GRIB or FA. Haven't nailed all the details yet but will create a ticket within MetCoOp to start with.
Using remove_model_elev = TRUE
works nicely, thanks!
For some very strange reason read_forecast() is unable to write fctable sqlite files, but fails with "UNIQUE constraint failed".
A stripped example with just one MEPS member, original task extracts data for all members. Latest Harp code from github is used.
vfldMEPS_prodmbr000202103220000.zip