terraref / reference-data

Coordination of Data Products and Standards for TERRA reference data
https://terraref.org
BSD 3-Clause "New" or "Revised" License

Fix incorrect dates in season 4 and 6. #280

Open dlebauer opened 4 years ago

dlebauer commented 4 years ago

Error reported:

I ran a BETYdb query to extract all the season 6 records a few weeks ago and repeated it today, with the same results both times. In my post-query processing, I count a total of 925,563 measurement records received. The vast majority of record dates fall between 2018-04-16 and 2018-08-22. However, 5000+ of these records are from the previous year (a single date: 2017-07-05). The sitename for these records contains "Season 6", which is why I received them, but they appear to be from a different season. I am attaching an R snippet to isolate the records.

library(traits)
library(ggplot2)
library(lubridate)
library(dplyr)
library(knitr)

# do a query of BetyDB using a key

options(betydb_url = "https://terraref.ncsa.illinois.edu/bety/",
        betydb_api_version = 'beta',
        betydb_key = 'xxxxxxxx')

# get all of the season 6 data
season_6 <- betydb_query(sitename  = "~Season 6",
                         limit     =  "none")
season_6_date <- season_6 %>%
  mutate(trans_date = with_tz(ymd_hms(raw_date), "America/Phoenix"))

# extract the possibly bad records: anything dated before 2018 (Phoenix local time)
cutoff <- ymd("2018-01-01", tz = "America/Phoenix")
bad_records <- subset(season_6_date, trans_date < cutoff, select = c(cultivar, trans_date))
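
To check the "single date" observation without re-running the full query, something like the following could be used. This is only a minimal sketch in base R: the `records` data frame and its values are hypothetical stand-ins for the query result, and it assumes `raw_date` is stored in UTC, as the `ymd_hms()` call above implies.

```r
# Hypothetical stand-in for the query result; real data would come from betydb_query().
records <- data.frame(
  cultivar = c("A", "B", "C", "D"),
  raw_date = c("2018-04-16 19:00:00", "2017-07-05 19:00:00",
               "2018-08-22 19:00:00", "2017-07-05 19:00:00"),
  stringsAsFactors = FALSE
)

# Assumption: raw_date is in UTC. The cutoff is midnight, Phoenix local time.
records$trans_date <- as.POSIXct(records$raw_date, tz = "UTC")
cutoff <- as.POSIXct("2018-01-01", tz = "America/Phoenix")

# Keep only the records dated before the cutoff.
bad <- records[records$trans_date < cutoff, ]

# Count the suspect records per local (Phoenix) calendar date.
bad_by_date <- table(format(bad$trans_date, "%Y-%m-%d", tz = "America/Phoenix"))
print(bad_by_date)
```

If every suspect record really shares one date, the resulting table should have a single entry.
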
dlebauer commented 4 years ago

I am not sure why these were incorrectly named, so I've flagged them as being in error, which excludes them from the database:

begin;
update traits
   set checked = -1,
       notes = 'incorrect plot (sitename) for season 4 data'
 where id in (
       select distinct id
         from traits_and_yields_view_private
        where sitename like '%Season 6%'
          and year = 2017
       );
commit;
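
For anyone wanting to preview which rows a filter like this touches before running it against the database, here is a rough dry-run sketch in R. The `traits` data frame, its sitename strings, and its ids are hypothetical; only the filter condition and the `checked` / `notes` values are taken from the statement above.

```r
# Hypothetical sample of trait rows; the sitenames and ids are made up for illustration.
traits <- data.frame(
  id       = 1:4,
  sitename = c("Example Season 6 plot 1", "Example Season 6 plot 2",
               "Example Season 4 plot 1", "Example Season 6 plot 3"),
  year     = c(2018, 2017, 2017, 2017),
  checked  = 0L,
  notes    = "",
  stringsAsFactors = FALSE
)

# Same condition as the SQL: sitename contains "Season 6" and year is 2017.
flag <- grepl("Season 6", traits$sitename, fixed = TRUE) & traits$year == 2017

# Apply the same values the update sets, on the local copy only.
traits$checked[flag] <- -1L
traits$notes[flag]   <- "incorrect plot (sitename) for season 4 data"

# Rows that would be flagged.
flagged_ids <- traits$id[flag]
print(flagged_ids)
```
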

@max-zilla these were all created on 2018-05-06; do you know why they were given the wrong date?