skgrange / threadr

Tools to Thread Pieces Of Data Analysis Together
GNU General Public License v3.0

[Bug]: implicit assumption of data structure when detecting date interval #12

Open mooibroekd opened 2 weeks ago

mooibroekd commented 2 weeks ago

The current method in detect_date_interval implicitly assumes that the data are ordered by variable first and, within each variable, by date.

The function fails to determine the interval when the data are ordered by date first and, within each date, by variable (as illustrated below):

# A tibble: 3,313,296 × 3
   date                variable     value
   <dttm>              <chr>        <dbl>
 1 2015-01-01 00:00:00 wd           200  
 2 2015-01-01 00:00:00 ws             4  
 3 2015-01-01 00:00:00 t              1.1
 4 2015-01-01 00:00:00 q              0  
 5 2015-01-01 00:00:00 hourly_rain    0  
 6 2015-01-01 00:00:00 p           1034. 
 7 2015-01-01 00:00:00 rh            96  
 8 2015-01-01 01:00:00 wd           200  
 9 2015-01-01 01:00:00 ws             4  
10 2015-01-01 01:00:00 t              0.8
# ℹ 3,313,286 more rows
# ℹ Use `print(n = ...)` to see more rows

Note that in this case the lagged differences computed with dplyr::lag will be mostly 0, because the 7 rows for each timestamp share the same date. As a result, threadr::mode_average will report 0 as the most common difference, which ultimately leads to an unknown interval being (falsely) reported.
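To make the failure concrete, here is a minimal base R sketch of the situation described above. It is not the threadr implementation: the toy data frame, the use of diff() as a stand-in for the lag step, and the helper most_common() are all assumptions made for illustration.

```r
# Hypothetical long-format data: 3 hourly timestamps with 7 variables each,
# ordered by date first and by variable within each date
dates <- as.POSIXct("2015-01-01 00:00:00", tz = "UTC") + 3600 * 0:2
variables <- c("wd", "ws", "t", "q", "hourly_rain", "p", "rh")
df <- data.frame(
  date = rep(dates, each = length(variables)),
  variable = rep(variables, times = length(dates))
)

# Differences between consecutive rows in seconds
# (a base R stand-in for the lag-based step, not the package's own code)
seconds <- diff(as.numeric(df$date))

# Hypothetical helper: most frequent value, a simple stand-in for a mode
most_common <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

sum(seconds == 0)     # 18 of the 20 differences are zero
sum(seconds == 3600)  # only 2 reflect the real hourly step
most_common(seconds)  # 0, so the true interval is masked
```

With 7 variables per timestamp, 6 out of every 7 differences are zero, so any mode-based summary of the raw differences lands on 0 rather than 3600.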

The easiest fix is probably to remove all 0 values from the computed differences in seconds. If no differences remain after this removal, the user provided data without any date interval.
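A rough sketch of this first option in base R follows. The function name detect_interval_drop_zeros is hypothetical, and the code only mimics the idea (difference, drop zeros, take the mode); it is not how threadr is written.

```r
# Hypothetical helper: most common non-zero difference between consecutive dates
detect_interval_drop_zeros <- function(date) {
  # Differences between consecutive rows in seconds
  seconds <- diff(as.numeric(date))

  # Drop missing and zero differences (repeated timestamps)
  seconds <- seconds[!is.na(seconds) & seconds != 0]

  # Nothing left: the data carry no date interval
  if (length(seconds) == 0) {
    return(NA_real_)
  }

  # Most frequent remaining difference
  counts <- table(seconds)
  as.numeric(names(counts)[which.max(counts)])
}

# Toy data ordered by date first, 7 rows per timestamp (assumed for illustration)
dates <- as.POSIXct("2015-01-01 00:00:00", tz = "UTC") + 3600 * 0:4
date_first <- rep(dates, each = 7)

detect_interval_drop_zeros(date_first)         # 3600
detect_interval_drop_zeros(rep(dates[1], 7))   # NA, only one timestamp present
```

One caveat with this approach: if the data are ordered by variable first, the jump back to the start date between variables yields a large negative difference. It is rare enough not to win the mode, but it is still present in the vector.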

Alternatively, the interval could be determined from the unique set of dates in the data frame, possibly sorting them after selection to prevent further ordering issues.
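The second option could look roughly like the sketch below, again in base R and with a hypothetical function name (detect_interval_unique). Because the dates are de-duplicated and sorted first, the result should not depend on how the rows are ordered.

```r
# Hypothetical helper: interval from the sorted, unique dates
detect_interval_unique <- function(date) {
  # De-duplicate and sort so that row order no longer matters
  unique_dates <- sort(unique(date))

  # Fewer than two distinct dates: no interval can be determined
  if (length(unique_dates) < 2) {
    return(NA_real_)
  }

  # Most frequent difference between consecutive unique dates, in seconds
  seconds <- diff(as.numeric(unique_dates))
  counts <- table(seconds)
  as.numeric(names(counts)[which.max(counts)])
}

# Toy data (assumed for illustration): 5 hourly timestamps, 7 variables each
dates <- as.POSIXct("2015-01-01 00:00:00", tz = "UTC") + 3600 * 0:4

date_first <- rep(dates, each = 7)       # ordered by date, then variable
variable_first <- rep(dates, times = 7)  # ordered by variable, then date

detect_interval_unique(date_first)       # 3600
detect_interval_unique(variable_first)   # 3600
```

Compared with dropping zeros, this variant also avoids the negative jump between variables, at the cost of a unique() and sort() pass over the date column.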

skgrange commented 2 weeks ago

Hello Dennis, I am happy to have a look at this, but it will have to wait for a couple of weeks because I am travelling. Given the delay, I would recommend developing a workaround for now. Enjoy, and I will come back to this when I can.