thomaszwagerman / butterfly

Verification of continually updating timeseries data where we expect new values, but want to ensure previous data remains unchanged.
https://thomaszwagerman.github.io/butterfly/

Create utility function which supports all functions. #12

Closed thomaszwagerman closed 1 month ago

thomaszwagerman commented 1 month ago

loupe(), catch() and release() all start with exactly the same pattern:

  # Check input is as expected
  stopifnot("`df_current` must be a data.frame" = is.data.frame(df_current))
  stopifnot("`df_previous` must be a data.frame" = is.data.frame(df_previous))

  # Check if `datetime_variable` is in both `df_current` and `df_previous`
  if (!datetime_variable %in% names(df_current) || !datetime_variable %in% names(df_previous)){
    stop(
      "`datetime_variable` must be present in both `df_current` and `df_previous`"
    )
  }

  # Using semi_join to extract rows with matching datetime_variables
  # (ie previously generated data)
  df_current_without_new_row <- dplyr::semi_join(
    df_current,
    df_previous,
    by = datetime_variable
  )

  # Compare the current data with the previous data, without "new" values
  waldo_object <- waldo::compare(
    df_current_without_new_row,
    df_previous
  )

  # Obtaining the new rows to provide in feedback
  df_current_new_rows <- dplyr::anti_join(
    df_current,
    df_previous,
    by = datetime_variable
  )

The purpose of this block is to check the inputs and to produce three objects: df_current_without_new_row, waldo_object and df_current_new_rows. The pattern is identical in all three functions.

Moving this into a utility function would make all three functions more concise, but possibly at the expense of readability, since the checks and joins would no longer be visible in each function body.
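
A minimal sketch of what such a utility function could look like, assuming it simply wraps the shared block and returns the three objects in a named list. The function name prepare_comparison() and the names of the list elements are hypothetical and chosen only for illustration; the checks and joins are copied from the pattern above.

```r
# Hypothetical helper: the name `prepare_comparison` and the returned list
# element names are assumptions made for this sketch.
prepare_comparison <- function(df_current, df_previous, datetime_variable) {
  # Check input is as expected
  stopifnot("`df_current` must be a data.frame" = is.data.frame(df_current))
  stopifnot("`df_previous` must be a data.frame" = is.data.frame(df_previous))

  # Check if `datetime_variable` is in both `df_current` and `df_previous`
  if (!datetime_variable %in% names(df_current) ||
        !datetime_variable %in% names(df_previous)) {
    stop(
      "`datetime_variable` must be present in both `df_current` and `df_previous`"
    )
  }

  # Rows of the current data whose datetime also exists in the previous data
  df_current_without_new_row <- dplyr::semi_join(
    df_current,
    df_previous,
    by = datetime_variable
  )

  # Compare the current data with the previous data, without "new" values
  waldo_object <- waldo::compare(
    df_current_without_new_row,
    df_previous
  )

  # Rows of the current data that are new since the previous data
  df_current_new_rows <- dplyr::anti_join(
    df_current,
    df_previous,
    by = datetime_variable
  )

  # Return everything the calling functions need in one named list
  list(
    df_current_without_new_row = df_current_without_new_row,
    waldo_object = waldo_object,
    df_current_new_rows = df_current_new_rows
  )
}

# Example call with made-up data: the current data has one extra row
df_previous <- data.frame(time = as.Date("2024-01-01") + 0:1, value = c(1, 2))
df_current <- data.frame(time = as.Date("2024-01-01") + 0:2, value = c(1, 2, 3))

checked <- prepare_comparison(df_current, df_previous, "time")
checked$df_current_new_rows
```

With a helper like this, each of the three functions could begin with a single call and then work on the returned list. The cost is the one noted above: a reader has to open the helper to see what is checked and how the joins are done.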