poissonconsulting / poisix

An R package of Poisson Consulting's date time functions
https://poissonconsulting.github.io/poisix/
Other
0 stars 0 forks source link

ps_create_datetime, ps_create_date? #2

Open sebdalgarno opened 6 years ago

sebdalgarno commented 6 years ago

opposite of ps_separate_date/datetime

Takes a vector of column names to create POSIXct or Date, assign tz, convert tz, automatically remove columns used to create datetime/date something like:

ps_create_datetime <- function(data, col = "DateTime", from = c("Year", "Month", "Day", 
                                                                "Hour", "Min", "Second"), 
                               remove = TRUE, tz = tz_data, tz_convert = tz_analysis) {

  x[col] <- with_tz(lubridate::ymd_hms(paste(paste(x[[from[1]]], x[[from[2]]], x[[from[3]]], sep = '-'),
                                        paste(x[[from[4]]], x[[from[5]]], x[[from[6]]], sep = ":")),
                                  tz = tz),
               tz = tz_convert)

  if(remove) x <- x[, setdiff(colnames(x), from)]

  x
}
joethorley commented 6 years ago

I like my only suggestions are that it should be function(data, year = "Year", month = "Month", day = "Day", hour = "Hour", minute = "Minute", second = "Second", col = "DateTime", remove = TRUE, tz = getOption(poisix.tz, "Etc/GMT+8")) because

1) folks may only want to change the column name of 1 of the columns 2) the conversion can occur afterwards 3) the default time should be a global option

An interesting idea is if for example Second = 0L then the seconds are automatically set to 0 (useful if column missing)

sebdalgarno commented 6 years ago

I wonder if there should be two functions for datetime.

  1. takes 6 columns (year, month, day, hour, minute, second) - ps_ymdhms_datetime
  2. takes 2 columns (date, time) - ps_dt_datetime

Seems to me that these are the two most likely scenarios

sebdalgarno commented 6 years ago

This opens a whole can of worms as datetimes can come in so many formats, but if we can capture the most common...

joethorley commented 6 years ago

An elegant solutions seems to be to just have the one constructor function based on year, month, day, hour, minute, second etc plus a generic deconstruction function that takes a single data/time object and decomposes it into Year, Month, Day or whatever makes sense. The user can then break apart and put back together as they see fit. The only trick thing is what to do about the tz. We could add this to a column as well. Thoughts?

sebdalgarno commented 6 years ago

I like the idea. I also like the construct verb better than create. Could the deconstruction function just deal with character instead of date/time?...that way tz only comes in once the construction happens.

joethorley commented 6 years ago

I think the question of how to go from a character to a date time is different to the question of how to decompose a datetime (which is useful for combining) versus how to construct a date time from year, month, day etc. The character to date time functions in lubridate are well developed so may handle this problem?

sebdalgarno commented 6 years ago

yes the lubridate functions are very good and no need to reinvent the wheel. If there is a character Date and Time column it probably just makes most sense to do lubridate::ymd_hms(paste(Date, Time))

I think constructing from year, month, day, hour, minute, second is still useful though.

perhaps there could be an option in ps_separate_datetime/date to add tz column, which might be useful if planning to deconstruct and reconstruct.

joethorley commented 6 years ago

yes I think the tz argument which could be NULL by default indicating don't create when deconstructing is the best solution. also the deconstruction function should include year = "Year" etc to allow the user to name each of the name columns or in the case of year = NULL to not construct at all.

joethorley commented 6 years ago

also it might be worth having the arguments suffix = "" and prefix = "" which as character scalars allow the user to quickly change all the column from the default of Year, Month, Day, etc to SiteYear, SiteMonth, SiteDay simply by setting suffix = "Site"

sebdalgarno commented 6 years ago

yes great ideas. Do you want me to try to take this one on? The function seems to be almost done.

joethorley commented 6 years ago

yes please

sebdalgarno commented 6 years ago

what is the best way to allow both a column name or an integer?: e.g. second = "Second" or second = 0L This?

if(check_string(second)) second -> data[[second]]
if(check_vector(second, 1L)) second -> second
# repeat for year, month, day, hour, minute and paste
joethorley commented 6 years ago

checkor(check_string(second), check_count(second, coerce = TRUE)) checks that second is a string (character vector of length 1) or a count (non-negative integer of length 1) or a numeric of length 1 that can be coerced to a count

joethorley commented 6 years ago

check_colnames(data, second) checks that data has a column with the same name as the value of second and check_vector(second, length = 1L, values = c(0L,59L,NA)) checks that second is a integer of length 1 with values between 0 and 59 and possibly missing values

sebdalgarno commented 6 years ago

OK that's great, but then what is best way to take a combination of integer and column names and paste into a format that lubridate::ymd_hms can read?

joethorley commented 6 years ago

if second = 0L for example create a new column in the data frame called say ..second (after checking that ..second is not already an existing column) and then set second <- "..second" and carry on as if the user passed second as a column name

joethorley commented 6 years ago

note check_missing_colnames(data, "..second") may be useful