oobianom / quickcode

An R package made out of mine and Brice's scrapbook of much needed functions.
https://quickcode.obi.obianom.com

Parse a Date into Three Separate Variables #18

Closed brichard1638 closed 7 months ago

brichard1638 commented 7 months ago

In the Ecfun R package there exists a function called Date3to1 which conjoins separately defined variables consisting of a year, month, and day into a single date variable.

However, there is no inverse functionality supporting this function. That is to say, there is no R function that parses a variable of class Date into three separately defined variables consisting of year, month, and day.

The function below is meant to solve this problem. The proposed name of the function is Date1to3:

    Date1to3 = function (data) {
      if (class(data) != "Date") {
        stop("class(data) is not an object of class Date")
      }
      str = as.character(data)
      yr1 = easyr::left(str, 4)
      mth1 = easyr::mid(str, 6, 2)
      day1 = easyr::right(str, 2)
      x = data.frame(yr1, mth1, day1)
      x
    }
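For comparison, the same split can be done with base R alone, which avoids the easyr dependency and returns numeric parts instead of strings. This is only a minimal sketch, not the implementation that ended up in quickcode; the name date1to3_sketch is a placeholder chosen for illustration.

```r
# Hypothetical base-R sketch; not the quickcode implementation.
date1to3_sketch <- function(data) {
  # inherits() also works for objects that carry more than one class
  if (!inherits(data, "Date")) {
    stop("data must be an object of class Date")
  }
  # format() extracts each part as text; as.integer() drops leading zeros
  data.frame(
    yr1  = as.integer(format(data, "%Y")),
    mth1 = as.integer(format(data, "%m")),
    day1 = as.integer(format(data, "%d"))
  )
}

parts <- date1to3_sketch(as.Date(c("2024-03-16", "1993-09-21")))
print(parts)
```

Using inherits() rather than comparing class(data) to a single string is a design choice in this sketch: the comparison yields a vector when an object has several classes, which if() does not accept.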

oobianom commented 7 months ago

I think this is a great idea to include. I will work on it. But keep in mind that if a person wants both pieces of functionality, they would have to load both the quickcode and Ecfun packages just to be able to use the two functions.

While I agree with your idea, I would suggest that we make a more comprehensive function that covers the functionality of both Date1to3 and Date3to1.

What do you think? If you agree, what will be an alternative function name you suggest for us to use?

brichard1638 commented 7 months ago

Okay - Absolutely, combining the functions into a single function certainly adds value to the initial idea I had offered! However, I am inclined to believe that the functionality is so radically different between these functions that separating them might be a more appropriate approach. For example, in a 3to1 approach, you have to pass three separate variables that must be defined before they can be combined into a single variable, and these variables must be specifically defined as to their class. In the 1to3 approach, only one variable is required to be passed. If you can integrate these separate concerns into a single function, I think it would be great. However, the arguments must be clearly stated and easy to configure.

Once you start examining this idea, if you find that the separation of interests is too great, I would just create two separate functions, making the functionality complete without the user having to add a separate dependency if both functions were needed. I think it's also a safe assumption that the user will not KNOW about the alternative function residing in the Ecfun package! That is quite an assumption to make.

Proposed Function Name: ParseDate1to3
Function Structure: ParseDate1to3(var)
Argument Type: var = a variable either separately defined or referenced from a variable in a dataset (example: ds[,6])
Output: The output returns a data frame of three variables consisting of yr1, mth1, and day1, each of which is of class numeric

Proposed Function Name: ParseDate3to1
Function Structure: ParseDate3to1(ds, cls)
Argument Type: ds = a data frame object that makes a reference to three variables specifically defined as year, month, and day; it can be passed as a separate data frame containing only three variables with the required names, or can be syntactically expressed as ds[,1:3], or non-contiguously as ds[,c(5,8,12)]. The order of the variables should not matter, and the algorithm should be able to appropriately discern the difference between a year, a month, and a day; however, the variables must be correctly labeled for the algorithm to work as defined. The cls argument passes a string, either "str" or "dt", to indicate the object class of the variable output.
Output: The output returns a vector of dates of the class defined by the cls argument
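To make the ParseDate3to1(ds, cls) proposal concrete, here is a rough base-R sketch of how name-based lookup makes column order irrelevant. It is an illustration under assumptions, not the function that was later added to quickcode: the function name parse_date3to1_sketch and the required column names "year", "month", and "day" are hypothetical.

```r
# Hypothetical sketch of the ParseDate3to1(ds, cls) idea; not quickcode code.
# Columns are found by name, so their order in ds does not matter.
parse_date3to1_sketch <- function(ds, cls = c("dt", "str")) {
  cls <- match.arg(cls)
  needed <- c("year", "month", "day")   # assumed column labels
  missing_cols <- setdiff(needed, names(ds))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # sprintf() zero-pads month and day, giving "YYYY-MM-DD" strings
  iso <- sprintf("%04d-%02d-%02d",
                 as.integer(ds$year), as.integer(ds$month), as.integer(ds$day))
  # "str" returns the character vector, "dt" returns class Date
  if (cls == "str") iso else as.Date(iso, format = "%Y-%m-%d")
}

ds <- data.frame(day = c(21, 7), year = c(1993, 2002), month = c(9, 5))
as_text <- parse_date3to1_sketch(ds, "str")
as_date <- parse_date3to1_sketch(ds, "dt")
print(as_text)
print(class(as_date))
```

Because the lookup is by name, a subset such as ds[, c(5, 8, 12)] would work the same way as long as the three columns carry the expected labels.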

oobianom commented 7 months ago

So I have thought more about this, and I keep coming to the same conclusion. I agree with the idea of creating two separate functions.

However, I am leaning towards "date1to3" and "date3to1" for the names. They are shorter, and like you rightly pointed out "...it's also a safe assumption that the user will not KNOW about the alternative function residing in the Ecfun package". So it's okay if we keep the same name. And if a user does load both packages, they will be able to use either function to achieve the same result anyway.

Let me know if you agree.

brichard1638 commented 7 months ago

Agreed!

I have no additional comments beyond what you suggested.


From: Obi Obianom
Sent: Saturday, March 16, 2024 7:04 PM
To: oobianom/quickcode
Cc: brichard1638; Author
Subject: Re: [oobianom/quickcode] Parse a Date into Three Separate Variables (Issue #18)


oobianom commented 7 months ago

Okay, great! I will work on revising the function. I had already started last week.

oobianom commented 7 months ago

Hi Brice, when you have time, can you check and comment on https://github.com/oobianom/quickcode/issues/19

oobianom commented 7 months ago

Hi Brice, I have finished the development draft for the two proposed functions: date3to1 and date1to3

I improved the functionality such that the user can pass in various date formats and also output or combine to various formats. Take a look when you have time.

brichard1638 commented 7 months ago

Preliminary testing revealed that the date1to3 function failed on every test conducted on it. Using the sample dataset provided in the documentation, called data1, an example is provided below:

date1to3(data1)
Error in date1to3(data1) : The columns for Year Month Day (col.YMD) does not exist in the dataset


Preliminary testing revealed that the date3to1 function failed on every test conducted on it. Using the sample dataset provided in the documentation, called data0, three test example results are provided below:

date3to1(data0)
Error in date3to1(data0) : The columns for Year Month Day (col.YMD) does not exist in the dataset

date3to1(data0, as.vector = TRUE)
Error in date3to1(data0, as.vector = TRUE) : The columns for Year Month Day (col.YMD) does not exist in the dataset

date3to1(data0, out.format = "%d_%m_%Y")
Error in date3to1(data0, out.format = "%d_%m_%Y") : The columns for Year Month Day (col.YMD) does not exist in the dataset

oobianom commented 7 months ago

Hi Brice, so sorry about that. I just made an update. Please reinstall from the newest repo update and try now. Thanks.

brichard1638 commented 7 months ago

The following remarks represent preliminary testing results initiated against the date functions date1to3 and date3to1 in the latest version of the quickcode R package. Additional testing will be conducted once the following concerns have been mitigated:

Testing Results in re: date1to3
Preliminary testing revealed one issue and one recommendation unique to this function.

Issue: The order of the variables returned for the out.cols argument is not sequentially correct. For example, out.cols = c("b", "d") returns the day and then the month as a three-digit variable output, when the correct order is month and then day.

Recommendation: This is a recommendation based on amending the current documentation supporting the date1to3 function. While the Date Formats provided in the documentation are very helpful in terms of facilitating how to encode the out.cols argument, the confusion lies in how these encodings will be interpreted by the user. I was initially confused by it so the user will most likely make the same assumption I did. That is to say, the date specification for the out.cols argument does NOT require a % symbol to be applied. This was not made clear to me. When applying the % symbol, results returned string literals of the out.cols configurations. Notifying the user that the % symbol is NOT required for this argument is critical to providing the guidance needed to successfully apply this function.
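As background on why the "%" symbol is easy to assume, base R's own format() treats "%" as the marker for a conversion specification and passes anything without it through as literal text, which is the opposite of what was observed for out.cols in this round of testing. The snippet below shows base R behaviour only and says nothing about how date1to3 handles its arguments internally.

```r
# Base R only: how format() treats codes with and without "%"
d <- as.Date("2024-03-16")

with_percent    <- format(d, "%m")  # conversion specification: two-digit month
without_percent <- format(d, "m")   # no "%": returned as the literal text "m"

print(c(with_percent, without_percent))
```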


Testing Results in re: date3to1
Preliminary testing revealed one question unique to this function.

Given the following dataset configuration as provided below:

head(x)
    Phase    Cause Fatalities    y  d   m
1 landing criminal         27 1993 21 Sep
2 landing criminal        108 1993 22 Sep
3 landing criminal        125 1996 23 Nov
4 landing criminal        112 2002 07 May
5 landing  unknown         41 1993 01 Jul
6 landing  unknown         19 1993 31 Jul

What is the correct function configuration using the date3to1 function? Intuitively, the configuration is provided below which returns an unexpected response relative to the output.date variable:

head(date3to1(x, out.format = "%Y-%m-%d", col.YMD = c(4,6,5), as.vector = FALSE))
    Phase    Cause Fatalities    y  d   m output.date
1 landing criminal         27 1993 21 Sep          NA
2 landing criminal        108 1993 22 Sep          NA
3 landing criminal        125 1996 23 Nov          NA
4 landing criminal        112 2002 07 May          NA
5 landing  unknown         41 1993 01 Jul          NA
6 landing  unknown         19 1993 31 Jul          NA

It is not known if the function failed on the merits, or, alternatively, if the function was incorrectly encoded which would be consistent with a user error. A determination needs to be made as to why this output is incorrectly returned. Recommendations are provided based on how the error is defined:

Option 1: If the error is a failure within the function itself, it should be modified to allow for configuring various combinations of year, month, and day in any order, and allowing the month to be configured as either numeric or as a month literal (ex: Jul, July).

Option 2: If the function was incorrectly encoded, then additional documentation should be provided that establishes a similar example where the date specification captures the essence of the example provided or, alternatively, similarly defined examples.
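Whichever option applies, one possible workaround on the user side is to turn month literals into month numbers before combining the columns. The sketch below uses only base R (the built-in month.abb constant and match()) on toy data shaped like the y, d, and m columns above; the m_num and date column names are made up for illustration, and nothing here reflects how date3to1 works internally.

```r
# Toy data mirroring the y / d / m columns from the example above
x <- data.frame(
  y = c(1993, 1996, 2002),
  d = c("21", "23", "07"),
  m = c("Sep", "Nov", "May"),
  stringsAsFactors = FALSE
)

# month.abb is a built-in constant: "Jan", "Feb", ..., "Dec" (English)
# match() returns the position of each abbreviation, i.e. the month number
x$m_num <- match(x$m, month.abb)

# Build "YYYY-MM-DD" strings from numeric parts, then parse them as dates
x$date <- as.Date(
  sprintf("%04d-%02d-%02d", as.integer(x$y), x$m_num, as.integer(x$d)),
  format = "%Y-%m-%d"
)
print(x)
```

Full month names such as "July" could be handled the same way with month.name; abbreviations that are not in month.abb would come back as NA from match().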

oobianom commented 7 months ago

Thanks. I have taken care of date1to3. The order is now correct. I have also made it so that "%" is used, e.g. out.cols = c("%Y", "%m", "%d")

brichard1638 commented 7 months ago

The tests conducted for both the date1to3 and date3to1 functions all passed. Testing consisted of passing various argument configurations to each function. In each of these tests, the output was correctly returned as expected.

The only thing I would add is a comment in the documentation for the date3to1 function which states that all three columns must be numerically valued before executing the function. You cannot, for example, provide the month variable as non-numeric such as "Mar", "Jul", or "Jan" or else the results will return an "NA" in the output.date field.

oobianom commented 7 months ago

Great, I have included that in the function documentation. This should be all set as well. Thanks Brice.

brichard1638 commented 7 months ago

👍

Brice


From: Obi Obianom
Sent: Friday, March 29, 2024 12:32 AM
To: oobianom/quickcode
Cc: brichard1638; Author
Subject: Re: [oobianom/quickcode] Parse a Date into Three Separate Variables (Issue #18)
