oobianom / quickcode

An R package made out of mine and Brice's scrapbook of much needed functions.
https://quickcode.obi.obianom.com

Function to Mutate a subset of dataset, and reattach it to the dataset #27

Closed oobianom closed 1 month ago

oobianom commented 4 months ago

Hi Brice, does such a function already exist?

Basically, with dplyr, I can filter and then do all the downstream processing, such as group_by, mutate, and so on. But then I need that filtered portion to remain in the entire dataset after the subset has been manipulated.

Let me know if you understand. Else, I can rephrase.

brichard1638 commented 3 months ago

I believe I understand what you are asking but if you could rephrase your inquiry with a sample dataset, it would be more helpful.

oobianom commented 3 months ago

To make it easier for you to assess, I put together a rough draft of the function and updated this repository.

Here is an example, dt = mtcars

I want to subset to "mpg == 21.0 & cyl == 6", then mutate various columns within that subset while leaving the others intact

with base R, this is how I would approach it

idx <- dt$mpg == 21.0 & dt$cyl == 6  # compute the row selector once, before cyl is overwritten
dt[idx,]$cyl = 1000
dt[idx,]$hp = 2000
dt[idx,]$vs = dt[idx,]$hp*2

with the new function, this is how I would do it

mutate_filter(dt,mpg == 21.0 & cyl == 6, cyl=1000,hp=2000,vs=hp*2)
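The behavior requested above (change only the matching rows, return the whole dataset) can be sketched in base R as follows. This is a minimal illustration, not the implementation in the repository: the name `mutate_subset_sketch` and its internals are assumptions made for this example, and it assumes every assignment is passed as a named argument.

```r
# Hypothetical sketch: mutate only the rows matching `condition`,
# then return the full data frame with all other rows untouched.
mutate_subset_sketch <- function(data, condition, ...) {
  # Evaluate the unquoted condition against the columns of `data`
  rows <- eval(substitute(condition), envir = data, enclos = parent.frame())
  rows <- !is.na(rows) & rows
  if (!any(rows)) return(data)

  # Capture the named assignments (e.g. cyl = 1000) without evaluating them
  updates <- as.list(substitute(list(...)))[-1L]

  for (i in seq_along(updates)) {
    col <- names(updates)[i]
    # Each expression sees the current state of the matching rows,
    # so a later assignment can use a column changed by an earlier one
    data[rows, col] <- eval(updates[[i]],
                            envir = data[rows, , drop = FALSE],
                            enclos = parent.frame())
  }
  data
}

dt <- mutate_subset_sketch(mtcars, mpg == 21.0 & cyl == 6,
                           cyl = 1000, hp = 2000, vs = hp * 2)
head(dt, 3)
```

Because the row selector is computed once up front, overwriting `cyl` does not affect which rows the later assignments touch.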

brichard1638 commented 3 months ago

The proposed function you have described does not exist, at least in the way you have described it, in R. Given the additional information you have provided, I have crafted what I believe is a function that meets the requirements you have laid out.

FUNCTION NAME: mutate_filter

TOTAL NUMBER OF FUNCTION ARGUMENTS: 6

ARGUMENT NAMES: data, f_arg1, f_arg2, mutcolx, mutcoly, expr

ARGUMENT SUMMARY DESCRIPTION:

OPTIONALITY: Only two arguments are required to execute the function. These arguments are data and f_arg1.

FUNCTION STRUCTURE:

mutate_filter <- function(data, f_arg1, f_arg2, mutcolx, mutcoly, expr) {
  if (missing(f_arg2)) {
    d1 <- dplyr::filter(data, eval(parse(text = f_arg1)))
  } else {
    d1 <- dplyr::filter(data, eval(parse(text = f_arg1)), eval(parse(text = f_arg2)))
  }

  if (missing(mutcolx)) {
    quote(expr = )
  } else {
    eval(parse(text = paste0("d1$", mutcolx)))
  }

  if (missing(mutcoly)) {
    quote(expr = )
  } else {
    eval(parse(text = paste0("d1$", mutcoly)))
  }

  if (missing(expr)) {
    quote(expr = )
  } else {
    # Evaluate the expression within the data frame context
    calc_fld <- eval(parse(text = expr), envir = d1)
    # Add the new field to the data frame
    d1$calc_fld <- calc_fld
  }
  return(d1)
}

FUNCTION TESTING STATUS: The function has been tested but not extensively. If the function meets the expectations provided by the previous explanation and requirements as outlined in this issue, additional testing should be conducted.

If the function does not work as presented, especially consistent with the examples provided, please reach out and I will send the function syntax again. It is possible that the conversion from the R application to this medium did not capture the code syntax correctly.

FUNCTIONAL UTILITY: The value of the mutcolx and mutcoly arguments is unclear. They were included because the requirements called for them. However, overwriting an entire column of the filtered data with a single value does not seem especially useful, and a second argument that replicates this behavior is also questionable. Unless there is a compelling reason to keep these arguments, it is strongly recommended that they be removed from the function. The function would then have four arguments in total, which together are believed to provide an extraordinary value proposition.

One way to improve the utility of the mutate_filter function would be to replace one of the mutcol arguments with an argument that can control the removal of contiguous or non-contiguous variables from the data frame object in the mutated output.

CODE EXAMPLES:

library(DescTools)
data("d.pizza")
data("mtcars")
data("quakes")

mutate_filter(mtcars, f_arg1 = "mpg == 21.0", f_arg2 = "cyl == 6", mutcolx = "cyl = 1000", mutcoly = "hp = 2000", expr = "hp*2")
mutate_filter(d.pizza[,1:10], f_arg1 = "driver == 'Taylor'", f_arg2 = "area == 'Camden'", expr = "count*price")
mutate_filter(mtcars, f_arg1 = "cyl == 8", expr = "vs+am+gear+carb")
mutate_filter(quakes, f_arg1 = "stations == 10", expr = "round(mag/depth,3)")

oobianom commented 3 months ago

Thanks Brice. Actually, I don't think we need the secondary arguments, since one can easily combine the conditions into a single string, such as "mpg == 21 & cyl < 3"
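As a quick check of that point, a single condition string joined with `&` selects the same rows as two separate conditions. The sketch below uses base R only and the conditions from the earlier mtcars example; the variable names are illustrative.

```r
# Two separate condition strings, each evaluated against mtcars
cond1 <- eval(parse(text = "mpg == 21"), envir = mtcars)
cond2 <- eval(parse(text = "cyl == 6"), envir = mtcars)
two_step <- mtcars[cond1 & cond2, ]

# One combined condition string
combined <- eval(parse(text = "mpg == 21 & cyl == 6"), envir = mtcars)
one_step <- mtcars[combined, ]

# Both approaches select the same rows
identical(two_step, one_step)
```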

brichard1638 commented 3 months ago

Does the proposed function I provided meet the requirements you laid out?



brichard1638 commented 3 months ago

Based on your latest feedback, I've re-constructed the mutate_filter function in the following ways:

NEW FUNCTION STRUCTURE:

mutate_filter <- function(data, f_arg1, f_arg2, rem = NULL, srtfld = NULL, expr) {
  if (missing(f_arg2)) {
    d1 <- dplyr::filter(data, eval(parse(text = f_arg1)))
  } else {
    d1 <- dplyr::filter(data, eval(parse(text = f_arg1)), eval(parse(text = f_arg2)))
  }

  if (!is.null(rem)) {
    d1 <- d1[, -c(rem)]
  }

  if (missing(expr)) {
    quote(expr = )
  } else {
    # Evaluate the expression within the data frame context
    calc_fld <- eval(parse(text = expr), envir = d1)
    # Add the new field to the data frame
    d1$calc_fld <- calc_fld
  }

  if (!is.null(srtfld)) {
    d1 <- dplyr::arrange(d1, eval(parse(text = srtfld)))
  }

  return(d1)
}

It is believed that this version of the mutate_filter function possesses a much higher value proposition than its predecessor. As a result, this modified function should be the one selected for inclusion in the quickcode package.
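The expr step relies on `eval(parse(text = ...), envir = d1)`, which evaluates a string as R code with the data frame's columns visible as variables. The following is a small standalone illustration using the built-in mtcars data; the variable names are chosen for this example only.

```r
d1 <- mtcars[mtcars$cyl == 8, ]
expr <- "vs + am + gear + carb"

# Column names in the string are looked up in d1 during evaluation
d1$calc_fld <- eval(parse(text = expr), envir = d1)
head(d1[, c("vs", "am", "gear", "carb", "calc_fld")], 3)

# parse() on a character vector yields several expressions;
# eval() evaluates them in turn and returns only the last value
last_only <- eval(parse(text = c("mpg", "cyl")), envir = mtcars)
identical(last_only, mtcars$cyl)
```

The second part may be worth checking against the srtfld argument: if a vector of several field names is passed through `eval(parse(text = ...))`, it would likely sort by the last field alone.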

FUNCTION TESTING STATUS: The updated function has been tested but not extensively. If the functional output meets the expectations of the requirements previously outlined in this issue, additional testing should be conducted.

ADDITIONAL NOTES:

CODE EXAMPLES:

library(DescTools)
data("d.pizza")
data("mtcars")

mutate_filter(mtcars, "mpg == 21.0", "cyl == 6", expr = "hp*2")
mutate_filter(d.pizza[,1:10], f_arg1 = "driver == 'Taylor'", f_arg2 = "area == 'Camden'", expr = "count*price")
mutate_filter(mtcars, f_arg1 = "cyl == 8", expr = "vs+am+gear+carb")
mutate_filter(airquality, f_arg1 = "Month == 5", rem = c(3:4), expr = "Ozone/Solar.R")
mutate_filter(d.pizza, f_arg1 = "area == 'Camden'", rem = c(1:4, 15,16), srtfld = "price", expr = "round(count*price,2)")
mutate_filter(mtcars, f_arg1 = "vs == 1", rem = c(2:5, 11), srtfld = "mpg")
mutate_filter(mtcars, f_arg1 = "mpg > 20", rem = 11)
mutate_filter(d.pizza[5:10], f_arg1 = "area == 'Westminster'", srtfld = c("driver", "price"))

CONCLUSION: The one remaining issue with the mutate_filter function is that every argument must be named explicitly in the call, or the function will crash. I'm not sure what changes need to be made to the code, but naming the arguments should be optional.
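One possible way to make argument names optional is to drop the second filter argument (conditions can be combined with `&`), give every optional argument a `NULL` default, and place `expr` directly after the filter so the most common positional call works. The sketch below is a hypothetical base R variant, not the function from this thread: the name `mutate_filter_sketch`, the argument order, and the use of column names for srtfld are all assumptions.

```r
# Hypothetical variant: one filter string, NULL defaults, expr in third position
mutate_filter_sketch <- function(data, filter, expr = NULL, rem = NULL, srtfld = NULL) {
  # Evaluate the filter string against the columns of `data`
  keep <- eval(parse(text = filter), envir = data, enclos = parent.frame())
  d1 <- data[!is.na(keep) & keep, , drop = FALSE]

  # Remove columns by position, if requested
  if (!is.null(rem)) d1 <- d1[, -rem, drop = FALSE]

  # Add the calculated field, if an expression string was given
  if (!is.null(expr)) {
    d1$calc_fld <- eval(parse(text = expr), envir = d1, enclos = parent.frame())
  }

  # Sort by one or more column names, if requested
  if (!is.null(srtfld)) {
    d1 <- d1[do.call(order, unname(as.list(d1[srtfld]))), , drop = FALSE]
  }
  d1
}

# Positional call: no argument names needed
res1 <- mutate_filter_sketch(mtcars, "cyl == 8", "vs + am + gear + carb")

# Named call for the less common arguments
res2 <- mutate_filter_sketch(mtcars, "vs == 1", rem = c(2:5, 11), srtfld = "mpg")
```

With this ordering, a third unnamed argument is read as the expression rather than as a second filter, which is the case that fails with the signature above.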

oobianom commented 3 months ago

Thanks Brice!

brichard1638 commented 3 months ago

When are you planning on publishing the next version of quickcode?

