ianmcook / tidyquery

Query R data frames with SQL
Apache License 2.0
167 stars 12 forks source link

Add substring_index() function #13

Closed jimmyg3g closed 4 years ago

jimmyg3g commented 4 years ago

Add MySQL substring_index() function, https://www.w3schools.com/sql/func_mysql_substring_index.asp.

Here are tidyverse and base versions of the function:

# tidyverse version
substring_index_tidyverse <- function(string, delimiter, number) {
    string_parts <- str_split(string, coll(delimiter), simplify = FALSE) %>% unlist()
    n_parts <- length(string_parts)
    if(number > 0) {
        number <- min(c(number, n_parts))
        return(string_parts[1:number] %>%
                str_flatten(delimiter))
    }
    if(number < 0) {
        number <- max(c(number, -n_parts))
        return(string_parts[(n_parts + number + 1):n_parts] %>%
            str_flatten(delimiter))
    }
}

substring_index_tidyverse('www.w3schools.com', '.', 2)

# base version
substring_index_base <- function(string, delimiter, number) {
    string_parts <- strsplit(string, delimiter, fixed = TRUE) %>% unlist()
    n_parts <- length(string_parts)
    if(number > 0) {
        number <- min(c(number, n_parts))
        return(string_parts[1:number] %>%
                paste(., collapse = delimiter))
    }
    if(number < 0) {
        number <- max(c(number, -n_parts))
        return(string_parts[(n_parts + number + 1):n_parts] %>%
                paste(., collapse = delimiter))
    }
}

substring_index_base('www.w3schools.com', '.', 2)

I tried adding to queryparser's translations.R file, but I'm running into a "could not find function" error with tidyquery::query(). I'm not sure where changes need to be made in tidyquery and queryparser to make it work.

ianmcook commented 4 years ago

Thanks very much @jimmyg3g! Please pardon the delay—within the next few days I will add this to the queryparser code and link to the relevant commit from here.

ianmcook commented 4 years ago

@jimmyg3g I added a translation for substring_index() to queryparser for the tidyverse = TRUE case. The tidyverse = FALSE case is more complicated and it's not required for tidyquery to work, so I'm not going to leave it at this for now.

Because of the way that queryparser works, I wasn't able to directly use the code you provided above. (In queryparser, translations need to be vectorized functions and they need to return an unevaluated expression that can evaluate within the context of a data frame.) If you're curious to see how it is implemented, here's the code: https://github.com/ianmcook/queryparser/blob/master/R/translations.R#L767-L792