@wcornwell identified that most of the time is being spent in extract_genus. The current code is:
extract_genus <- function(taxon_name) {
  genus <-
    ifelse(
      stringr::word(taxon_name, 1) %>% stringr::str_to_lower() == "x",
      paste(
        stringr::word(taxon_name, 1) %>% stringr::str_to_lower(),
        stringr::word(taxon_name, 2) %>% stringr::str_to_sentence()
      ),
      stringr::word(taxon_name, 1) %>% stringr::str_to_sentence()
    )
  genus
}
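For reference, a quick illustration of what this currently returns (the example call is mine, not from the thread, and assumes the %>% pipe is available, e.g. via library(magrittr)):

# Genus is the first word, or the first two words for hybrid "x ..." names
extract_genus(c("Eucalyptus globulus", "x Glossadenia tutelata"))
#> [1] "Eucalyptus"    "x Glossadenia"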
The current APC has ~110,000 names. The proposed revision below cuts the run time from 4.4 s down to 0.13 s, so the time to run devtools::load_all() drops from 23.1 s to 6.9 s. Further efficiencies are possible.
extract_genus <- function(taxon_name) {
  word1 <- stringr::str_split_i(taxon_name, " ", 1)
  word2 <- stringr::str_split_i(taxon_name, " ", 2)
  genus <- word1
  # Deal with names that begin with x,
  # e.g. "x Taurodium x toveyanum" or "x Glossadenia tutelata"
  i <- stringr::str_to_lower(genus) == "x"
  genus[i] <- paste("x", word2[i])
  genus %>% stringr::str_to_sentence()
}
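For the timings above, a side-by-side check along these lines should reproduce the comparison (a minimal sketch: extract_genus_old / extract_genus_new stand in for the two versions above, and taxon_names is a made-up stand-in for the ~110,000 APC names, not data from the thread):

library(magrittr)  # for the %>% pipe used inside both versions

# Stand-in vector roughly the size of the APC name list
taxon_names <- rep(c("Eucalyptus globulus", "x Glossadenia tutelata"), 55000)

system.time(extract_genus_old(taxon_names))  # stringr::word-based version
system.time(extract_genus_new(taxon_names))  # str_split_i-based version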
@wcornwell - Can you paste example code for profiling? I'm just running
system.time({devtools::load_all()})
I've used profvis before, but don't have the code handy.
Same syntax basically, but with better output:
profvis::profvis({
  resources <- load_taxonomic_resources()
})
It takes a little while to understand the interactive browser output.
Seems done?
extract_genus needs to speed up. Relevant SO post: https://stackoverflow.com/questions/70945318/r-large-data-table-why-is-extracting-a-word-with-regex-faster-than-stringrword
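If more speed is ever needed, that post suggests a plain regex beats the word/split helpers for this kind of extraction; a possible direction (my sketch, not code from the thread) would be something like:

# Everything before the first space, in one base-R regex call
first_word <- function(x) sub(" .*$", "", x)

first_word(c("Eucalyptus globulus", "x Glossadenia tutelata"))
#> [1] "Eucalyptus" "x"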