Hello -- while using pow_lookup() and get_pow() I noticed that the taxonomy returned in my outputs is incorrect. For example:
Every species in my full list (3500) returned "clazz" as "Equisetopsida".
I believe Magnoliidae is also incorrectly designated as subclass for a number of species.
Here is my input data: finalfinal_checklist_2024March03.xlsx
Here is my script:
library(taxize)
library(openxlsx) # assumption: read.xlsx() below comes from openxlsx
# Read in data frame
plant_data <- read.xlsx("~/Desktop/update_flora_final/output/2024March03/finalfinal_checklist_2024March03.xlsx")
# Function to save intermediate results
save_checkpoint <- function(obj, filename) {
  saveRDS(obj, file = filename)
}
# Function to load intermediate results
load_checkpoint <- function(filename) {
  if (file.exists(filename)) {
    return(readRDS(filename))
  } else {
    return(NULL)
  }
}
# Load the previous checkpoint if it exists
powoID <- load_checkpoint("powoID_checkpoint.rds")
# Initialize powoID if it is NULL (no checkpoint found)
if (is.null(powoID)) {
  powoID <- list()
}
# Determine the starting point
start_idx <- length(powoID) + 1
# Iterate through the data$family vector
for (i in start_idx:length(plant_data$family)) {
  family_name <- plant_data$family[i]
  # Retrieve POW ID for the current family name
  powoID[[i]] <- get_pow(sci_com = family_name)
  # Save the intermediate results after each iteration
  save_checkpoint(powoID, "powoID_checkpoint.rds")
}
# With this I can hit stop at any time and the results are saved up to where I
# stopped; if I go back to row 43 (load_checkpoint) and rerun, it picks up where it left off
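The stop-and-resume behaviour described above can be sketched in a self-contained form. This is only a minimal sketch under stated assumptions: `fake_lookup()` is a hypothetical stand-in for `get_pow()` so the example runs without network access, and `run_with_checkpoint()` is an illustrative name, not a taxize function. It also uses `seq_len()` so that rerunning after everything has finished does nothing, whereas `start_idx:length(x)` would count backwards in that case.

```r
# Hypothetical stand-in for a slow lookup such as get_pow(); not part of taxize.
fake_lookup <- function(name) paste0("id-", name)

# Illustrative helper (assumed name): process inputs one by one, saving after each.
run_with_checkpoint <- function(inputs, checkpoint_file, lookup = fake_lookup) {
  # Resume from the checkpoint if one exists, otherwise start with an empty list.
  results <- if (file.exists(checkpoint_file)) readRDS(checkpoint_file) else list()
  n_done <- length(results)
  # Number of inputs still to process; never negative.
  n_left <- max(0, length(inputs) - n_done)
  # seq_len(0) is empty, so the loop body is skipped when nothing is left.
  for (i in n_done + seq_len(n_left)) {
    results[[i]] <- lookup(inputs[i])
    saveRDS(results, checkpoint_file)
  }
  results
}

checkpoint_file <- tempfile(fileext = ".rds")
families <- c("Poaceae", "Fabaceae", "Rosaceae")

# Simulate an interrupted run in which only the first input was processed.
saveRDS(list(fake_lookup(families[1])), checkpoint_file)

results <- run_with_checkpoint(families, checkpoint_file) # resumes at element 2
rerun <- run_with_checkpoint(families, checkpoint_file)   # nothing left to do
```

Saving after every iteration costs one file write per lookup, which is negligible next to a web request but may be worth batching for very long lists.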
# Optionally, combine the results into a single data frame if needed
powoID_df <- do.call(rbind, powoID)
taxonomy <- vector(length = length(powoID)) # Create empty vector to store orders
for (i in 1:length(powoID)) { # start for loop
  taxonomy[i] <- pow_lookup(powoID[i]) # Call pow_lookup for each PoWO ID and store order
}
# Extract data from each taxonomy sub-list
extracted_data <- lapply(taxonomy, function(x) {
  c(family = x$family, order = x$order, class = x$clazz,
    subclass = x$subclass, phylum = x$phylum,
    taxonomicStatus = x$taxonomicStatus)
})
# Check if all sub-lists have the same structure (optional)
if (length(unique(lengths(extracted_data))) > 1) {
  warning("Sub-lists in 'output' might have different structures. Extraction might be incomplete.")
}
# Append extracted data to the original data frame (assuming rownames match)
data_output <- cbind(plant_data, do.call(rbind, extracted_data))
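A note on the extraction step above: `c()` silently drops `NULL` elements, so a record that lacks one of the fields yields a shorter vector, and `rbind()` then recycles values into the wrong columns (with a warning). The following is a minimal sketch of a NULL-safe alternative. The records are made up for illustration and are not real POWO output, and `extract_fields()` is a hypothetical helper, not part of taxize.

```r
fields <- c("family", "order", "clazz", "subclass", "phylum", "taxonomicStatus")

# Hypothetical helper: pull the named fields from one record,
# filling in NA where a field is missing or not a single value.
extract_fields <- function(x, fields) {
  vapply(fields, function(f) {
    value <- x[[f]]
    if (is.null(value) || length(value) != 1) NA_character_ else as.character(value)
  }, character(1))
}

# Made-up records for illustration only; the second one lacks most fields.
records <- list(
  list(family = "FamilyA", order = "OrderA", clazz = "ClassA",
       subclass = "SubclassA", phylum = "PhylumA", taxonomicStatus = "Accepted"),
  list(family = "FamilyB", order = "OrderB")
)

# The c() approach gives vectors of different lengths for these records.
naive <- lapply(records, function(x) {
  c(family = x$family, order = x$order, class = x$clazz)
})
naive_lengths <- lengths(naive)

# The NULL-safe version always returns one value per field,
# so rbind() produces a rectangular character matrix.
extracted <- do.call(rbind, lapply(records, extract_fields, fields = fields))
```

With every row the same width, the result can be passed to `cbind()` without recycling, and missing fields show up as `NA` instead of shifting other values.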