ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

incorrect taxonomy from "POW" functions #932

Open msedaghatpour opened 1 month ago

msedaghatpour commented 1 month ago

Hello -- while using pow_lookup() and get_pow() I noticed that the taxonomy returned in my outputs is incorrect. For example:

Every species in my full list (3500) returned "clazz" as "Equisetopsida". [Screenshot: Screen Shot 2024-05-23 at 12 48 03 PM]

I believe Magnoliidae is also incorrectly assigned as the subclass for a number of species.

Here is my input data: finalfinal_checklist_2024March03.xlsx

Here is my script:

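# Read in the data frame
# (read.xlsx() is assumed here to come from the openxlsx package)
library(openxlsx)
library(taxize)

plant_data <- read.xlsx("~/Desktop/update_flora_final/output/2024March03/finalfinal_checklist_2024March03.xlsx")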

# Function to save intermediate results
save_checkpoint <- function(obj, filename) {
  saveRDS(obj, file = filename)
}

# Function to load intermediate results
load_checkpoint <- function(filename) {
  if (file.exists(filename)) {
    return(readRDS(filename))
  } else {
    return(NULL)
  }
}

# Load the previous checkpoint if it exists
powoID <- load_checkpoint("powoID_checkpoint.rds")

# Initialize powoID if it is NULL (no checkpoint found)
if (is.null(powoID)) {
  powoID <- list()
}

# Determine the starting point
start_idx <- length(powoID) + 1

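# Iterate through the plant_data$family vector, skipping rows already done
# (start_idx:length(...) would count backwards once every row is finished)
remaining <- setdiff(seq_along(plant_data$family), seq_len(start_idx - 1))
for (i in remaining) {
  family_name <- plant_data$family[i]

  # Retrieve POW ID for the current family name
  powoID[[i]] <- get_pow(sci_com = family_name)

  # Save the intermediate results after each iteration
  save_checkpoint(powoID, "powoID_checkpoint.rds")
}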

# With this, I can hit stop at any time and the results are saved up to where I
# stopped. Going back to row 43 (load_checkpoint) picks up where it left off.
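A minimal sketch of the checkpoint-and-resume pattern described above, using a stand-in lookup so it runs without network access. `fake_lookup`, `run_with_checkpoint`, and the `stop_after` argument are hypothetical names introduced only for this illustration; they are not part of taxize.

```r
# Stand-in for a slow remote lookup (hypothetical; no network access).
fake_lookup <- function(name) paste0("id-", name)

run_with_checkpoint <- function(names, file, stop_after = Inf) {
  # Resume from the checkpoint if one exists, otherwise start empty.
  done <- if (file.exists(file)) readRDS(file) else list()
  todo <- setdiff(seq_along(names), seq_along(done))
  for (i in todo) {
    if (i > stop_after) break   # simulate an interruption
    done[[i]] <- fake_lookup(names[i])
    saveRDS(done, file)         # save after every item
  }
  done
}

ckpt <- tempfile(fileext = ".rds")
families <- c("Fagaceae", "Rosaceae", "Poaceae", "Pinaceae")

first_run  <- run_with_checkpoint(families, ckpt, stop_after = 2)
second_run <- run_with_checkpoint(families, ckpt)

length(first_run)
length(second_run)
```

The second call reads the saved list and only processes the items that the first call did not reach.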

# Optionally, combine the results into a single data frame if needed

powoID_df <- do.call(rbind, powoID)

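# Create an empty list to store the lookup result for each PoWO ID
taxonomy <- vector("list", length(powoID))
for (i in seq_along(powoID)) {
  # Call pow_lookup() for each PoWO ID and keep the first element of the
  # returned list, which holds the taxon fields extracted in the next step
  taxonomy[[i]] <- pow_lookup(as.character(powoID[[i]]))[[1]]
}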

# Extract data from each taxonomy sub-list
extracted_data <- lapply(taxonomy, function(x) {
  c(
    family = x$family,
    order = x$order,
    class = x$clazz,
    subclass = x$subclass,
    phylum = x$phylum,
    taxonomicStatus = x$taxonomicStatus
  )
})

# Check that all sub-lists have the same structure (optional)
if (length(unique(lengths(extracted_data))) > 1) {
  warning("Sub-lists in 'extracted_data' might have different structures. Extraction might be incomplete.")
}
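One way the sub-lists can end up with different structures: c() silently drops NULL, so a record that lacks a field produces a shorter vector. The sketch below shows this with two made-up records (not real POWO output) and one possible way to pad missing fields with NA; `pick` is a hypothetical helper introduced for this illustration.

```r
# Two made-up records; the second has no 'subclass' field.
records <- list(
  list(family = "Fagaceae", order = "Fagales",
       clazz = "Equisetopsida", subclass = "Magnoliidae"),
  list(family = "Pinaceae", order = "Pinales",
       clazz = "Equisetopsida")
)
fields <- c("family", "order", "clazz", "subclass")

# c() drops NULL, so the second vector has only three elements.
naive <- lapply(records, function(x) {
  c(family = x$family, order = x$order, class = x$clazz, subclass = x$subclass)
})
lengths(naive)

# Padding missing fields with NA keeps every row the same length.
pick <- function(x, fields) {
  vapply(fields, function(f) {
    if (is.null(x[[f]])) NA_character_ else as.character(x[[f]])
  }, character(1))
}
padded <- do.call(rbind, lapply(records, pick, fields = fields))
padded
```

With equal-length rows, do.call(rbind, ...) gives one row per record and one column per field, so a later cbind() does not recycle or misalign values.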

# Append extracted data to the original data frame (assuming rownames match)

data_output <- cbind(plant_data, do.call(rbind, extracted_data))
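The cbind() step relies on the lookup results having the same number and order of rows as plant_data. A minimal sketch of an alternative that joins on the family name instead, so row order does not matter. The two data frames below are made-up stand-ins (`plant_data_demo` and `family_taxonomy` are hypothetical names), not the real checklist or real lookup output.

```r
# Made-up stand-ins for plant_data and for one lookup result per family.
plant_data_demo <- data.frame(
  species = c("Quercus robur", "Rosa canina", "Quercus alba"),
  family  = c("Fagaceae", "Rosaceae", "Fagaceae"),
  stringsAsFactors = FALSE
)
family_taxonomy <- data.frame(
  family = c("Rosaceae", "Fagaceae"),
  order  = c("Rosales", "Fagales"),
  stringsAsFactors = FALSE
)

# match() gives, for each species row, the row of its family in the lookup table.
idx <- match(plant_data_demo$family, family_taxonomy$family)
joined <- cbind(plant_data_demo, order = family_taxonomy$order[idx])
joined
```

Because the join is keyed on family, each distinct family would need only one lookup, for example by looping over unique(plant_data$family) rather than over every row.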