Rekyt opened this issue 3 months ago
Addressed (2) and (3) above with https://github.com/traitecoevo/APD/commit/b69f40c800be44de1101bf806e6b79de151d9633
(1) is intentional.
@Rekyt Can you check if the changes on this branch look good to you? There are ~3 traits that won't match because we have to change ";" to "," in the names.
Thank you for pointing out the inconsistencies, especially those places where we have an incorrect TRY number-name match.
With the updated `APD_traits_input.csv` file, the only mismatches I find are the traits you mention. They come from semicolons being replaced by commas and from three dots being converted to an actual ellipsis character (…), so it should be fine!
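One way to keep these two known differences from showing up as mismatches is to normalise names on both sides before comparing them. This is a minimal sketch that assumes `;` vs `,` and `...` vs `…` are the only systematic differences. The helper `normalise_trait_name()` and the example strings are hypothetical and are not part of the APD code or real TRY entries.

```r
# Hypothetical helper (not part of APD): normalise trait names so that the
# ";" -> "," substitution and the "..." vs ellipsis difference do not cause mismatches.
normalise_trait_name = function(x) {
  x = gsub(";", ",", x, fixed = TRUE)         # semicolons become commas
  x = gsub("...", "\u2026", x, fixed = TRUE)  # three dots become one ellipsis character
  trimws(x)
}

# Made-up names, not real TRY or APD entries
try_name = "Example trait; per unit..."
apd_name = "Example trait, per unit\u2026"

normalise_trait_name(try_name) == normalise_trait_name(apd_name)
```

Because both sides go through the same function, the comparison works regardless of which file contains the semicolons or the ellipsis.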
Also, I haven't mentioned it elsewhere, but as you may have guessed, I didn't find any issues with the traits matched on BIEN. It's simpler, of course, because it has only 53 traits.
Similarly to #28, let's look at the correspondence with TRY.
I've performed a similar matching of codes and names in TRY and found a few typos (see the detailed script below).
`trait_0030810` has two traits matching on `GIFT_close`.

Matching script
```r
library(dplyr)

# TRY trait list export: skip the first 3 lines and drop the 6th column
try_traits = readr::read_delim("tde2024422162351.txt", skip = 3, col_select = -6)

apd_try_detailed = tibble::as_tibble(read.csv("APD_traits_input.csv")) |>
  select(identifier:label, starts_with("TRY")) |>
  rename(trait_id = identifier) |>
  tidyr::pivot_longer(
    starts_with("TRY"), names_to = "match_type", values_to = "matched_trait"
  ) |>
  filter(matched_trait != "") |>
  mutate(
    # Split for traits that have multiple matches on one line
    split_traits = purrr::map(stringr::str_split(matched_trait, ";"), trimws),
    # Extract TRY trait name
    extracted_trait = purrr::map(
      split_traits, \(x) stringr::str_extract(x, "^(.*)\\s\\[", group = 1)
    ),
    # Extract TRY trait code
    extracted_code = purrr::map(
      split_traits,
      \(x) stringr::str_extract(x, "\\[TRY:(.+)\\]", group = 1) |> as.numeric()
    )
  ) |>
  tidyr::unnest(split_traits:extracted_code)

apd_try_smaller = apd_try_detailed |>
  # Match names based on trait code
  left_join(
    try_traits |> distinct(TraitID, name_matched_on_code = Trait),
    by = c(extracted_code = "TraitID")
  ) |>
  # Match code based on trait name
  left_join(
    try_traits |> distinct(code_matched_on_name = TraitID, Trait),
    by = c(extracted_trait = "Trait")
  ) |>
  select(trait, extracted_trait, extracted_code, name_matched_on_code, code_matched_on_name)

## Potentially problematic traits
# non-matching names according to code
apd_try_smaller |>
  filter(extracted_trait != name_matched_on_code)

# non-matching code according to name
apd_try_smaller |>
  filter(extracted_code != code_matched_on_name)
```
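The two-way check in the script (name looked up from code, code looked up from name) can be illustrated on a single made-up cell in base R, without the tidyverse dependencies. This is a sketch that assumes the `Name [TRY:code]` format the script parses. `try_lookup` and the example cell are hypothetical and are not real TRY data.

```r
# Made-up stand-in for the TRY trait list (not real TRY data)
try_lookup = data.frame(
  TraitID = c(1, 2, 3),
  Trait = c("Trait one", "Trait two", "Trait three")
)

# Made-up APD cell with two matches separated by ";", where the second
# match pairs a name with the wrong code
matched_trait = "Trait one [TRY:1]; Trait three [TRY:2]"

# Split multiple matches, then pull out the name and the numeric code
parts = trimws(strsplit(matched_trait, ";", fixed = TRUE)[[1]])
extracted_trait = sub("^(.*)\\s\\[.*$", "\\1", parts)
extracted_code = as.numeric(sub("^.*\\[TRY:(.+)\\]$", "\\1", parts))

# Look up the name for each code, and the code for each name
checks = data.frame(
  extracted_trait,
  extracted_code,
  name_matched_on_code = try_lookup$Trait[match(extracted_code, try_lookup$TraitID)],
  code_matched_on_name = try_lookup$TraitID[match(extracted_trait, try_lookup$Trait)]
)

# which() drops NA comparisons, as dplyr::filter() does
name_mismatch = checks[which(checks$extracted_trait != checks$name_matched_on_code), ]
code_mismatch = checks[which(checks$extracted_code != checks$code_matched_on_name), ]
name_mismatch
code_mismatch
```

Both checks flag the second match: code 2 maps to "Trait two", and "Trait three" maps to code 3. Unlike `left_join()`, base `match()` keeps only the first hit. As a result, a name listed under several TRY codes would appear only once here.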