vaulot / Paper-2021-Vaulot-metapr2

0 stars 0 forks source link

Functions list and code to propagate? #1

Open claireerobertson opened 2 years ago

claireerobertson commented 2 years ago

Hi there, Thank you for this fascinating, insightful and incredibly useful piece of work.

I'm a PhD student and I'm currently analysing the data from 18S sequencing of ~20 freshwater ponds (<5ha) in the UK across 5 seasonal timepoints, using the NSF573 and EKNSR951 primer sets (Mangot et al 2013)

I have over 14,000 ASVs in this dataset before clustering. I'm observing similar taxonomic and phylogenetic patterns as the freshwater lakes in your dataset but with some differences.

I would like to assign ecological functions, as you have done, to my ASVs. However, I can't find the code you mention in the paper for assigning these functions based on TableS1 and propagating them to lower levels. I can probably figure it out, but in the interests of time can you share this code?

Secondly, how do I contribute by data to the metaPR2 database?

Many thanks

vaulot commented 2 years ago

Hi

Thank you for your kind words.

The merging of the traits with taxonomy is done at the PR2 database level using a taxonomy table and a trait table that I am attaching below (they are read with my own function, but basically you can read from the XLXS files). The function is down.

You then join this table with your assignment using the "species" field.

Concerning adding data to metaPR2, I am using only published datasets which have been deposited to GenBank. If you spot interesting datasets that are wide enough (i.e. typically containing 100 samples or more) let me know of the paper and I will try to incorporate them in the next version of metaPR2.

Hoping it helps. Cheers. Daniel pr2_taxonomy.xlsx

pr2_traits_merge <- function() {

    # Function to merge taxonomy and traits at a taxo single level  -----------------------------------------------

    pr2_traits_merge_one <- function(pr2_taxonomy, pr2_traits, taxon_level) {
      pr2_traits_one_level <- filter(pr2_traits, taxon_level == taxon_level)

      if (taxon_level == "species") return(inner_join(pr2_taxonomy, pr2_traits_one_level, by = c("species" = "taxon_name")))
      if (taxon_level == "genus") return(inner_join(pr2_taxonomy, pr2_traits_one_level, by = c("genus" = "taxon_name")))
      if (taxon_level == "family") return(inner_join(pr2_taxonomy, pr2_traits_one_level, by = c("family" = "taxon_name")))
      if (taxon_level == "order") return(inner_join(pr2_taxonomy, pr2_traits_one_level, by = c("order" = "taxon_name")))
      if (taxon_level == "class") return(inner_join(pr2_taxonomy, pr2_traits_one_level, by = c("class" = "taxon_name")))
      if (taxon_level == "division") return(inner_join(pr2_taxonomy, pr2_traits_one_level, by = c("division" = "taxon_name")))
      if (taxon_level == "supergroup") return(inner_join(pr2_taxonomy, pr2_traits_one_level, by = c("supergroup" = "taxon_name")))

    }

  taxon_levels <- c("species", "genus", "family", "order", "class" , "division", "supergroup")

  # The next block read the taxonomy table from the PR2 database

  pr2_taxonomy <- dvutils::pr2_taxo_read() %>%
    select(contains("_8")) %>%
    rename_with(~ stringr::str_replace(., '_8', ""), everything()) %>%
    filter(kingdom == "Eukaryota") %>%
    arrange(across(any_of(rev(taxon_levels))))

  # The next block read the trait table from the PR2 database

  pr2_traits <- dvutils::pr2_traits_read()

  trait_types <- pr2_traits %>%
    pull(trait_type) %>%
    unique()

  pr2_merged <- list()

  for (one_trait_type in trait_types) {

    pr2_merged_list <- list()

    pr2_taxonomy_filtered <- pr2_taxonomy

    pr2_traits_filtered <- pr2_traits %>%
    filter(trait_type == one_trait_type) %>%
    select(taxon_name, taxon_level, trait_value)

    # Check if any taxon repeeated

    taxa_repeated <- pr2_traits_filtered %>%
      count(taxon_name) %>%
      filter(n > 1)

    cat(glue::glue("Taxa repeated {nrow(taxa_repeated)} done \n\n"))
    print(taxa_repeated)

    # This is from manually input allocation
    for (taxon_level in  taxon_levels) {

    cat(glue::glue("Level {taxon_level} \n"))

      # Assign trophic mode for PR2 taxa for a given level
      pr2_merged_list[[taxon_level]] <- pr2_traits_merge_one (pr2_taxonomy_filtered, pr2_traits_filtered, taxon_level)

      # Remove from PR2 taxa that have been assigned
      pr2_taxonomy_filtered <- pr2_taxonomy_filtered %>%
        filter(!(species %in% pr2_merged_list[[taxon_level]]$species))

    }

    #  Merge and save ---------------------------------------------------------

    pr2_merged[[one_trait_type]] <- purrr::reduce(pr2_merged_list, bind_rows) %>%
      rename_with (~ one_trait_type, trait_value) %>%
      bind_rows(pr2_taxonomy_filtered) %>%
      arrange(across(any_of(rev(taxon_levels)))) %>%
      select(-taxon_level)
  }

  pr2_taxo_traits <- purrr::reduce(pr2_merged, left_join)
  pr2_taxo_traits_problem <- purrr::reduce(pr2_merged, anti_join)

  species_duplicated <- pr2_taxo_traits %>%
    count(species) %>%
    filter(n > 1)

  print("Species duplicated: ")
  print(species_duplicated)

  # sheets <- list("pr2_taxo merged" = pr2_merged,
  #                "pr2_tax without traits" = pr2_taxonomy)
  #
  # export(sheets, file_pr2_taxo_traits, overwrite = TRUE)

  return(pr2_taxo_traits)

}

pr2_taxonomy.xlsx pr2_traits.xlsx