grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation

New error after update to 0.3.3: Error: All pairs being compared should have one value per taxon. The following do not #270

Closed by SimenHyllHansen 5 years ago

SimenHyllHansen commented 5 years ago

Hi! I have several copies of a metacoder script that I have used in several projects. I ran a few of these a couple of months ago and they worked perfectly, but now (after updating R and the metacoder package after the summer), with unchanged input files and unchanged scripts, they throw an error that stops plotting. If I change the filter to n_supertaxa < 6, there is no error (the input table for the script is pre-collapsed to genus level).

families = filter_taxa(obj, n_supertaxa < 5)
families %>%
  heat_tree_matrix(data = "diff_table",
                   node_label = taxon_names,
                   node_color = log2_median_ratio, # difference between groups
                   node_color_trans = "linear",
                   node_color_interval = c(-3, 3), # symmetric interval
                   edge_color_interval = c(-3, 3), # symmetric interval
                   node_color_range = diverging_palette(), # diverging colors
                   node_size_axis_label = 1,
                   node_color_axis_label = "Log 2 ratio of median counts",
                   layout = "da", #initial_layout = "re",
                   key_size = 0.01,
                   seed = 616)
Error: All pairs being compared should have one value per taxon. The following do not:
   HIV_MSM vs. HIV_non-MSM (378), HIV_MSM vs. MSM (378), HIV_MSM vs. Healthy (378), HIV_non-MSM vs. MSM (378), HIV_non-MSM vs. Healthy (378), MSM vs. Healthy (378)

This also happens when it is done as follows, and in that case BOTH the genus and family levels throw the same error:

obs %>%
  taxa::filter_taxa(taxon_ranks == "g", supertaxa = TRUE) %>%
  heat_tree_matrix(data = "diff_table",
                   node_label = taxon_names,
                   node_color = log2_median_ratio, # difference between groups
                   node_color_trans = "linear",
                   node_color_interval = c(-3, 3), # symmetric interval
                   edge_color_interval = c(-3, 3), # symmetric interval
                   node_color_range = diverging_palette(), # diverging colors
                   node_size_axis_label = 1,
                   node_color_axis_label = "Log 2 ratio of median counts",
                   layout = "da", initial_layout = "re",
                   key_size = 0.01,
                   seed = 616)
Adding a new "character" vector of length 352.
Error: All pairs being compared should have one value per taxon. The following do not:
   HIV_MSM vs. HIV_non-MSM (378), HIV_MSM vs. MSM (378), HIV_MSM vs. Healthy (378), HIV_non-MSM vs. MSM (378), HIV_non-MSM vs. Healthy (378), MSM vs. Healthy (378)

I have spent more than a whole workday on this, and hopefully it is just me missing something obvious (I am not very experienced in R). So far I have concluded that the update is at fault. Has anyone encountered similar issues after updating to the newest version (metacoder 0.3.3)?

zachary-foster commented 5 years ago

Hi @SimenHyllHansen,

Sorry for the trouble. I recently added that error as a check to avoid strange behavior when the input is wrong, but perhaps it is being thrown when it shouldn't be. Can you send me an example data set and code so I can reproduce the error? You can use save() to write the object, as it is right before plotting, to a file and send that to me. If you don't want to attach the data set to this issue, you can email me at zacharyfoster1989 gmail.com.
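For reference, saving an object for sharing might look roughly like the sketch below. It uses only base R; the small list standing in for the metacoder object and the file name are placeholders for illustration, not anything taken from the actual script.

```r
# Placeholder standing in for the metacoder object built earlier in the script
# (hypothetical contents, for illustration only).
obj <- list(
  data = list(
    diff_table = data.frame(taxon_id = c("ab", "ac"), stringsAsFactors = FALSE)
  )
)

# Write the object to a file that can be attached to the issue or emailed.
# The file name is arbitrary.
out_file <- file.path(tempdir(), "metacoder_obj.RData")
save(obj, file = out_file)

# On the receiving end, load() restores the object under its original name.
rm(obj)
load(out_file)
```

Saving the object right before the plotting call means the person reproducing the error does not need the raw input files or the earlier parsing steps.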

SimenHyllHansen commented 5 years ago

Thank you for your answer. I have now sent you the file via email.

zachary-foster commented 5 years ago

Hi @SimenHyllHansen,

I figured out the issue. When you filtered the taxa with taxa::filter_taxa, the per-taxon data in 'diff_table' was not filtered. Instead, the rows that would have been removed were reassigned to another taxon ID, so 'diff_table' ended up with more than one value per taxon for each comparison. This is the default behavior, since for non-per-taxon data it is usually what the user wants. To filter per-taxon datasets with filter_taxa, use reassign_obs = FALSE. In this case you can specify just the 'diff_table', like so:

obj %>%
  taxa::filter_taxa(taxon_ranks == "g", supertaxa = TRUE, reassign_obs = c(diff_table = FALSE)) %>%
  mutate_obs("cleaned_names", gsub(taxon_names, pattern = "\\[|\\]", replacement = "")) %>%
  taxa::filter_taxa(grepl(cleaned_names, pattern = "^[a-zA-Z]+$"), reassign_obs = c(diff_table = FALSE)) %>%
  heat_tree_matrix(data = "diff_table",
                   node_label = cleaned_names,
                   #node_size = n_obs, # number of OTUs
                   node_color = log2_median_ratio, # difference between groups
                   node_color_trans = "linear",
                   node_color_interval = c(-3, 3), # symmetric interval
                   edge_color_interval = c(-3, 3), # symmetric interval
                   node_color_range = diverging_palette(), # diverging colors
                   #node_size_axis_label = "OTU count",
                   node_size_axis_label = 1,
                   node_color_axis_label = "Log 2 ratio of median counts",
                   layout = "da", initial_layout = "re",
                   key_size = 0.70,
                   seed = 616)

It worked in the past before I put that check in, but now the function is more picky, since non-per-taxon data can cause other problems.
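One rough way to see whether a per-taxon table has this problem is to count rows per taxon within each comparison. The sketch below does that in base R on a made-up table; the column names taxon_id, treatment_1, and treatment_2 are assumed to match the layout of 'diff_table', and the values are invented purely for illustration.

```r
# Toy stand-in for obj$data$diff_table (values are made up for illustration).
# Taxon "ab" appears twice for the same comparison, which is the situation
# that arises when rows from removed taxa are reassigned to a remaining taxon.
diff_table <- data.frame(
  taxon_id = c("ab", "ab", "ac", "ab", "ac"),
  treatment_1 = c("HIV_MSM", "HIV_MSM", "HIV_MSM", "MSM", "MSM"),
  treatment_2 = c("MSM", "MSM", "MSM", "Healthy", "Healthy"),
  log2_median_ratio = c(0.5, -1.2, 0.3, 0.1, 2.0),
  stringsAsFactors = FALSE
)

# Count rows per taxon within each pair of groups being compared.
counts <- aggregate(
  log2_median_ratio ~ taxon_id + treatment_1 + treatment_2,
  data = diff_table,
  FUN = length
)
names(counts)[4] <- "n_rows"

# Any taxon/comparison combination with more than one row would trip the check.
duplicated_taxa <- counts[counts$n_rows > 1, ]
print(duplicated_taxa)
```

If this comes back empty for the real table after filtering with reassign_obs = c(diff_table = FALSE), each comparison has one value per taxon.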

I hope that helps. Let me know if you have questions.

SimenHyllHansen commented 5 years ago

Lifesaver! Thank you so much!