grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation
Other
134 stars 28 forks source link

Getting incorrectly formated results with zero_low_counts #356

Open dhadsell opened 9 months ago

dhadsell commented 9 months ago

Hello, I am trying to run Metacoder on some 16s data from water samples that was obtained through RTL genomics. I import 2 text files. The first contains the otu with 30 samples x 6,672 otu. the second is the meta data file with sample ids in the first column and trt groups in the second. I have attached samples of both files along with the script that I used. It seems that when I first use the "parse_tax_data" comand the output appears correctly formated. But when I attempt to run the object through the "zero_low_counts" command the resulting tax_data loses the otu_id and lineage columns and the remaining sample columns are converted from to . It turns out that the obj$data$tax_data[ , water_samples$sample_id] == 0 on the full data set it generates a large matrix whereas in your example analysis with human data the object is a "logi" object. If I take only the first 100 lines of the otu data then a logi object is generated but the below error still happens when I try to run the rowSums comand on obj$data$tax_data[ , water_samples$sample_id]) == 0 i get the following error "Error in rowSums(obj$data$tax_data[, water_samples$sample_id]) : 'x' must be numeric". I have tried numerous ways to correct this issue but to no avail. Can you tell me what the problem might be? any help please? Hadsell_script_for_running_Metacoder_w_RTL_genomics_16s.txt Caddo_metadata_DH_102423.txt sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt

zachary-foster commented 8 months ago

Hello,

I ran your code with your example data and did not get that error. Does it only happen with the full dataset?

library(metacoder)

water_otu <- read.delim("sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt", header =T, sep ='\t', check.names = FALSE)
water_samples <- read.delim("Caddo_metadata_DH_102423.txt", header = T, sep = '\t')

obj <- parse_tax_data(water_otu,
                      class_cols = "lineage", # the column that contains taxonomic information
                      class_sep = ";", # The character used to separate taxa in the classification
                      class_regex = "^(.+)__(.+)$", # Regex identifying where the data for each taxon is
                      class_key = c(tax_rank = "info", # A key describing each regex capture group
                                    tax_name = "taxon_name"))

obj$data$tax_data <- zero_low_counts(obj, data = "tax_data", min_count = 5)

test <- obj$data$tax_data[ , water_samples$sample_id] == 0
no_reads <- rowSums(obj$data$tax_data[, water_samples$sample_id]) == 0
dhadsell commented 8 months ago

Hello Zachary, So it turns out that in the I attached the wrong version of the meta-data file.

The correct file is now attached This file is a dataframe with two variables and 30 observations The first variable is “sample_id” and the second is “group”

The original data file had 10 variables and did not contain a “sample_id” column

With the correct metadata file I get the error regardless of whether I use the small sample otu dataset or the full otu dataset.

From: Zachary Foster @.> Sent: Wednesday, October 25, 2023 2:59 PM To: grunwaldlab/metacoder @.> Cc: Hadsell, Darryl L. @.>; Author @.> Subject: Re: [grunwaldlab/metacoder] Getting incorrectly formated results with zero_low_counts (Issue #356)

CAUTION: This email is not from a BCM Source. Only click links or open attachments you know are safe.


Hello,

I ran your code with your example data and did not get that error. Does it only happen with the full dataset?

library(metacoder)

water_otu <- read.delim("sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt", header =T, sep ='\t', check.names = FALSE)

water_samples <- read.delim("Caddo_metadata_DH_102423.txt", header = T, sep = '\t')

obj <- parse_tax_data(water_otu,

                  class_cols = "lineage", # the column that contains taxonomic information

                  class_sep = ";", # The character used to separate taxa in the classification

                  class_regex = "^(.+)__(.+)$", # Regex identifying where the data for each taxon is

                  class_key = c(tax_rank = "info", # A key describing each regex capture group

                                tax_name = "taxon_name"))

obj$data$tax_data <- zero_low_counts(obj, data = "tax_data", min_count = 5)

test <- obj$data$tax_data[ , water_samples$sample_id] == 0

no_reads <- rowSums(obj$data$tax_data[, water_samples$sample_id]) == 0

— Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_grunwaldlab_metacoder_issues_356-23issuecomment-2D1779963587&d=DwMFaQ&c=ZQs-KZ8oxEw0p81sqgiaRA&r=XTCyyeX5o3ZkNVzfALlNF9OFiwkE3rMWvWV85Km9iXs&m=ncNEDOJRZpCTjz6UowYa5AxcxtUOY7O1qUwc753ep1-jHHD5aRp35TCqm0JidMI8&s=N9Z3-ULOFlzAdXMufkV4vmq6etXMdflVsHB1-lbxqD8&e=, or unsubscribehttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AK2FDPQ37D52YZVXXTRHFDTYBFVR3AVCNFSM6AAAAAA6OKQODSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONZZHE3DGNJYG4&d=DwMFaQ&c=ZQs-KZ8oxEw0p81sqgiaRA&r=XTCyyeX5o3ZkNVzfALlNF9OFiwkE3rMWvWV85Km9iXs&m=ncNEDOJRZpCTjz6UowYa5AxcxtUOY7O1qUwc753ep1-jHHD5aRp35TCqm0JidMI8&s=c0Qs2dCZzvFkHPqSrB8nu432s2xYzbVgrb3UkXN6RhY&e=. You are receiving this because you authored the thread.Message ID: @.**@.>>

sample_id group 1 Deep_Open 10 Deep_Matted 11 Shallow_M 12 Deep_Matted 13 Shallow_M 14 Deep_Open 15 Deep_Mattted 16 Deep_Mattted 17 Shallow_M 18 Shallow_Open 19 Shallow_Matted 2 Deep_Matted 20 Deep_Matted 21 Deep_Open 22 Deep_Matted 23 Deep_Open 24 Deep_Open 25 Deep_Matted 26 Deep_Open 27 Shallow_Matted 28 Deep_Open 29 Deep_Matted 3 Deep_Open 30 Deep_Open 4 Shallow_Matted 5 Deep_Matted 6 Shallow_Open 7 Shallow_Open 8 Deep_Open 9 Shallow_Light

dhadsell commented 8 months ago

I noticed that the dataframes in your example are tibbles. I also just tried converting my two dataframes to tibbles but that does not seem to be the answer.

here is the correct metadata file

Caddo_meta_102423.txt

zachary-foster commented 8 months ago

Thanks for the new file! I can reproduce the error now.

The issue is that your sample IDs are being read in as integers instead of strings since they are all numeric. This is one reason ID that start with a non-number character are generally better. At this line:

no_reads <- rowSums(obj$data$tax_data[, water_samples$sample_id]) == 0

water_samples$sample_id is being treated as column indexes instead of column names, so before that happens you need to add water_samples$sample_id <- as.character(water_samples$sample_id)

Here is the updated script that circumvents this error:

library(metacoder)
#> This is metacoder version 0.3.6 (stable)

water_otu <- read.delim("sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt", header =T, sep ='\t', check.names = FALSE)
#> Warning in file(file, "rt"): cannot open file
#> 'sample_Trimmed_otu_for_metacoder_minus_nocall_plus_root_102423.txt': No such
#> file or directory
#> Error in file(file, "rt"): cannot open the connection
water_samples <- read.delim("Caddo_meta_102423.txt", header = T, sep = '\t')
#> Warning in file(file, "rt"): cannot open file 'Caddo_meta_102423.txt': No such
#> file or directory
#> Error in file(file, "rt"): cannot open the connection
water_samples$sample_id <- as.character(water_samples$sample_id)
#> Error in eval(expr, envir, enclos): object 'water_samples' not found

obj <- parse_tax_data(water_otu,
                      class_cols = "lineage", # the column that contains taxonomic information
                      class_sep = ";", # The character used to separate taxa in the classification
                      class_regex = "^(.+)__(.+)$", # Regex identifying where the data for each taxon is
                      class_key = c(tax_rank = "info", # A key describing each regex capture group
                                    tax_name = "taxon_name"))
#> Error in eval(expr, envir, enclos): object 'water_otu' not found

obj$data$tax_data <- zero_low_counts(obj, data = "tax_data", min_count = 5)
#> Error in eval(expr, envir, enclos): object 'obj' not found

test <- obj$data$tax_data[ , water_samples$sample_id] == 0
#> Error in eval(expr, envir, enclos): object 'obj' not found
no_reads <- rowSums(obj$data$tax_data[, water_samples$sample_id]) == 0
#> Error in eval(expr, envir, enclos): object 'obj' not found
no_reads
#> Error in eval(expr, envir, enclos): object 'no_reads' not found

Created on 2023-11-02 with reprex v2.0.2