johannesbjork / Longitudinal-gut-microbiome-changes-in-ICB-treated-advanced-melanoma

balance_df contains many "NaN" values because of zeros in profiles #2

Open HuaZou opened 4 months ago

HuaZou commented 4 months ago

I apologize for the inconvenience. I am truly grateful for your assistance and the updated profiles you have provided.

However, while using the code from survival_analysis.R to replicate your results, I have encountered a significant issue. I noticed that balance_value contains numerous NaN values, primarily because many of the denominator_taxa have zero abundance. This raises a question: how were the final balance values in the paper obtained, given that they appear to contain no NaN values?

If you have the time, I would greatly appreciate your insight into this matter and any assistance you could offer to help me resolve this issue. Thank you again for your help and patience.

The relevant code (from survival_analysis.R) is below:

library(dplyr)  # provides the %>% pipe used below

# sgbs_baseline_survival.csv was the updated version (6/17/2024)
ftbl <- read.csv("sgbs_baseline_survival.csv", check.names = FALSE, row.names = 1)
mdat <- read.csv("mdat_baseline_survival.csv", check.names = FALSE, row.names = 1)

# Select SGBs for the "Longitudinal" balance
numerator_taxa <- c("f__Ruminococcaceae | s__Agathobaculum_butyriciproducens | t__SGB14993_group",
                    "f__Peptostreptococcaceae | s__Intestinibacter_bartlettii | t__SGB6140",
                    "f__Lachnospiraceae | s__Dorea_sp_AF24_7LB | t__SGB4571",
                    "f__Lactobacillaceae | s__Lactobacillus_gasseri | t__SGB7038_group",
                    "f__Lachnospiraceae | s__Lacrimispora_celerecrescens | t__SGB4868")

numerator_taxa_index <- match(numerator_taxa, colnames(ftbl))

denominator_taxa <- c("f__Ruminococcaceae | s__Ruthenibacterium_lactatiformans | t__SGB15271",
                      "f__Ruminococcaceae | s__Ruminococcaceae_unclassified_SGB15265 | t__SGB15265_group",
                      "f__Prevotellaceae | s__Prevotella_copri_clade_A | t__SGB1626",
                      "f__FGB602 | s__GGB1420_SGB1957 | t__SGB1957")

denominator_taxa_index <- match(denominator_taxa, colnames(ftbl))

balance_df <- mdat %>% 
  dplyr::mutate(balance_value = NA) %>%
  tibble::rownames_to_column("SampleID") %>%
  dplyr::filter(SampleID %in% rownames(ftbl)) %>%
  tibble::column_to_rownames("SampleID")

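# Compute balance: mean log abundance of the numerator taxa
# minus mean log abundance of the denominator taxa, per sample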
for(sample in rownames(ftbl)){

  balance_df[sample, ]$balance_value <- 
    log(exp(mean(as.numeric(log(ftbl[sample, numerator_taxa_index]))))) - 
    log(exp(mean(as.numeric(log(ftbl[sample, denominator_taxa_index])))))  
}
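
For readers following along, here is a minimal sketch of why zeros produce non-finite balances in a calculation of this form. The toy table, the helper name `log_balance`, and the half-minimum pseudocount are all assumptions made for illustration. They are not the study data, and the pseudocount is not necessarily the fix applied in the repository.

```r
# Toy abundance table (hypothetical values, not the study data)
toy <- data.frame(
  num_a = c(0.10, 0.00, 0.00, 0.10),
  num_b = c(0.05, 0.02, 0.20, 0.03),
  den_a = c(0.30, 0.00, 0.10, 0.00),
  den_b = c(0.01, 0.04, 0.20, 0.05),
  row.names = c("s1", "s2", "s3", "s4")
)
num_cols <- c("num_a", "num_b")
den_cols <- c("den_a", "den_b")

# Hypothetical helper: balance as mean(log(numerator)) - mean(log(denominator))
log_balance <- function(x, num, den) {
  rowMeans(log(as.matrix(x[, num, drop = FALSE]))) -
    rowMeans(log(as.matrix(x[, den, drop = FALSE])))
}

raw <- log_balance(toy, num_cols, den_cols)
# s1: no zeros                -> finite value
# s2: zeros on both sides     -> -Inf - (-Inf) = NaN
# s3: zero in numerator only  -> -Inf
# s4: zero in denominator only -> Inf
print(raw)

# Assumed workaround: add a small pseudocount before taking logs.
# Half the smallest non-zero abundance is one common choice, used here
# purely as an example.
pseudo <- min(toy[toy > 0]) / 2
fixed <- log_balance(toy + pseudo, num_cols, den_cols)
print(fixed)
```

Note that a zero on only one side of the balance gives `Inf` or `-Inf` rather than `NaN`; `NaN` appears when both the numerator and the denominator contain a zero for the same sample. The choice of pseudocount affects the resulting values, so it should be reported alongside any results.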

johannesbjork commented 4 months ago

I have now made a commit that should resolve the issue. Please also use the newly updated version of the data on OSF ('mdat_baseline_survival.csv' and 'sgbs_baseline_survival.csv') which should correspond to the data used for the survival analysis. Please let me know how it works out. Thanks

HuaZou commented 4 months ago

I would like to extend my sincere thanks for the invaluable assistance you provided yesterday. Your support has been instrumental in my progress. However, I have two questions that require further clarification.

Firstly, there seems to be a discrepancy in the number of patients between the risk table and the Kaplan-Meier (KM) plot in Fig. 2d of the paper. The risk table indicates 147 patients, whereas the KM plot shows 146 (T0 (N=146)). N=146 agrees with the updated 'mdat_baseline_survival.csv'. This inconsistency needs to be addressed.

Secondly, I am curious about the meaning of the two numeric values presented in the legend, 35.4 and 28.4. Initially, I presumed they were the median survival times (the times at which the survival probability reaches 50%), but they do not appear to be. Could you please explain the formula or the meaning behind these values?

Thank you once again for your attention to these matters. I look forward to your insights.

The plot below is Fig. 2d of the paper: the vertical lines mark the legend values (35.4 and 28.4), and the horizontal line marks the 50% survival rate.

[image]
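
Both questions can be checked numerically from a fitted Kaplan-Meier object. The following is a rough sketch only: it uses the `lung` dataset that ships with the `survival` package as a stand-in, with `sex` as a placeholder for the grouping variable, because the column names in 'mdat_baseline_survival.csv' are not shown in this thread. It makes no claim about what the legend values 35.4 and 28.4 actually represent.

```r
library(survival)  # recommended package, bundled with standard R installations

# Built-in example data, used only as a stand-in for the study data
dat <- survival::lung

# Kaplan-Meier fit stratified by a grouping variable
# (sex is a placeholder for, e.g., a high/low balance group)
fit <- survfit(Surv(time, status) ~ sex, data = dat)

# Number of subjects per stratum as seen by the fit; the sum should equal
# the number of rows with no missing values in the model variables
n_per_group <- fit$n
n_complete  <- sum(complete.cases(dat[, c("time", "status", "sex")]))

# Median survival time per stratum, i.e. where the estimated
# survival curve reaches 50%
medians <- summary(fit)$table[, "median"]

print(n_per_group)
print(n_complete)
print(medians)
```

Comparing `sum(n_per_group)` with the N shown on the plot and the first column of the risk table indicates which of the two counts matches the data passed to the fit, and `medians` can be compared directly against the legend values.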

johannesbjork commented 4 months ago

As this is venturing outside of specific code issues, please send the above and any other non-code-related comments or concerns to my email: bjork.johannes@gmail.com