zdk123 / SpiecEasi

Sparse InversE Covariance estimation for Ecological Association and Statistical Inference
GNU General Public License v3.0
196 stars 68 forks source link

Identifying Interactions Across Data Modalities (Potential New Way to Apply SpiecEasi) #271

Open AntonKjellberg opened 2 months ago

AntonKjellberg commented 2 months ago

Hi,

I want to share a new way to apply SpiecEasi and gather any input or critiques from the community. I’m working with two datasets collected at the same time point:

  1. A profile of 20 mucosal immune markers.
  2. 16S metagenomics of hypopharyngeal samples. For this analysis, I’ve limited the metagenomic data to the top 30 bacteria by abundance.

In my dataset, I observe: • Strongly correlated groups of bacteria and likewise for cytokines • Many significant correlations between certain bacteria and immune markers, with many bacteria displaying similar responses.

I hypothesize that the immune response to particular bacteria may be more heterogeneous than what pure correlations suggest. The underlying correlation structures within each dataset might explain the observed similarity in bacterial responses. My goal is to identify which bacteria might be driving the observed shifts in immune tone using SpiecEasi:

  1. I applied CLR transformation separately to both the bacterial count data (top 30 bacteria) and the 20 immune markers
  2. I merged the transformed data into a single dataframe.
  3. I then used this dataframe in a forked version of SpiecEasi (without the normalization function), setting pulsar.params = list(thresh = 1) to retrieve all interaction weights.
  4. Finally, I visualized the results through a heatmap of the weights.

The results appear promising: • SpiecEasi correctly identified that the datasets were individually compositional, applying negative weights to interactions within the same dataset. This is ideal for focusing on the inter-dataset interactions. • Some results make biological sense: for instance, one of the strongest positive weights is between Staphylococcus aureus and IL-17, a well-documented interaction.

heatmap1.pdf heatmap2.pdf

I recognize that including all weights could be statistically challenging to interpret, but this isn’t the goal here. I already prove that strong associations in overall interaction patterns between the datasets. My primary aim is to gain additional insight into which bacteria may be driving specific changes in immune tone.

Here's some code:

# CLR-Transform them and add Species names as rows
sp <- prune_taxa(names(sort(taxa_sums(subset_samples(phy, Time == '1m')), decreasing = T)[1:30]), 
                 subset_samples(phy_counts, Time == '1m')) %>% 
  microbiome::transform('clr') %>% tax_table() %>% as_tibble() %>% .$Species
otu <- prune_taxa(names(sort(taxa_sums(subset_samples(phy, Time == '1m')), decreasing = T)[1:30]), 
                 subset_samples(phy_counts, Time == '1m')) %>% microbiome::transform('clr') %>% 
  otu_table %>% as.data.frame() 
rownames(otu) <- sp

# Add CLR-transformed cytokines
otu <- otu %>% rbind(t(get_variable(subset_samples(phy, Time == '1m'))[3:22])) %>% select_if(~ !any(is.na(.)))

# Perform SpiecEasi (Forked version without normalization function)
se_all <- SpiecEasi::spiec.easi(otu, method='mb', lambda.min.ratio=1e-2, nlambda=40, pulsar.params = list(thresh = 1))

# Obtain the weights
optbeta <- as.matrix(symBeta(getOptBeta(se_all)))

# Plot all intercations
pheatmap(optbeta,
         color = colorRampPalette(c("#0460bd", "white", '#d6023b'))(200),
         breaks = c(seq(min(optbeta), 0, length.out=ceiling(200/2) + 1), 
                    seq(max(optbeta)/200, max(optbeta), length.out=floor(200/2))),
         labels_row = rownames(otu),
         labels_col = rownames(otu))

# Plot cross-dataset intercations only
weights <- optbeta[1:30, (ncol(optbeta)-19):ncol(optbeta)]
rownames(weights) <- sp
colnames(weights) <- colnames(get_variable(phy))[3:22]
pheatmap(weights, 
         color = colorRampPalette(c("#0460bd", "white", '#d6023b'))(200),
         breaks = c(seq(min(weights), 0, length.out=ceiling(200/2) + 1), 
                    seq(max(weights)/200, max(weights), length.out=floor(200/2))))
zdk123 commented 2 months ago

If the cytokine measurements are compositional, this should be equivalent to running multi-domain SPIEC-EASI (which applies the CLR independently to each dataset as you did here).

On Tue, Sep 10, 2024, 5:09 AM Anton Kjellberg @.***> wrote:

Hi,

I want to share a new way to apply SpiecEasi and gather any input or critiques from the community. I’m working with two datasets collected at the same time point:

  1. A profile of 20 mucosal immune markers.
  2. 16S metagenomics of hypopharyngeal samples. For this analysis, I’ve limited the metagenomic data to the top 30 bacteria by abundance.

In my dataset, I observe: • Strongly correlated groups of bacteria and likewise for cytokines • Many significant correlations between certain bacteria and immune markers, with many bacteria displaying similar responses.

I hypothesize that the immune response to particular bacteria may be more heterogeneous than what pure correlations suggest. The underlying correlation structures within each dataset might explain the observed similarity in bacterial responses. My goal is to identify which bacteria might be driving the observed shifts in immune tone using SpiecEasi:

  1. I applied CLR transformation separately to both the bacterial count data (top 30 bacteria) and the 20 immune markers
  2. I merged the transformed data into a single dataframe.
  3. I then used this dataframe in a forked version of SpiecEasi (without the normalization function), setting pulsar.params = list(thresh = 1) to retrieve all interaction weights.
  4. Finally, I visualized the results through a heatmap of the weights.

The results appear promising: • SpiecEasi correctly identified that the datasets were individually compositional, applying negative weights to interactions within the same dataset. This is ideal for focusing on the inter-dataset interactions. • Some results make biological sense: for instance, one of the strongest positive weights is between Staphylococcus aureus and IL-17, a well-documented interaction.

heatmap1.pdf https://github.com/user-attachments/files/16942342/heatmap1.pdf heatmap2.pdf https://github.com/user-attachments/files/16942343/heatmap2.pdf

I recognize that including all weights could be statistically challenging to interpret, but this isn’t the goal here. I already prove that strong associations in overall interaction patterns between the datasets. My primary aim is to gain additional insight into which bacteria may be driving specific changes in immune tone.

Here's some code: `# Select the top 30 Bacteria by abundance (phy is relative abundance transformed, phy_counts contains counts) CLR-Transform them and add Species names as rows

sp <- prune_taxa(names(sort(taxa_sums(subset_samples(phy, Time == '1m')), decreasing = T)[1:30]), subset_samples(phy_counts, Time == '1m')) %>% microbiome::transform('clr') %>% tax_table() %>% as_tibble() %>% .$Species otu <- prune_taxa(names(sort(taxa_sums(subset_samples(phy, Time == '1m')), decreasing = T)[1:30]), subset_samples(phy_counts, Time == '1m')) %>% microbiome::transform('clr') %>% otu_table %>% as.data.frame() rownames(otu) <- sp Add CLR-transformed cytokines

otu <- otu %>% rbind(t(get_variable(subset_samples(phy, Time == '1m'))[3:22])) %>% select_if(~ !any(is.na(.))) Perform SpiecEasi (Forked version without normalization function)

se_all <- SpiecEasi::spiec.easi(otu, method='mb', lambda.min.ratio=1e-2, nlambda=40, pulsar.params = list(thresh = 1)) Obtain the weights

optbeta <- as.matrix(symBeta(getOptBeta(se_all))) Plot all intercations

pheatmap(optbeta, color = colorRampPalette(c("#0460bd", "white", '#d6023b'))(200), breaks = c(seq(min(optbeta), 0, length.out=ceiling(200/2) + 1), seq(max(optbeta)/200, max(optbeta), length.out=floor(200/2))), labels_row = rownames(otu), labels_col = rownames(otu)) Plot cross-dataset intercations only

weights <- optbeta[1:30, (ncol(optbeta)-19):ncol(optbeta)] rownames(weights) <- sp colnames(weights) <- colnames(get_variable(phy))[3:22] pheatmap(weights, color = colorRampPalette(c("#0460bd", "white", '#d6023b'))(200), breaks = c(seq(min(weights), 0, length.out=ceiling(200/2) + 1), seq(max(weights)/200, max(weights), length.out=floor(200/2)))) `

— Reply to this email directly, view it on GitHub https://github.com/zdk123/SpiecEasi/issues/271, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAUD2RE5OVEJMFZDVZVG7PDZV2ZN5AVCNFSM6AAAAABN6I7AT6VHI2DSMVQWIX3LMV43ASLTON2WKOZSGUYTKOBWGQZTEMY . You are receiving this because you are subscribed to this thread.Message ID: @.***>