ncborcherding / scRepertoire

A toolkit for single-cell immune profiling
https://www.borch.dev/uploads/screpertoire/
MIT License
311 stars 54 forks source link

Codes to generate tables that summarise clonal expansion and sharing for a metadata column in Seurat object #427

Closed denvercal1234GitHub closed 1 month ago

denvercal1234GitHub commented 1 month ago

Hi, Nick,

I hope all is well.

I am revisiting some old data with scRepertoire and unfortunately could not find a robust way to get a table of both the number of clones that shared between different cluster ID and the sequence of these clones in my Seurat object that has the clone info.

Previously, I had used clonalNetwork() exportTable, which gave me the cluster IDs that have any clone shared, but it does not give the number of the clones that are shared (only the weight, which only allows me to rank how much sharing but then I cannot tell how many of clones shared in that weight coming from unique clone versus duplicated).

When, I used clonalNetwork() exportClone together with filter.identity, it give me the sequence of the clones shared from that cluster identity, but then I have no way to know which sequence belong to which cluster ID that this identity shares with. lol.

Do you recommend just summarise manually from the Seurat meta.data?

Thank you for your help!

ncborcherding commented 1 month ago

Hey Quang,

Thanks for reaching out - based on my understanding of your needs - you'll have to manually summarize the meta data. But I would use the exportTable() output to get the unique clones.

This could be a straight forward dplyr summarize() call with grouping by cluster and your clone call (CTaa for instance). You can then filter the results by any CTaa that appears more than once in the table.

Good luck and hope that helps, Nick

denvercal1234GitHub commented 1 month ago

Thanks Nick! Always very prompt and helpful.

With your pointers, I wrote these 2 rough functions. Not the best, but hope they are helpful to others who prefer numerical summaries. They generate tables that summarize the numerical values of clonal expansion and sharing between clusters (with specification of the clone sequences) taking the input after attaching TCR info from scRepertoire onto the Seurat metadata.

For anyone interested and crazy about knowing exactly the numbers, check out https://github.com/denvercal1234GitHub/TCRClone_NumericalSummary

AIMS

To be able to generate tables that summarize the numerical values of clonal expansion and sharing after performing TCR clonal analysis using scRepertoire or other packages on the Seurat object.

PREPARE INPUT FROM SEURAT OBJECT WITH TCR ASSIGNMENT

For 2 aims (assess TCR clonal expansion and clonal sharing numerically), it starts with a Seurat object after scRepertoire().

# Load necessary library
library(dplyr)

## Prepare input for expansion_summary() function and analyze_Ctaa_sharing() from a Seurat object that has CTaa info attached from scRepertoire package with cells (rows) without TCR info removed
## This input requires the presence of 2 columns in the metadata of Seurat: CTaa is the clone sequences; cluster_ID is cluster assignment from Seurat for each clone

## SeuratObj@meta.data -> SeuratObj_metaData

####EXAMPLE TO ILLUSTRATE THE FUNCTIONS

# Sample data frame as example for a metadata of Seurat Object 
SeuratObj_metaData <- data.frame(
  cluster_ID = c("T25", "T25", "T25","T25","T25", "T25","T25","T25","T25","T26","T26", "T26","T26", "T27", "T27", "T27", "T27"),
  CTaa = c("K","K","A", "A", "A","B", "B","B","B","Q", "D","B","B", "C", "E", "F", "A"),
  stringsAsFactors = FALSE
)

#### This example has cluster T25 having clone "A" and clone "B" that expanded at least once! 
#### T26 has clone "B" expanded twice (which is counted as 1 clone being expanded). 
#### T27 clones are all single clones, i.e., 0 clone expanded. 
SeuratObj_metaData
Screenshot 2024-10-25 at 02 35 05

AIM 1: SUMMARIZE CLONAL EXPANSION FOR EACH CLUSTER

expansion_summary() will summarise for each cluster, how many clones a cluster has that expanded at least X times. EX: If Expanded_Count = 1, it means this cluster has 1 clone that expanded at least once. If Expanded_Count = 2, it means this cluster has 2 clones that each of them had expanded at least once. And so on. Thus, the higher the Expanded_Count, the more expanded the cluster is as it means the cluster has more clones that at least expanded X times.

expansion_summary() need an input as a dataframe that counts the occurrence of each unique clone.

If wanting only the counts

## Step 1: Obtain frequency from SeuratObj_metaData
SeuratObj_metaData %>% group_by(CTaa, cluster_ID) %>% summarise(count = n()) %>% arrange(cluster_ID) -> SeuratObj_countOccurenceOfEachUniqueClone_df

## Step 2: Get unique cluster IDs
unique_clusters <- unique(SeuratObj_metaData$cluster_ID)

## Step 3: Run expansion_summary() 
expansion_summary <- lapply(unique_clusters, function(cluster) {
  cluster_data <- SeuratObj_countOccurenceOfEachUniqueClone_df %>%
    filter(cluster_ID == cluster)

  # Count how many times a clone expanded at least once (count > 1). Change per your own threshold
  expanded_count <- sum(cluster_data$count > 1)

  # Return a summary for this cluster
  data.frame(
    Cluster = cluster,
    Expanded_Count = expanded_count
  )
})

## Step 4: Combine the summaries into a single data frame
expansion_summary_df <- do.call(rbind, expansion_summary)

## Step 5: Print the summary table. 
#### This example has cluster T25 having clone "A" and clone "B" and clone "K" that expanded at least once! 
#### T26 has clone "B" expanded twice (which is counted as 1 clone being expanded). 
#### T27 clones are all single clones, i.e., 0 clone expanded. 
expansion_summary_df
Screenshot 2024-10-25 at 02 35 23

If wanting both the county and sequences of the expanded clones

## Step 1: Obtain frequency from SeuratObj_metaData
SeuratObj_metaData %>%
  group_by(CTaa, cluster_ID) %>%
  summarise(count = n()) %>%
  ungroup() %>%
  arrange(cluster_ID) -> SeuratObj_metaData

## Step 2: Get unique cluster IDs
unique_clusters <- unique(SeuratObj_metaData$cluster_ID)

## Step 3: Run expansion_summary to include CTaa values of clones that expanded
expansion_summary <- lapply(unique_clusters, function(cluster) {

  # Filter data for the current cluster
  cluster_data <- SeuratObj_metaData %>%
    filter(cluster_ID == cluster)

  # Filter for clones that expanded (count > 1)
  expanded_clones <- cluster_data %>%
    filter(count > 1) %>%
    dplyr::select(CTaa)  # Use dplyr::select to avoid conflict with other select methods

  # Count how many clones expanded at least once (count > 1)
  expanded_count <- nrow(expanded_clones)

  # Create a summary data frame with cluster, expanded count, and expanded CTaa values
  data.frame(
    Cluster = cluster,
    Expanded_Count = expanded_count,
    Expanded_Clones = paste(expanded_clones$CTaa, collapse = ", ")  # Paste expanded CTaa values as a comma-separated string
  )
})

## Step 4: Combine the summaries into a single data frame
expansion_summary_df <- do.call(rbind, expansion_summary)

# View the final expansion summary with expanded CTaa values
print(expansion_summary_df)
Screenshot 2024-10-25 at 15 49 34

AIM 2. SUMMARIZE WHICH CLONES FROM WHICH CLUSTERS SHARED WITH WHICH OTHER CLUSTERS AND WHICH CLONES NOT SHARED WITH ANY CLUSTERS

analyze_Ctaa_sharing() will export a list of 2 dataframes as results (ctaa_analysis_result)

To report how many clones that are unique (not shared with anyone) from a cluster, e.g., T1, and their clone sequences

ctaa_analysis_result$Unique %>% dplyr::filter(Cluster == "T1")

To report how many clone(s) that are shared from this cluster with any other clusters, the sequences of the clone(s), and occurrence of that clone in the "receiving" cluster

ctaa_analysis_result$Shared %>% dplyr::filter(Cluster == "T1")

# Function to analyze CTaa shared values and specify which clusters share them
analyze_Ctaa_sharing <- function(data) {
  # Step 1: Get unique clusters
  unique_clusters <- unique(data$cluster_ID)

  # Step 2: Initialize lists to store results
  shared_results <- list()
  unique_results <- list()

  # Step 3: Loop through each unique cluster
  for (cluster in unique_clusters) {
    # Get the CTaa values for the current cluster
    current_ctaa <- data %>%
      filter(cluster_ID == cluster) %>%
      pull(CTaa)

    # Initialize variables to track shared CTaa values
    shared_ctaa <- NULL

    # Step 4: Check against all other clusters
    for (other_cluster in unique_clusters) {
      if (cluster != other_cluster) {
        other_ctaa <- data %>%
          filter(cluster_ID == other_cluster) %>%
          pull(CTaa)

        # Find shared CTaa values
        shared_current <- intersect(current_ctaa, other_ctaa)
        if (length(shared_current) > 0) {
          shared_ctaa <- unique(c(shared_ctaa, shared_current))
        }
      }
    }

    # Find unique CTaa values for the current cluster
    unique_current <- setdiff(current_ctaa, shared_ctaa)

    # Store results for shared CTaa values
    if (length(shared_ctaa) > 0) {
      for (value in shared_ctaa) {
        # Create a list to store the names of clusters sharing the current CTaa value
        clusters_shared <- c(cluster)  # Start with the current cluster

        # Count occurrences across all clusters sharing the CTaa
        clone_count_in_receiving_end <- 0

        for (other_cluster in unique_clusters) {
          if (other_cluster != cluster) {
            count_in_other_cluster <- sum(data$CTaa == value & data$cluster_ID == other_cluster)
            if (count_in_other_cluster > 0) {
              clusters_shared <- c(clusters_shared, other_cluster)
              clone_count_in_receiving_end <- clone_count_in_receiving_end + count_in_other_cluster
            }
          }
        }

        # Create a data frame entry for the shared CTaa value
        shared_entry <- data.frame(
          Cluster = cluster,
          Ctaa = value,
          Clusters_Shared = paste(clusters_shared[clusters_shared != cluster], collapse = ", "),
          Clone_Count_in_receivingEnd = clone_count_in_receiving_end,  # Updated column name
          stringsAsFactors = FALSE
        )

        # Append to the list of shared results
        shared_results[[length(shared_results) + 1]] <- shared_entry
      }
    }

    # Store unique CTaa values for the current cluster
    if (length(unique_current) > 0) {
      for (value in unique_current) {
        # Count occurrences of the unique CTaa in the cluster
        count_unique <- sum(data$CTaa == value & data$cluster_ID == cluster)

        unique_entry <- data.frame(
          Cluster = cluster,
          Ctaa = value,
          Clone_Count_inClusterOfInterest = count_unique,
          stringsAsFactors = FALSE
        )

        unique_results[[length(unique_results) + 1]] <- unique_entry
      }
    }
  }

  # Step 5: Combine the shared results into a data frame
  shared_df <- do.call(rbind, shared_results)

  # Combine unique results into a data frame
  unique_df <- do.call(rbind, unique_results)

  return(list(Shared = shared_df, Unique = unique_df))
}

# Run the analysis
ctaa_analysis_result <- analyze_Ctaa_sharing(SeuratObj_metaData)

# View the shared results with cluster information (DOESN'T INCLUDE UNIQUE CLONES)
### Note:
### In forward direction: T25 share clone B with T26 and clone B appear in T26 twice. That is clone B expanded twice in T26. That is T25 only has 1 clone (clone B) shared and it share that with T26.  
### In reverse direction: T26 share clone B with T25 and clone B appear in T25 four times. That is clone B expanded 4 times in T25. If this clone is shared by more than 1 cluster, then it counts the number of clusters sharing the clone
ctaa_analysis_result$Shared 
Screenshot 2024-10-25 at 02 36 17

EX:

Screenshot 2024-10-25 at 03 29 42
### Can just do ctaa_analysis_result$Shared %>% dplyr::filter(Cluster == "T25") to show that how many clones T25 shared with other clusters. If this clone is shared by more than 1 cluster, then it counts the number of clusters sharing the clone
ctaa_analysis_result$Shared %>% dplyr::filter(Cluster == "T25")
Screenshot 2024-10-25 at 02 36 47
########### ONLY SHOW UNIQUE CLONES FOR EACH CLUSTER 
## Note: T25 has clone K that is not shared with anyone, and clone K appears twice in T25!
## Note: T26 has clone Q and D are not shared with any one, and clone Q appears once in T26, and clone D also appears once in T26. Hence, they are listed and counted as 1 each 
ctaa_analysis_result$Unique
Screenshot 2024-10-25 at 02 36 59
Qile0317 commented 1 month ago

@denvercal1234GitHub a bit of a shameless plug but I recommend you to check out the APackOfTheClones package which does some of what you ask for in its exported utilities.