rnabioco / djvdj

An R package to analyze single-cell V(D)J data
https://rnabioco.github.io/djvdj

pct in plotgene usage is >100 #143

Closed Ahmedalaraby20 closed 4 months ago

Ahmedalaraby20 commented 4 months ago

Hey Ryan, so I used

xx <- PILTcells |> calc_gene_usage(data_cols = "v_gene", return_df = TRUE, cluster_col = "HTO_classification", chain = "TRB")
xx %>% group_by(HTO_classification) %>% summarise(sum_pct = sum(pct))

and I get

# A tibble: 2 × 2
  HTO_classification sum_pct
  <chr>                <dbl>
1 PILT-MHC              106.
2 PILT-WT               112.

So I do

xx <- xx %>% group_by(HTO_classification) %>% mutate(new_pct = freq / sum(freq))
xx %>% group_by(HTO_classification) %>% summarise(sum_pct = sum(new_pct))

and now I get

  HTO_classification sum_pct
  <chr>                <dbl>
1 PILT-MHC                 1
2 PILT-WT                  1

This might be due to some cells having more than one V gene.

I am still using meta.zip

sheridar commented 4 months ago

The 'pct' column is the percentage of cells that include the v_gene, so this is the expected behavior when you have some cells with multiple v_genes. If you remove cells with multiple v_genes, the 'pct' column will sum to 100%:

library(djvdj)
library(dplyr)

obj <- readRDS("meta.rds")

res <- obj |>
  filter(paired & n_chains == 2) |>
  calc_gene_usage(
    data_cols   = "v_gene",
    return_df   = TRUE,
    cluster_col = "HTO_classification",
    chain       = "TRB"
  )

res |>
  group_by(HTO_classification) |>
  summarize(sum_pct = sum(pct))
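
To see why the per-cell percentages can add up to more than 100, here is a minimal base R sketch on made-up data. It does not call djvdj; the `cells` data frame, its gene names, and the assumption that multiple V genes in one cell are stored as a ";"-separated string are all hypothetical and only there to illustrate the arithmetic.

```r
# Toy data (hypothetical): one row per cell; a cell with two V genes
# is assumed to store them as a ";"-separated string.
cells <- data.frame(
  cell   = c("c1", "c2", "c3", "c4"),
  v_gene = c("TRBV1", "TRBV2", "TRBV1;TRBV2", "TRBV3"),
  stringsAsFactors = FALSE
)

# Split each cell's V genes. Cell c3 contributes to two gene counts.
genes_per_cell <- strsplit(cells$v_gene, ";", fixed = TRUE)
freq <- table(unlist(lapply(genes_per_cell, unique)))

# Percentage of cells that include each gene (denominator = number of cells).
pct_cells <- as.numeric(freq) / nrow(cells) * 100
names(pct_cells) <- names(freq)

# Share of all gene calls (denominator = total calls), analogous to
# freq / sum(freq) in the question above, scaled to a percentage.
pct_calls <- as.numeric(freq) / sum(freq) * 100
names(pct_calls) <- names(freq)

sum(pct_cells)  # 125: c3 is counted once for TRBV1 and once for TRBV2
sum(pct_calls)  # sums to 100 by construction
```

Both numbers are valid; they just answer different questions. The first is "what fraction of cells carry this gene", the second is "what fraction of all gene calls is this gene".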