jackbibby1 / SCPA

R package for pathway analysis in scRNA-seq data
https://jackbibby1.github.io/SCPA/
GNU General Public License v3.0

Must be even number of elements/adding ghost value error #78

Closed hbussan closed 1 month ago

hbussan commented 1 month ago

The specific error you're running into:

Warning in initialize(value, ...) :
  There must be an even number of elements
Adding a ghost value

This error is thrown just as "Performing a two sample analysis with SCPA" begins. It repeats continually until I shut down the process.

To Reproduce (the code to reproduce the error):

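# Format the MSigDB C2 gene sets for mouse as SCPA pathways
pathways <- msigdbr("Mus musculus", "C2") %>% format_pathways()

# Extract expression matrices for the two populations to compare
cluster5 <- seurat_extract(scaorta_75, meta1 = "celltype.geno", value_meta1 = "5_Rey05")
cluster6 <- seurat_extract(scaorta_75, meta1 = "celltype.geno", value_meta1 = "5_Rey06")

# Two-sample comparison, keeping pathways with 15 to 500 genes
scpa_out_4 <- compare_pathways(samples = list(cluster5, cluster6), pathways = pathways, min_genes = 15, max_genes = 500)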

Expected behavior: I have been running SCPA on clusters from a Seurat object, and I have successfully done this with multiple other clusters using the code above. However, when I run it on certain specific clusters, I get this error.

Additional context: I am using Seurat v5.

jackbibby1 commented 1 month ago

Hi,

Thanks for the info. It sounds like it's an issue with one of your expression matrices. Could you show me the output of:

dim(cluster5)
dim(cluster6)
cluster5[1:5, 1:5]
cluster6[1:5, 1:5]

Thanks,
Jack

hbussan commented 1 month ago

Hello,

Thank you for your quick response and for the program - it's been extremely helpful.

I have attached the output of the above code for a cluster that worked and a cluster that didn't.

Output for the cluster that threw the error:

[1] 21976   246
[1] 21976   133
        AAACCCATCTCTGCTG-1_1 AAACGCTCATAGGTTC-1_1 AAAGTGACACTAACGT-1_1
Xkr4                   0.001                0.001              0.39682
Gm1992                 0.001                0.001              0.00100
Gm19938                0.001                0.001              0.00100
Rp1                    0.001                0.001              0.00100
Sox17                  0.001                0.001              0.00100
        AACAGGGGTATGCGGA-1_1 AACCTGACAGGTCCCA-1_1
Xkr4                   0.001                0.001
Gm1992                 0.001                0.001
Gm19938                0.001                0.001
Rp1                    0.001                0.001
Sox17                  0.001                0.001
        AAAGGGCTCGGAGATG-1_2 AACGAAAAGCTGAAAT-1_2 AAGAACATCCAACCGG-1_2
Xkr4                   0.001                0.001                0.001
Gm1992                 0.001                0.001                0.001
Gm19938                0.001                0.001                0.001
Rp1                    0.001                0.001                0.001
Sox17                  0.001                0.001                0.001
        AAGACTCCATGGAACG-1_2 AAGCATCCATGATGCT-1_2
Xkr4                   0.001                0.001
Gm1992                 0.001                0.001
Gm19938                0.001                0.001
Rp1                    0.001                0.001
Sox17                  0.001                0.001

Output for the cluster that did not throw the error:

[1] 21976   420
[1] 21976   254
        AAACCCATCTGAGGCC-1_1 AAACGAACAAACGAGC-1_1 AAAGGGCTCATCGCTC-1_1
Xkr4                   0.001                0.001                0.001
Gm1992                 0.001                0.001                0.001
Gm19938                0.001                0.001                0.001
Rp1                    0.001                0.001                0.001
Sox17                  0.001                0.001                0.001
        AAAGTCCAGTGCTAGG-1_1 AACAAAGGTTGCTCCT-1_1
Xkr4                   0.001                0.001
Gm1992                 0.001                0.001
Gm19938                0.001                0.001
Rp1                    0.001                0.001
Sox17                  0.001                0.001
        AAACGCTGTGTAGTGG-1_2 AAAGTGACATTGACCA-1_2 AAATGGACAAATTGGA-1_2
Xkr4                   0.001                0.001                0.001
Gm1992                 0.001                0.001                0.001
Gm19938                0.001                0.001                0.001
Rp1                    0.001                0.001                0.001
Sox17                  0.001                0.001                0.001
        AACAACCAGACGCAGT-1_2 AACCAACAGTGAACAT-1_2
Xkr4                   0.001                0.001
Gm1992                 0.001                0.001
Gm19938                0.001                0.001
Rp1                    0.001                0.001
Sox17                  0.001                0.001

jackbibby1 commented 1 month ago

Glad it's useful.

Hmm, interesting. It's tough to say from that. Is there any chance you can share a subset of your data so I can reproduce the error?

hbussan commented 1 month ago

I can do that - what would be most helpful? I assumed an RDS file, but those aren't supported by GitHub.

Also - I did try subsetting clusters that worked and clusters that didn't work. After subsetting the Seurat object, SCPA generated the same error as above for all clusters, even those that worked in the non-subsetted dataset. I did subset my data early on, as I needed to remove clusters that didn't carry the marker I had sorted the cells on - so maybe this is part of the issue?

jackbibby1 commented 1 month ago

I'm not sure the subsetting should be an issue. As long as you have two expression matrices, things should be good. If you want to send it directly to me, or upload the RDS file to a cloud service like Google Drive and then share it with my email (jackbibby1@outlook.com), I'll take a look.

hbussan commented 1 month ago

Hello, I'll send you an e-mail. Thanks!!

jackbibby1 commented 1 month ago

Hi @hbussan,

Sorry -- your email went to my junk and I missed it for a few days. This is what I ran on my end:

# load in packages
library(SCPA)
library(msigdbr)

# wdir
setwd("~/Downloads/SCPA output/")

# get files
files <- list.files(".", full.names = TRUE)
files
[1] "./0_05.csv.gz" "./0_06.csv.gz" "./4_05.csv.gz" "./4_06.csv.gz"

# read in files
df <- lapply(files, function(x){
  read.csv(x, row.names = "X")
})

# get pathways
pathways <- msigdbr("Mus musculus", "H") %>%
  format_pathways()

# compare files that are throwing the warnings
scpa_out <- compare_pathways(samples = df[3:4], 
                             pathways = pathways)

Using single core processing. Specify 'parallel = TRUE' and `cores = x` arguments for parallel processing

Cell numbers in population 1 = 246
Cell numbers in population 2 = 133
- If greater than 500 cells, these populations will be downsampled

All 50 pathways passed the min/max genes threshold

Calculating pathway fold changes...

Performing a two-sample analysis with SCPA...
  |==================================================| 100%
There were 50 or more warnings (use warnings() to see the first 50)

I tracked down the warning message and it's coming from the distancematrix() function in nbpMatching, which is used by our multicross::mcm() function. It's warning you that it's adding a ghost column to the matrix to make sure the number of columns in the matrix is even, so you can just ignore this -- everything should be running fine. I've checked the output from the files you sent and things are all looking OK.
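For anyone who would rather not see this warning repeated once per pathway, here is a minimal sketch of two things, using base R only. First, a parity check on the cell counts posted earlier in the thread: it assumes the matching runs on the pooled cells from both populations, which would fit the failing pair (246 + 133 = 379, odd) and the working pair (420 + 254 = 674, even), but that is an inference rather than something confirmed above. Second, a wrapper that muffles only warnings mentioning the ghost value and lets every other warning through. `noisy_analysis()` and `muffle_ghost_warning()` are hypothetical names for illustration; `noisy_analysis()` is a stand-in for a call such as `compare_pathways()` and is not part of SCPA.

```r
# Cell counts taken from the dim() output posted earlier in this thread.
failing <- c(246, 133)  # pooled total 379
working <- c(420, 254)  # pooled total 674

# Assumption: the pooled cell count is what must be even.
is_odd <- function(n) sum(n) %% 2 == 1
is_odd(failing)
is_odd(working)

# Hypothetical stand-in for the real analysis call; it only emits a
# warning containing the text reported in this issue.
noisy_analysis <- function() {
  warning("There must be an even number of elements\nAdding a ghost value")
  "result"
}

# Muffle only warnings whose message mentions the ghost value.
# Any other warning is still signalled as usual.
muffle_ghost_warning <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("ghost value", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

out <- muffle_ghost_warning(noisy_analysis())
```

With the real package, the same wrapper could be used as `muffle_ghost_warning(compare_pathways(...))`. Matching on the message text keeps unrelated warnings visible, which a blanket `suppressWarnings()` would not.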

Just let me know if you've got any more questions

Jack

hbussan commented 1 month ago

Thank you for all your help!