jdekanter / CHETAH

scRNA-seq cell type identification
GNU Affero General Public License v3.0

Error running CHETAHclassifier and on object creation with TSNE #16

Open ViriatoII opened 2 years ago

ViriatoII commented 2 years ago

Hi,

Your algorithm seems great but I can't get it to work with my own data. Can you understand what's wrong?

countma[1:4,1:4]

>            cell_0    cell_2   cell_3   cell_6
> HAL          0.1     0         0         0
> SDS          0       0.04      0         0
> CD1C         0       1.2       0.3       0
> MMP2         0       0         0         0.2

head(input_tsne)

>        tSNE_1   tSNE_2
> cell_0  31.18 -11.01
> cell_2  -1.689 21.967
> cell_3  33.55   1.03
> cell_6  33.50  -0.77
> cell_7 -30.67 -32.83

input <- SingleCellExperiment(assays = list(counts = countma),
+                              reducedDims = SimpleList(TSNE = input_tsne))

Error in vapply(value, vdimfun, 0L) : values must be length 1, but FUN(X[[1]]) result is length 0

If I remove the reducedDims parameter, Chetah starts working but also gives an error:

input <- CHETAHclassifier(input = input, ref_cells = reference) 

Preparing data....

Error in ref_means[[type_sl]] <- new : attempt to select less than one element in integerOneIndex

jdekanter commented 2 years ago

Hi ViriatoII,

Thanks for using CHETAH. Unfortunately, your error is not familiar to me. The first error is a problem with the SingleCellExperiment package. Googling does not immediately give an obvious answer, and I am not able to reproduce the error with the info you provide here (running R 4.1.2). Which R version, CHETAH version and SingleCellExperiment version are you running?

For the error in CHETAH: your code for creating the input works for me with similar test data. Also, when removing the reducedDims from actual input data, CHETAH runs fine for me. A couple of checks: does CHETAH/SingleCellExperiment run for you with the provided example data (4.2/5.1 in the vignette), also when not specifying the reducedDims? Also, what reference data do you use? Did you specify the "colData = DataFrame(celltypes = X)" part?

A couple of other checks you might do. As both packages give errors, it might just have something to do with the type of the input data. Are "countma" and "input_tsne" numeric matrices/Matrices and not data.frames? Are all row names in input_tsne the same as the column names in countma? Are there any NA values in either of the two matrices?
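
These checks could be run roughly like this. It is a minimal base-R sketch on toy stand-ins for `countma` and `input_tsne` (the values are made up for illustration, and `is_matrix_like` is a hypothetical helper, not part of CHETAH); swap in the real objects.

```r
# Toy stand-ins for the real objects (values are made up for illustration)
countma <- matrix(c(0.1, 0, 0, 0.04, 0, 1.2), nrow = 3,
                  dimnames = list(c("HAL", "SDS", "CD1C"),
                                  c("cell_0", "cell_2")))
input_tsne <- data.frame(tSNE_1 = c(31.18, -1.689),
                         tSNE_2 = c(-11.01, 21.967),
                         row.names = c("cell_0", "cell_2"))

# Hypothetical helper: TRUE for base matrices and Matrix-package objects
is_matrix_like <- function(x) is.matrix(x) || inherits(x, "Matrix")

# 1. Matrix or data.frame?
counts_ok <- is_matrix_like(countma)
tsne_ok   <- is_matrix_like(input_tsne)   # FALSE for the toy data.frame

# 2. Same cell names in the same order?
names_ok <- identical(rownames(input_tsne), colnames(countma))

# 3. Any NA values in either object?
na_found <- anyNA(countma) || anyNA(as.matrix(input_tsne))

print(c(counts_ok = counts_ok, tsne_ok = tsne_ok,
        names_ok = names_ok, na_found = na_found))
```

`identical()` is stricter than summing an element-wise `==`, since it also fails when the two name vectors differ in length.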

Let me know if any of this helped, or what the outcomes were, so I can think along with you!

ViriatoII commented 2 years ago

Hi JK,

Thank you for the prompt response! My R version is 4.1.2 (2021-11-01) and my CHETAH version is 1.0.8.

The tutorial data works fine, including without the reducedDims parameter.

Here is the comparison of tutorial inputs vs my inputs:

 > counts_melanoma [1:3,1:3] ; countma [1:3,1:3]
3 x 3 sparse Matrix of class "dgCMatrix"
      mel_cell1 mel_cell2 mel_cell3
ELMO2         .    .              .
PNMA1         .    4.3553         .
MMP2          .    .              .
3 x 3 sparse Matrix of class "dgCMatrix"
            cell_0   cell_2    cell_3
HAL          .         .         .
SDS          .         .         .
CD1C         .         .         .
  > tsne_melanoma [1:3,1:2]   ; input_tsne[1:3,1:2] 
              tSNE_1    tSNE_2
mel_cell1  4.5034553 13.596680
mel_cell2 -4.0025667 -7.075722
mel_cell3  0.4734054  9.277648
             tSNE_1     tSNE_2
cell_0 31.183132 -11.013281
cell_2 -1.689497  21.967064
cell_3 33.530225   1.036852

   >str(tsne_melanoma); str(input_tsne)

 num [1:150, 1:2] 4.503 -4.003 0.473 3.22 -0.335 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:150] "mel_cell1" "mel_cell2" "mel_cell3" "mel_cell4" ...
  ..$ : chr [1:2] "tSNE_1" "tSNE_2"
'data.frame':   14276 obs. of  2 variables:
 $ tSNE_1: num  31.18 -1.69 33.53 33.56 -30.65 ...
 $ tSNE_2: num  -11.013 21.967 1.037 -0.754 -32.856 ...

Actually, my input_tsne is a data.frame while the tutorial tsne is not. Is it a numeric matrix? (I'm more of a Python guy.)

Colnames of one correspond to row names of the other:

  > sum(colnames(countma) == rownames(input_tsne))
[1] 14276
  > length(colnames(countma) == rownames(input_tsne))
[1] 14276

The reference is also a dgCMatrix :

  >ref[1:3,1:3]
3 x 3 sparse Matrix of class "dgCMatrix"
           1        2        3
CD9 .        6.928641 .       
CD4 6.756678 .        7.336448
DCN .        .        .       

Everything is being called like this:


## Make SingleCellExperiments
  > reference <- SingleCellExperiment(assays = list(counts = ref),
  >                               colData = DataFrame(celltypes = ref_labs))

  > input <-  SingleCellExperiment(assays = list(counts = countma)) #,         #commented out
  >                   # reducedDims = SimpleList(TSNE = input_tsne))

## Run CHETAH
  > input <- CHETAHclassifier(input = input, ref_cells = reference)

jdekanter commented 2 years ago

Hi ViriatoII,

I can indeed replicate your error when I transform my input_tsne into a data.frame. Can you please try the following code?

 > input <- SingleCellExperiment(assays = list(counts = countma),
 >                               reducedDims = SimpleList(TSNE = as.matrix(input_tsne)))

That should work!
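
For reference, here is a small base-R sketch of what `as.matrix()` does to a data.frame like this one. The toy `input_tsne` reuses the first three rows shown earlier in the thread; `tsne_mat` is just an illustrative name.

```r
# Toy data.frame standing in for input_tsne (first three rows from the thread)
input_tsne <- data.frame(tSNE_1 = c(31.183132, -1.689497, 33.530225),
                         tSNE_2 = c(-11.013281, 21.967064, 1.036852),
                         row.names = c("cell_0", "cell_2", "cell_3"))

# Convert to a plain numeric matrix; row and column names are carried over
tsne_mat <- as.matrix(input_tsne)

print(class(input_tsne))     # "data.frame"
print(is.matrix(tsne_mat))   # TRUE
print(is.numeric(tsne_mat))  # TRUE, because every column was numeric
print(dimnames(tsne_mat))    # cell names and tSNE_1 / tSNE_2
```

Note that if the data.frame contained a non-numeric column, `as.matrix()` would return a character matrix instead, so it is worth checking `is.numeric()` on the result.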

ViriatoII commented 2 years ago

Thank you JK, but the main problem remains: CHETAHclassifier gives an error with or without this input_tsne.


> ## Run CHETAH
> input <- CHETAHclassifier(input = input, ref_cells = reference)
Preparing data....    

Error in ref_means[[type_sl]] <- new : 
  attempt to select less than one element in integerOneIndex

jdekanter commented 2 years ago

Hi! Could it be that one of your reference names is "0"? That could give the "attempt to select less than one element in integerOneIndex" error. I had never taken into account the option of using numbers as reference cell type names, so this error could occur.
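
To illustrate the general R behaviour behind this (a sketch only, not CHETAH's actual internals): when a list is filled with `[[<-` and the index is a number rather than a name, a label of 0 is treated as position 0, which is not a valid index. The labels below are hypothetical.

```r
# Hypothetical integer cluster labels that include 0 (not the real data)
ref_labs <- c(0L, 1L, 1L, 2L)

# Filling a list by numeric label: the label 0 is used as a position
ref_means <- list()
res <- tryCatch({
  for (type in unique(ref_labs)) ref_means[[type]] <- type
  "ok"
}, error = function(e) conditionMessage(e))
print(res)   # an error message rather than "ok"

# Prefixing turns the labels into character names, which are always valid
safe_labs <- paste0("ref_type", ref_labs)
ref_means <- list()
for (type in unique(safe_labs)) ref_means[[type]] <- type
print(names(ref_means))
```

This is why prefixing the labels with `paste0()` is a reasonable thing to try whenever the cell type labels are numeric.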

If your "celltypes" vector is a numeric vector, please see if this works:

  > reference <- SingleCellExperiment(assays = list(counts = ref),
  >                               colData = DataFrame(celltypes = paste0("ref_type", ref_labs)))
  > input <- SingleCellExperiment(assays = list(counts = countma),
  >                               reducedDims = SimpleList(TSNE = as.matrix(input_tsne)))
  > input <- CHETAHclassifier(input = input, ref_cells = reference)

ViriatoII commented 2 years ago

Hey,

So I don't have any 0 or "0" column name

> head(colnames(ref))
[1] "1" "2" "3" "4" "5" "6"
> sum(colnames(ref)=="0")
[1] 0

Your code suggestion seems to solve the error but leads to a different one.

> input <- CHETAHclassifier(input = input, ref_cells = reference)
Preparing data....    
Running analysis... 
Error in ref_profiles[genes, type, drop = FALSE] : 
  subscript out of bounds
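
For anyone hitting the same message: in base R, indexing a matrix by a row or column name that does not exist raises this error. Whether that is the cause here is not established in this thread, so the following is only a generic way to narrow it down, using a hypothetical `ref_profiles` matrix and made-up gene and type names.

```r
# Hypothetical reference profile matrix: genes in rows, cell types in columns
ref_profiles <- matrix(1:6, nrow = 3,
                       dimnames = list(c("CD9", "CD4", "DCN"),
                                       c("ref_type1", "ref_type2")))
genes <- c("CD9", "MMP2")   # MMP2 is not a row of ref_profiles
type  <- "ref_type1"

# Indexing by a missing name fails instead of returning NA
res <- tryCatch(ref_profiles[genes, type, drop = FALSE],
                error = function(e) conditionMessage(e))
print(res)   # the error message, not a matrix

# List which requested names are absent from each dimension
missing_genes <- setdiff(genes, rownames(ref_profiles))
missing_types <- setdiff(type, colnames(ref_profiles))
print(missing_genes)
print(missing_types)
```

Running the same `setdiff()` comparison between the gene names of the input and the reference, and between the cell type labels and the reference's type names, would show whether a name mismatch is involved.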