kdzimm / hierarchicell

An R package for simulating cell-type specific and hierarchical single-cell expression data

A permanent error #6

Open beginner984 opened 2 years ago

beginner984 commented 2 years ago

Hello

I have raw read counts for 3468 single cells from monocytes (I have already annotated my cell cluster). I extracted the raw read counts from my Seurat data.

This is what my data look like:

> head(a[1:5,1:5])
                  CellID IndividualID FAM138A OR4F5 AL627309.1
1 AAACCAACATCGTTCT-1_1_1  PN0252_0005       0     0          0
2 AAACCGAAGCAATAGG-1_1_1  PN0252_0005       0     0          0
3 AAACCGCGTGGCTTCC-1_1_1  PN0252_0005       0     0          0
4 AAACGGATCCCGCATT-1_1_1  PN0252_0005       0     0          0
5 AAACGTACAACTAACT-1_1_1  PN0252_0005       0     0          0
> unique(a$IndividualID)
[1] "PN0252_0005" "PN0252_0008" "PN0252_0001" "PN0252_0002" "PN0252_0003" "PN0252_0004"
> dim(a)
[1]  3468 36602
> 

I tried your software, but I get an error:

> data_summaries <- compute_data_summaries(a)
Computing sample means, dropout rates, and dispersion ... 
Computing final data summaries ... 
> power_hierarchicell(data_summaries, n_genes = 36602, n_per_group = 4)
Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  : 
  NA/NaN/Inf in 'x'
>

I should mention that I have two cancer patients ("PN0252_0005" and "PN0252_0008") and four controls.

Do you know how I can solve this error please?

Thanks in advance

kdzimm commented 2 years ago

Quick question. Did you run the "filter_counts" function on your data first? Is it possible you have genes in your data that are entirely zero?
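
A quick way to check for all-zero genes by hand is sketched below in base R. It assumes the layout shown earlier in the thread (two ID columns followed by one count column per gene). The data frame `toy` and its gene names are made up for illustration; with the real data, `a` would take its place.

```r
# Toy stand-in for the data frame `a` in this thread (hypothetical values):
# the first two columns are IDs, the remaining columns are raw counts per gene.
toy <- data.frame(
  CellID       = paste0("cell", 1:4),
  IndividualID = c("P1", "P1", "P2", "P2"),
  GENE_A       = c(0, 0, 0, 0),   # never detected in any cell
  GENE_B       = c(3, 0, 1, 0),
  GENE_C       = c(0, 0, 0, 2),
  stringsAsFactors = FALSE
)

# Keep only the count columns (everything after the two ID columns).
counts <- as.matrix(toy[, -(1:2)])

# Names of genes whose counts are zero in every cell.
all_zero_genes <- colnames(counts)[colSums(counts) == 0]

# Number of missing or non-finite entries (NA, NaN, Inf) in the counts.
n_bad <- sum(!is.finite(counts))

print(all_zero_genes)
print(n_bad)
```

If `all_zero_genes` is non-empty or `n_bad` is greater than zero on the real data, that would be a plausible (though not confirmed) source of the `NA/NaN/Inf in 'x'` error.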

beginner984 commented 2 years ago

Thank you

I have tried filtering now, but I get the same error:

> clean_expr_data <- filter_counts(a)
Filtering user input
Genes and cells have been filtered, ready for estimating parameters

> dim(clean_expr_data)
[1]  3468 25038
> 

After filtering, more than 10,000 genes were removed (36602 columns down to 25038), but I still get the same error:

> power_hierarchicell(data_summaries, n_genes = 100, n_per_group = 10)
Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  : 
  NA/NaN/Inf in 'x'
> 

Do you have any suggestions, please?

Thank you so much

kdzimm commented 2 years ago

Based on the number of genes that were dropped with "gene_thresh=0" there is likely a lot of zero inflation in the data. What happens if you move the "gene_thresh" to 1 or 2? Might be worth using the cell_thresh too, but I imagine if these are already processed data you have removed low quality cells.
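
To get a feel for how much zero inflation there is before changing the threshold, something like the base R sketch below may help. It runs on simulated toy counts, not the data from this thread. The "detected in more than k cells" rule is only an illustrative assumption and is not necessarily how "gene_thresh" is defined inside filter_counts.

```r
set.seed(1)

# Toy counts (hypothetical): 50 cells x 20 genes, drawn so most entries are zero.
counts <- matrix(
  rpois(50 * 20, lambda = 0.1),
  nrow = 50,
  dimnames = list(NULL, paste0("gene", 1:20))
)

# Fraction of cells with a zero count, per gene.
zero_frac <- colMeans(counts == 0)

# Number of cells in which each gene is detected (count > 0).
n_detected <- colSums(counts > 0)

# Genes kept under an illustrative rule: detected in more than k cells.
ks <- c(0, 1, 2)
genes_kept <- sapply(ks, function(k) sum(n_detected > k))
names(genes_kept) <- paste0("k=", ks)

print(summary(zero_frac))
print(genes_kept)
```

On real data, `counts` would be the gene columns of the filtered data frame. A sharp drop in `genes_kept` between thresholds suggests that many genes are detected in only a handful of cells.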

kdzimm commented 2 years ago

If this problem persists, I might ask that you email me (kdzimmer@wakehealth.edu) your data (post-filtering to reduce size) and I will try and troubleshoot for you.