wguo-research / scCancer

A package for automated processing of single cell RNA-seq data in cancer

Cannot go through scAnnotation and the following scCombination #15

wiceshine closed this issue 4 years ago

wiceshine commented 4 years ago

Dear developers,

I still cannot get through the scCancer pipeline after updating to version 2.1.0. scStatistics now runs to completion, but scAnnotation still fails, so I cannot proceed to scCombination. My full script and the error messages are listed below:

The full script:

## Loading libraries
library(DropletUtils)
library(scCancer)

## Output directories (here the same as the Cell Ranger output folders)
female.03.savePath <- "~/PB Lab Dropbox/Fudan University/SIAT/DATA/RNA_seq/2019/sc_RNA_seq/Aged microglia/Cellranger v2/F03_PBS_MG_outs"
female.14.savePath <- "~/PB Lab Dropbox/Fudan University/SIAT/DATA/RNA_seq/2019/sc_RNA_seq/Aged microglia/Cellranger v2/F14_PBS_MG_outs"
female.24.savePath <- "~/PB Lab Dropbox/Fudan University/SIAT/DATA/RNA_seq/2019/sc_RNA_seq/Aged microglia/Cellranger v2/F24_PBS_MG_outs"

## scStatistics
#  Paths containing the cell ranger processed data
female.03.dataPath <- "~/PB Lab Dropbox/Fudan University/SIAT/DATA/RNA_seq/2019/sc_RNA_seq/Aged microglia/Cellranger v2/F03_PBS_MG_outs"
female.14.dataPath <- "~/PB Lab Dropbox/Fudan University/SIAT/DATA/RNA_seq/2019/sc_RNA_seq/Aged microglia/Cellranger v2/F14_PBS_MG_outs"
female.24.dataPath <- "~/PB Lab Dropbox/Fudan University/SIAT/DATA/RNA_seq/2019/sc_RNA_seq/Aged microglia/Cellranger v2/F24_PBS_MG_outs"

#  The sample names
female.03.sampleName <- "Female 03 MO"
female.14.sampleName <- "Female 14 MO"
female.24.sampleName <- "Female 24 MO"

#  The author name or a string used to mark the report.
authorName <- "Bo Peng"

#  Run scStatistics
female.03.stat.results <- runScStatistics(
  dataPath = female.03.dataPath,
  savePath = female.03.savePath,
  sampleName = female.03.sampleName,
  authorName = authorName,
  species = "mouse"
)

female.14.stat.results <- runScStatistics(
  dataPath = female.14.dataPath,
  savePath = female.14.savePath,
  sampleName = female.14.sampleName,
  authorName = authorName,
  species = "mouse"
)

female.24.stat.results <- runScStatistics(
  dataPath = female.24.dataPath,
  savePath = female.24.savePath,
  sampleName = female.24.sampleName,
  authorName = authorName,
  species = "mouse"
)

## scAnnotation
#  Paths containing the scStatistics results
female.03.statPath <- female.03.savePath
female.14.statPath <- female.14.savePath
female.24.statPath <- female.24.savePath

#  Run scAnnotation (one call per sample; each result is kept under its own name)
female.03.anno.results <- runScAnnotation(
  dataPath = female.03.dataPath,
  statPath = female.03.statPath,
  savePath = female.03.savePath,
  authorName = authorName,
  sampleName = female.03.sampleName,
  geneSet.method = "average"   # or "GSVA"
)

female.14.anno.results <- runScAnnotation(
  dataPath = female.14.dataPath,
  statPath = female.14.statPath,
  savePath = female.14.savePath,
  authorName = authorName,
  sampleName = female.14.sampleName,
  geneSet.method = "average"   # or "GSVA"
)

female.24.anno.results <- runScAnnotation(
  dataPath = female.24.dataPath,
  statPath = female.24.statPath,
  savePath = female.24.savePath,
  authorName = authorName,
  sampleName = female.24.sampleName,
  geneSet.method = "average"   # or "GSVA"
)

# The paths of all samples' "runScAnnotation" results
single.savePaths <- c(female.03.savePath, female.14.savePath, female.24.savePath)
sampleNames <- c(female.03.sampleName, female.14.sampleName, female.24.sampleName)    # The labels for all samples
savePath <- file.path(female.03.savePath, "comb.LIGER")       # A path to save the results
combName <- "microglia.age.comb"                 # A label of the combined samples
comb.method <- "LIGER"               # Integration methods ("NormalMNN", "SeuratMNN", "Harmony", "Raw", "Regression", "LIGER")

# Run scCombination
comb.results <- runScCombination(
  single.savePaths = single.savePaths, 
  sampleNames = sampleNames, 
  savePath = savePath, 
  combName = combName,
  authorName = authorName,
  comb.method = comb.method
)    

These are the error messages I got when running scAnnotation:

[2020-05-12 10:13:24] START: RUN scAnnotation
[2020-05-12 10:13:24] -----: data preparation
[2020-05-12 10:13:45] -----: Seurat object creation
[2020-05-12 10:13:45] -----: highly variable genes
[2020-05-12 10:13:47] -----: data scaling
[2020-05-12 10:14:03] -----: PCA
[2020-05-12 10:14:12] -----: clustering
[2020-05-12 10:14:13] -----: tSNE
[2020-05-12 10:14:26] -----: UMAP
[2020-05-12 10:14:34] -----: differential expression analysis
[2020-05-12 10:14:55] -----: Seurat plotting and saving
When using repel, set xnudge and ynudge to 0 for optimal results
[2020-05-12 10:15:04] -----: Doublet score estimation
[2020-05-12 10:15:41] -----: TME cell types annotation
[2020-05-12 10:15:43] -----: cells malignancy annotation
Error in cnvList$expr.data[, ref.cellNames] : subscript out of bounds
In addition: There were 50 or more warnings (use warnings() to see the first 50)
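"Subscript out of bounds" is what base R reports when a matrix is indexed by column names it does not have. The sketch below reproduces that error on a hypothetical toy matrix (expr and ref.cells are illustrative names, not scCancer internals) and shows how setdiff() lists the offending names. Whether scCancer's reference cell names failed to match the expression-data columns for exactly this reason is an assumption, not something established in this thread.

```r
# Hypothetical toy expression matrix: 3 genes x 2 cells
expr <- matrix(1:6, nrow = 3,
               dimnames = list(c("g1", "g2", "g3"), c("cellA", "cellB")))

# Hypothetical reference cell names; "cellZ" is not a column of expr
ref.cells <- c("cellA", "cellZ")

# Indexing columns by a name that does not exist raises an error
msg <- tryCatch(expr[, ref.cells], error = function(e) conditionMessage(e))
print(msg)  # "subscript out of bounds" (English locale)

# Listing the names that are missing makes the mismatch visible
missing.cells <- setdiff(ref.cells, colnames(expr))
print(missing.cells)  # "cellZ"
```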

As a result, scCombination cannot run either; it fails with the following messages:

[2020-05-12 10:54:33] START: RUN ScCombination
[2020-05-12 10:54:33] -----: sample data combination
[1] "Female 03 MO"
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/Users/bopeng/PB Lab Dropbox/Fudan University/SIAT/DATA/RNA_seq/2019/sc_RNA_seq/Aged microglia/Cellranger v2/F03_PBS_MG_outs/expr.RDS', probable reason 'No such file or directory'
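The second error says scCombination could not find expr.RDS in the first sample's folder, which is consistent with scAnnotation having stopped before writing its results (an inference from the logs, not confirmed here). A minimal sketch of a pre-flight check using only base R; checkExprFiles() is a hypothetical helper, and the expr.RDS file name is taken from the error message above.

```r
# Hypothetical helper: report which sample folders lack expr.RDS
checkExprFiles <- function(savePaths, sampleNames) {
  # expr.RDS is the file scCombination tried to read in the log above
  files <- file.path(path.expand(savePaths), "expr.RDS")
  data.frame(sample = sampleNames,
             file = files,
             exists = file.exists(files),
             stringsAsFactors = FALSE)
}

# Self-contained demo: two temporary folders, only the first gets expr.RDS
demo.dirs <- file.path(tempdir(), c("demo_A", "demo_B"))
unlink(demo.dirs, recursive = TRUE)
for (d in demo.dirs) dir.create(d, showWarnings = FALSE)
saveRDS(matrix(0, 1, 1), file.path(demo.dirs[1], "expr.RDS"))

status <- checkExprFiles(demo.dirs, c("A", "B"))
print(status)

# With the variables from the script above, one might run:
# status <- checkExprFiles(single.savePaths, sampleNames)
# if (!all(status$exists)) stop("Missing expr.RDS for: ",
#                               paste(status$sample[!status$exists], collapse = ", "))
```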

Thank you very much for your great help!

wiceshine commented 4 years ago

BTW, I also tested the KC-example data, and it ran well. However, when I switched to my own data, which was processed by Cell Ranger 2, the errors appeared. The returned messages are listed below for your information.

When running the KC-example dataset, everything went well:

[2020-05-12 12:20:20] START: RUN scStatistics
[2020-05-12 12:20:20] -----: data preparation
[2020-05-12 12:20:50] -----: cell calling
[2020-05-12 12:21:05] -----: nUMI & nGene distribution plot
[2020-05-12 12:21:06] -----: mito & ribo & diss distribution plot
[2020-05-12 12:21:08] -----: gene statistics
[2020-05-12 12:21:13] -----: gene proportion plot
[2020-05-12 12:21:27] -----: ambinet genes (SoupX)
[2020-05-12 12:21:27] -----: resutls saving
[2020-05-12 12:21:28] -----: report generating
[2020-05-12 12:21:56] END: Finish scStatistics

[2020-05-12 12:21:56] START: RUN scAnnotation
[2020-05-12 12:21:56] -----: data preparation
[2020-05-12 12:22:22] -----: Seurat object creation
[2020-05-12 12:22:25] -----: highly variable genes
[2020-05-12 12:22:27] -----: data scaling
[2020-05-12 12:22:58] -----: PCA
[2020-05-12 12:23:10] -----: clustering
[2020-05-12 12:23:14] -----: tSNE
[2020-05-12 12:23:42] -----: UMAP
[2020-05-12 12:23:59] -----: differential expression analysis
[2020-05-12 12:25:04] -----: Seurat plotting and saving
When using repel, set xnudge and ynudge to 0 for optimal results
[2020-05-12 12:25:28] -----: Doublet score estimation
[2020-05-12 12:26:27] -----: TME cell types annotation
[2020-05-12 12:30:21] -----: cells malignancy annotation
[2020-05-12 12:36:40] -----: cell cycle score estimation
[2020-05-12 12:36:41] -----: stemness score calculation
[2020-05-12 12:36:46] -----: gene set signatures analysis
[2020-05-12 12:37:18] -----: expression programs analysis
[2020-05-12 12:44:43] -----: cell interaction analysis
[2020-05-12 12:45:19] -----: report generating
[2020-05-12 12:46:29] END: Finish scAnnotation

Warning message:
Transformation introduced infinite values in continuous x-axis 

When I put my own dataset in the same folder and used the same script, the errors appeared:

[2020-05-12 12:52:19] START: RUN scStatistics
[2020-05-12 12:52:19] -----: data preparation
[2020-05-12 12:53:26] -----: cell calling
[2020-05-12 12:53:36] -----: nUMI & nGene distribution plot
[2020-05-12 12:53:36] -----: mito & ribo & diss distribution plot
[2020-05-12 12:53:38] -----: gene statistics
[2020-05-12 12:53:40] -----: gene proportion plot
[2020-05-12 12:53:52] -----: ambinet genes (SoupX)
[2020-05-12 12:53:52] -----: resutls saving
[2020-05-12 12:53:53] -----: report generating
[2020-05-12 12:54:13] END: Finish scStatistics

Warning messages:
1: In min(outliers) : no non-missing arguments to min; returning Inf
2: In min(outliers) : no non-missing arguments to min; returning Inf
3: In min(outliers) : no non-missing arguments to min; returning Inf
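These warnings are what base R emits when min() is called on an empty vector, so they suggest the outliers vector was empty at some step. Whether that matters for scCancer's results is not established in this thread. A minimal sketch; safeMin() is a hypothetical helper, not part of scCancer.

```r
outliers <- numeric(0)   # hypothetical: no outliers were found

# min() of an empty vector warns "no non-missing arguments to min; returning Inf";
# suppressWarnings() hides that warning here
res <- suppressWarnings(min(outliers))
print(res)  # Inf

# A guarded alternative returns NA instead of Inf for an empty vector
safeMin <- function(x) if (length(x) > 0) min(x) else NA_real_
print(safeMin(outliers))  # NA
```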

[2020-05-12 12:54:13] START: RUN scAnnotation
[2020-05-12 12:54:13] -----: data preparation
[2020-05-12 12:54:35] -----: Seurat object creation
[2020-05-12 12:54:36] -----: highly variable genes
[2020-05-12 12:54:37] -----: data scaling
[2020-05-12 12:54:56] -----: PCA
[2020-05-12 12:55:04] -----: clustering
[2020-05-12 12:55:05] -----: tSNE
[2020-05-12 12:55:27] -----: UMAP
[2020-05-12 12:55:42] -----: differential expression analysis
[2020-05-12 12:57:21] -----: Seurat plotting and saving
When using repel, set xnudge and ynudge to 0 for optimal results
[2020-05-12 12:57:44] -----: Doublet score estimation
[2020-05-12 12:58:34] -----: TME cell types annotation
[2020-05-12 12:58:38] -----: cells malignancy annotation
Error in cnvList$expr.data[, ref.cellNames] : subscript out of bounds
In addition: There were 50 or more warnings (use warnings() to see the first 50)

wguo-research commented 4 years ago

Thanks for using scCancer. From your error messages alone, I am not sure where the bug is. I have tested scCancer on several samples from Cell Ranger v2, and they ran well, so I think the problem may be due to some specific characteristics of your data. Could you please send me one of your samples so I can debug it step by step?

wellgoo commented 4 years ago

Judging from the running time of each step, I strongly suspect that the number of detected cells is very low. Could you please share the report generated by Cell Ranger?
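One quick way to check the cell count without the HTML report is to count the barcodes in Cell Ranger's filtered matrix, which holds one barcode per called cell. A minimal sketch; countCalledCells() is a hypothetical helper, and the folder layout in the final comment (filtered_gene_bc_matrices/<genome>/ for Cell Ranger 2, filtered_feature_bc_matrix/ with gzipped files for Cell Ranger 3) is assumed from the usual output structure and should be checked against the actual folder.

```r
# Hypothetical helper: count barcodes (one per called cell) in a barcodes file.
# gzfile() reads both plain-text and gzip-compressed files.
countCalledCells <- function(barcodesFile) {
  con <- gzfile(barcodesFile, "r")
  on.exit(close(con))
  length(readLines(con))
}

# Self-contained demo: a fake barcodes.tsv with three barcodes
demoFile <- file.path(tempdir(), "barcodes.tsv")
writeLines(c("AAACCTGAGAAACCAT-1",
             "AAACCTGAGAAACCGC-1",
             "AAACCTGAGAAACCTA-1"), demoFile)
print(countCalledCells(demoFile))  # 3

# For a real sample (layout assumed; replace <genome> with the reference name):
# countCalledCells(file.path(female.03.dataPath, "filtered_gene_bc_matrices",
#                            "<genome>", "barcodes.tsv"))
```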

wiceshine commented 4 years ago

Thanks a lot for your help! I sent the data to your Tsinghua email yesterday. However, I made a mistake: my data were processed by Cell Ranger 1.3.1, not Cell Ranger 2. I really appreciate your help!

wiceshine commented 4 years ago

Dear Prof. Gu,

Actually, more than 4,000 cells were detected. You can access the Cell Ranger report via Dropbox: https://www.dropbox.com/s/qcblesdoe3cchnf/report-cellRanger.html?dl=0 I also sent the data, along with the scCancer output files, to your colleague Dr. Wenbo Guo yesterday; you can ask Dr. Guo for full access. The short running time of each step may be due to my computer configuration (3.2 GHz 6-core i7, 64 GB RAM, macOS 10.15.4, R 3.6.1). In my experience, this setup can finish a LIGER run (without suggestK) or a Seurat run in a few minutes.

Thank you very much for your suggestion!

wellgoo commented 4 years ago

OK. I just noticed that the time cost is much shorter than for a "normal" sample :-) I highly recommend using Cell Ranger 3 to get better cell calling.

wiceshine commented 4 years ago

Wenbo helped me solve my issues. Thanks a lot! I am going to re-analyze my data with Cell Ranger 3 and re-run scCancer.

Thanks a lot!

evenDDDDD commented 2 years ago

Hello, I encountered the same problem. The error is: Error in cnvList$expr.data[, ref.cellNames]: subscript out of bounds. My matrix was converted with generate10Xdata(). Why does this happen? I look forward to your reply!