kkang7 / CDSeq_R_Package

CDSeq R Package

warning messages with CDSeq #8

Open nandakumaryellapu opened 3 years ago

nandakumaryellapu commented 3 years ago

Hi, I am working on the deconvolution of RNA-Seq data, starting from a gene counts file in the following format:

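| Gene | Sample1 | Sample2 | Sample3 | Sample4 | Sample5 |
|------|---------|---------|---------|---------|---------|
| g1   | 0.12    | 0.23    | 0.34    | 0.45    | 0.56    |
| g2   | 0.14    | 0.14    | 0.14    | 0.14    | 0.14    |
| g3   | 0.16    | 0.05    | -0.06   | -0.17   | -0.28   |
| g4   | 0.18    | 0.65    | 1.12    | 1.59    | 2.06    |
| g5   | 0.2     | 0.984   | 1.768   | 2.552   | 3.336   |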

After running CDSeq using this code, result <- CDSeq(table, cell_type_number = 3), I got the following warning messages:

1: In CDSeq(table, cell_type_number = 3) : gene_length is NOT provided. CDSeq will estiamte read rate not gene rate. Please provide gene length if you are interested in GEP estimation.

2: In CDSeq(table, cell_type_number = 3) : Reference gene expression profile is missing. Cell type identification and RPKM normalization will NOT be performed by CDSeq. Users can identify CDSeq-identified cell-types using marker genes or reference gene expression profiles.

3: In CDSeq(table, cell_type_number = 3) : bulk_data is NOT read count data. Please provide read counts data if possible for potentially better estimations.

My questions are:

  1. If we are not sure about 'cell_type_number', what value should we give here? Our intention is to predict the cell types and their proportions, so we do not know this number in advance. Here I blindly chose 3.
  2. Do we need to provide the gene lengths? Since I am providing gene counts, I assume gene lengths are not required. Am I correct?
  3. The second warning seems serious to me, since it is linked to the identification of cell types. I don't understand what the reference gene expression profile is. I only have the gene counts file.
  4. I am already providing read count data, but I still get the third warning.
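One way to see what the third warning may be reacting to is to test whether the table contains only non-negative whole numbers, which is what raw read counts look like. The following is a minimal base R sketch under stated assumptions: `bulk` is a hypothetical name for the table above re-entered as a numeric matrix, and `looks_like_counts` is an illustrative helper, not the check CDSeq performs internally.

```r
# Hypothetical reconstruction of the table from the post (genes x samples).
bulk <- matrix(
  c(0.12, 0.23,  0.34,  0.45,  0.56,
    0.14, 0.14,  0.14,  0.14,  0.14,
    0.16, 0.05, -0.06, -0.17, -0.28,
    0.18, 0.65,  1.12,  1.59,  2.06,
    0.20, 0.984, 1.768, 2.552, 3.336),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("g", 1:5), paste0("Sample", 1:5))
)

# Raw read counts are non-negative whole numbers; this helper tests exactly that.
# Illustrative only: it is an assumption about what "read count data" means,
# not the test CDSeq runs internally.
looks_like_counts <- function(m) {
  is.numeric(m) && !anyNA(m) && all(m >= 0) && all(m == round(m))
}

looks_like_counts(bulk)      # FALSE: the table has decimals and negative values

# Locate the entries that break the count assumption.
which(bulk < 0, arr.ind = TRUE)   # row/column positions of negative values
sum(bulk != round(bulk))          # how many entries are not whole numbers
```

If the check returns FALSE, the values have probably been normalized or transformed somewhere upstream, and the unprocessed count matrix would be the thing to look for.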

Can anyone explain this? Thank you.