kkang7 / CDSeq_R_Package

CDSeq R Package
Question about scRNA-seq data and the theory of CDSeq #3

Closed caiquanyou closed 3 years ago

caiquanyou commented 4 years ago

Hello, I read your paper and have some questions: 1. I have scRNA-seq data. How could I use it to match the algorithm's results? 2. I cannot understand the real difference between LDA and CDSeq. What does CDSeq's gene length mean?

kkang7 commented 4 years ago

Hello, great questions.

  1. I wrote a new function named cellTypeAssignSCRNA; you can use it to annotate CDSeq-estimated cell types. The function assumes your scRNA-seq data is already annotated. Make sure the genes you use in the bulk samples for deconvolution match the genes in your scRNA-seq data. You can take the intersection of the genes in the bulk and scRNA-seq data before you run the deconvolution.
  2. Gene length denotes the effective length of a gene: the length of the gene minus the length of the reads plus 1. It is basically the number of positions at which a read can be located on that gene. CDSeq builds upon LDA, and we tried to adjust the LDA model to fit the RNA-seq context in the following respects: first, we account for the fact that generating a read from a gene is affected by the gene length; second, in many cases the estimated cell proportions are actually RNA proportions, and we try to adjust those to cell proportions; third, we came up with a way to estimate the number of cell types in the bulk.
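
The two points above can be sketched in base R. This is only an illustration under stated assumptions, not code from the CDSeq package: the toy matrices `bulk` and `sc`, the gene lengths, the read length of 100, and the helper `effective_length` are all made up for the example.

```r
# Hypothetical toy data: genes in rows, samples or cells in columns.
bulk <- matrix(1:12, nrow = 4,
               dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                               c("S1", "S2", "S3")))
sc <- matrix(1:6, nrow = 3,
             dimnames = list(c("GeneB", "GeneD", "GeneE"),
                             c("cell1", "cell2")))

# Keep only the genes present in both data sets, in the same order.
shared_genes <- intersect(rownames(bulk), rownames(sc))
bulk_shared <- bulk[shared_genes, , drop = FALSE]
sc_shared <- sc[shared_genes, , drop = FALSE]

# Effective length = gene length - read length + 1,
# i.e. the number of start positions a read can take on the gene.
effective_length <- function(gene_length, read_length) {
  eff <- gene_length - read_length + 1
  if (any(eff < 1)) stop("a gene is shorter than the read length")
  eff
}

# Assumed gene lengths in bases (hypothetical values).
gene_len <- c(GeneB = 1500, GeneD = 900)
eff_len <- effective_length(gene_len[shared_genes], read_length = 100)
```

Subsetting both matrices with the same `shared_genes` vector keeps their rows aligned, which is what the deconvolution and the later annotation step rely on.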

caiquanyou commented 4 years ago

Thanks for answering my question; I am trying to understand it now. Could you please share the pipeline or the pseudocode? I'm not good at MATLAB.

kkang7 commented 4 years ago

You're welcome. I think you are in the R package repository instead of the MATLAB one. If you are familiar with R, you can install the package by running install_github("kkang7/CDSeq_R_Package"). And ?CDSeq will show minimal info and a quick example. Let me know if you have any other questions.

caiquanyou commented 4 years ago

Hello kkang, one more question about CDSeq: why did you choose LDA? Or why do you think the Dirichlet distribution is suitable for RNA-seq data?

kkang7 commented 4 years ago

Good question. We actually discussed briefly why we chose LDA. Essentially, its components form a Dirichlet-multinomial model, which is widely used for count data. RNA-seq data is a type of count data before normalization. It is very noisy if you think about the experimental sequencing procedure, but Dirichlet-multinomial is a good starting point in my opinion.
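
To make the Dirichlet-multinomial idea concrete, here is a minimal base R simulation of read counts for one sample. It is a sketch of the generic count model only, not of the full CDSeq or LDA model; the number of genes, the number of reads, the concentration value 0.5, and the helper `rdirichlet1` are assumptions chosen for illustration.

```r
set.seed(1)

# Draw one vector from a Dirichlet distribution by normalizing gamma draws.
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

n_genes <- 5     # hypothetical number of genes
n_reads <- 1000  # hypothetical sequencing depth

# Gene-level read probabilities for one hypothetical sample.
p <- rdirichlet1(rep(0.5, n_genes))

# Read counts across genes, given those probabilities.
counts <- as.vector(rmultinom(1, size = n_reads, prob = p))
```

The Dirichlet draw gives a probability vector over genes, and the multinomial draw turns it into integer counts that sum to the sequencing depth, which is the shape raw RNA-seq counts take before normalization.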

Best, Kai

caiquanyou commented 4 years ago

Hello, I got an error when installing CDSeq on another PC, like below:

    installing to /home/dean/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-CDSeq/00new/CDSeq/libs
    R
    data
    * moving datasets to lazyload DB
    byte-compile and prepare package for lazy loading
    Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
      namespace 'Rcpp' 1.0.1 is being loaded, but >= 1.0.3 is required
    Calls: ... withCallingHandlers -> loadNamespace -> namespaceImport -> loadNamespace
    Execution halted
    ERROR: lazy loading failed for package 'CDSeq'

kkang7 commented 4 years ago

It seems you need to install a later version of Rcpp on the machine. Could you try that and see if you still have the error?
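
A quick way to check this before reinstalling might look like the sketch below. The required version 1.0.3 is taken from the error message above; the helper `needs_update` is a hypothetical name introduced for the example.

```r
# Minimum Rcpp version reported in the error message.
required <- "1.0.3"

# TRUE when the installed version is older than the required one.
needs_update <- function(installed, required) {
  package_version(installed) < package_version(required)
}

if (requireNamespace("Rcpp", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("Rcpp"))
  if (needs_update(installed, required)) {
    message("Rcpp ", installed, " is too old; run install.packages(\"Rcpp\")")
  }
} else {
  message("Rcpp is not installed; run install.packages(\"Rcpp\")")
}
```

Comparing with `package_version` rather than as plain strings avoids mistakes such as treating "1.0.10" as older than "1.0.3".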

Best, Kai

On Sun, Sep 20, 2020 at 10:52 PM caiquanyou notifications@github.com wrote:

  • removing '/home/dean/R/x86_64-pc-linux-gnu-library/3.6/CDSeq'
Error: Failed to install 'CDSeq' from GitHub: (converted from warning) installation of package '/tmp/RtmpRikdda/file4e372218087/CDSeq_1.0.7.tar.gz' had non-zero exit status

How could I fix this problem?

Best regards, CQY


caiquanyou commented 4 years ago

Fixed it now! But I get this 👍

* DONE (CDSeq)

library(CDSeq)
Warning message:
In normalizePath(path.expand(path), winslash, mustWork) :
  path[1]="C:\Users\gerra\Anaconda3\envs\cvxpy/python.exe": system could not find the specified file.

Does it matter?

caiquanyou commented 4 years ago

And when I run the command, I get the error below:

Error in CDSeq(bulk_data = x, beta = 0.5, alpha = 5, cell_type_number = 6, : length(gene_length) should be equal to nrow(bulk_data)

How do I set gene_length?

kkang7 commented 4 years ago

The warning message seems to be saying that Python is not installed correctly; maybe try reinstalling Python. Gene lengths can be found from the genome reference, such as GENCODE, that was used for mapping the RNA-seq reads, but you can leave gene_length as NULL if you just need to test the installation.
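
The error message says that gene_length must have one entry per row of bulk_data. A minimal sketch of building such a vector in base R follows; the matrix `bulk_data`, the `annotation` table, and all lengths in it are hypothetical stand-ins for values you would derive from your own reference annotation.

```r
# Hypothetical bulk matrix: genes in rows, samples in columns.
bulk_data <- matrix(c(10, 0, 5, 3, 8, 1), nrow = 3,
                    dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                    c("S1", "S2")))

# Hypothetical annotation table, e.g. derived from a GENCODE GTF file.
annotation <- data.frame(gene = c("GeneC", "GeneA", "GeneX", "GeneB"),
                         length = c(2100, 1500, 800, 950),
                         stringsAsFactors = FALSE)

# Look up each row of bulk_data in the annotation, preserving row order.
idx <- match(rownames(bulk_data), annotation$gene)
if (anyNA(idx)) stop("some genes in bulk_data have no annotated length")
gene_length <- annotation$length[idx]

# This is the condition the error message checks.
stopifnot(length(gene_length) == nrow(bulk_data))
```

Using `match` on the row names keeps the lengths in the same gene order as the expression matrix, so each length lines up with the right row.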
