Limited by default 500. How can we provide/calculate "gene_length" and "reference_gep"?

levinhein commented 1 year ago

Hello. How can we provide/calculate "gene_length" and "reference_gep" for our bulk RNA-seq count data? The paper and the vignette don't make clear how they were derived. Thank you.

umasstr commented 1 year ago

I'm wondering if you ever got a response to this. I am also curious how one would go about annotating CDSeq cell types without providing a reference_gep.

kkang7 commented 1 year ago

Hi all, sorry for the late response. I haven't been actively maintaining the package recently. Gene length can be obtained from a database such as biomaRt: https://bioconductor.org/packages/release/bioc/html/biomaRt.html
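If you already have the GTF annotation used to quantify your counts, one alternative to biomaRt is to compute each gene's length as the total length of the union of its exons, which avoids double-counting exons shared between transcripts. This is a minimal sketch under two assumptions not confirmed in this thread: that CDSeq's gene_length is a per-gene length in bases, and that the exon-union definition is appropriate for it. The function name `exon_union_lengths` is hypothetical, and the code is in Python only to show the logic.

```python
import re
from collections import defaultdict


def exon_union_lengths(gtf_lines):
    """Return {gene_id: length in bp of the union of that gene's exons}.

    Hypothetical helper; assumes standard 9-column, tab-separated GTF lines.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features contribute to length
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match is None:
            continue
        start, end = int(fields[3]), int(fields[4])  # GTF: 1-based, inclusive
        exons[match.group(1)].append((start, end))

    lengths = {}
    for gene, intervals in exons.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:  # overlapping or adjacent: merge
                cur_end = max(cur_end, end)
            else:  # gap: close the current block and start a new one
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene] = total
    return lengths
```

Whichever source you use, the lengths then need to be put in the same order as the gene rows of your count matrix before being passed to CDSeq.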

Without reference_gep in the CDSeq function, one can use the celltypeassign function and provide an annotated single-cell reference. If there is no reference at all, one can check the "marker" genes of the deconvolved cell types in the output.
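For the no-reference case, a minimal sketch of "checking marker genes": for each deconvolved cell type, rank genes by how much higher their estimated expression is in that type than in the highest of the other types. It assumes the CDSeq-estimated GEPs are available as a gene-by-cell-type matrix (shown here as a list of rows). The scoring rule, the helper name `top_markers` and the gene names in the usage note are illustrative assumptions, not part of CDSeq.

```python
import math


def top_markers(gep, genes, n=5, pseudo=1e-9):
    """Rank genes by specificity for each estimated cell type.

    gep: list of rows, one per gene, each holding that gene's estimated
    expression in every cell type. genes: gene names in the same row order.
    Returns {cell_type_index: [top-n gene names, most specific first]}.
    Hypothetical helper; the score is log(own / max of the other types).
    """
    n_types = len(gep[0])
    if n_types < 2:
        raise ValueError("need at least two cell types to compare")
    markers = {}
    for k in range(n_types):
        scores = []
        for gene, row in zip(genes, gep):
            best_other = max(row[j] for j in range(n_types) if j != k)
            # pseudo avoids division by zero for genes absent from a type
            scores.append((math.log((row[k] + pseudo) / (best_other + pseudo)), gene))
        scores.sort(reverse=True)
        markers[k] = [gene for _, gene in scores[:n]]
    return markers
```

For example, with a toy matrix `[[0.5, 0.1, 0.1], [0.1, 0.6, 0.1], [0.1, 0.1, 0.7], [0.3, 0.2, 0.1]]` over the genes CD3E, MS4A1, LYZ and ACTB, `top_markers(gep, genes, n=1)` gives `{0: ["CD3E"], 1: ["MS4A1"], 2: ["LYZ"]}`, which you would then match against known markers (CD3E for T cells, MS4A1 for B cells, LYZ for monocytes).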

On Mon, Apr 24, 2023 at 4:18 PM GitDog @.***> wrote:

I'm wondering if you ever got a response to this. I am also curious how one would go about annotating CDSeq cell types without the reference_gep format provided.

— Reply to this email directly, view it on GitHub https://github.com/kkang7/CDSeq_R_Package/issues/19#issuecomment-1520774655, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEDIUANB5K5S6AF3BOQS7W3XC3N27ANCNFSM6AAAAAAWP57BJY . You are receiving this because you are subscribed to this thread.Message ID: @.***>

umasstr commented 1 year ago

Thanks for your quick reply! I've spent a good part of the day trying to figure out (based on the vignette) what exactly celltypeassign is expecting. In this case, and in the case of reference_gep, it seems very unlikely that someone would know what kind of public data to look for, what structure that data should have and how to read it into CDSeq.

sc_gep = sc_gep, # PBMC single cell data

sc_annotation = sc_annotation, # PBMC single cell data annotations
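The thread does not spell out the expected structure of reference_gep. One common way to build a reference gene expression profile from annotated single-cell counts is to pool the raw counts of all cells of each type and divide by the pooled total, giving a gene-by-cell-type matrix whose columns each sum to 1. This is a minimal sketch under that assumption; whether CDSeq expects exactly this scaling is not confirmed here, and `reference_gep_from_sc` is a hypothetical helper, written in Python only to show the arithmetic.

```python
def reference_gep_from_sc(counts, annotation):
    """Pool single-cell counts by cell type into a normalized reference GEP.

    counts: gene-by-cell raw counts as a list of rows (one row per gene).
    annotation: one cell-type label per cell (column), in column order.
    Returns (cell_types, gep), where gep is gene-by-cell-type and each
    column sums to 1. Hypothetical helper; the scaling is an assumption.
    """
    n_cells = len(counts[0]) if counts else 0
    if len(annotation) != n_cells:
        raise ValueError("annotation length must equal the number of cells (columns)")
    cell_types = list(dict.fromkeys(annotation))  # unique labels, first-seen order
    col_of = {label: i for i, label in enumerate(cell_types)}

    # Sum the raw counts of all cells sharing a label (pseudo-bulk per type).
    pooled = [[0.0] * len(cell_types) for _ in counts]
    for g, row in enumerate(counts):
        for c, value in enumerate(row):
            pooled[g][col_of[annotation[c]]] += value

    # Divide each cell-type column by its total so it sums to 1.
    totals = [sum(row[k] for row in pooled) for k in range(len(cell_types))]
    gep = [[row[k] / totals[k] if totals[k] else 0.0 for k in range(len(cell_types))]
           for row in pooled]
    return cell_types, gep
```

Pooling raw counts weights each cell by its sequencing depth; averaging per-cell normalized profiles instead would weight cells equally, and which is preferable depends on your data.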

kkang7 commented 1 year ago

Sorry for the late response, and sorry for not making the documentation clear enough. celltypeassign expects ordinary single-cell count data with annotations: no normalization, just raw single-cell counts with annotations. Inside the celltypeassign function, I generate pseudo-single-cell count data from the CDSeq-estimated GEPs and call Seurat to cluster the CDSeq-estimated cell types together with your input single-cell data, which is how the cell types are inferred. So sc_gep should be the count matrix (genes by cells; you can subset your single-cell data if it is too large, just make sure you keep a good number of cells for each cell type), and sc_annotation is a vector of length ncol(sc_gep) containing the annotation of each cell in sc_gep. Hope that clarifies things a bit.
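The key constraints on sc_gep and sc_annotation are that the counts stay raw and that the annotation vector lines up one-to-one with the matrix columns, including after any subsetting. A minimal sketch of subsetting a large reference while keeping at most a fixed number of cells per type and applying the same column selection to both objects follows. It is written in Python only to illustrate the bookkeeping (in R you would index the matrix columns and the annotation vector with the same index vector), and the helper names and the per-type cap are hypothetical.

```python
import random
from collections import defaultdict


def subsample_per_type(annotation, max_per_type, seed=0):
    """Return sorted column indices keeping at most max_per_type cells per label.

    Hypothetical helper; cell types with fewer cells are kept in full.
    """
    by_type = defaultdict(list)
    for idx, label in enumerate(annotation):
        by_type[label].append(idx)
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    keep = []
    for idxs in by_type.values():
        if len(idxs) > max_per_type:
            idxs = rng.sample(idxs, max_per_type)
        keep.extend(idxs)
    return sorted(keep)


def subset_columns(counts, annotation, keep):
    """Apply one column selection to the count matrix and the annotation.

    counts: gene-by-cell raw counts (list of rows); values are not modified.
    """
    sub_counts = [[row[i] for i in keep] for row in counts]
    sub_annotation = [annotation[i] for i in keep]
    return sub_counts, sub_annotation
```

Because both outputs come from the same `keep` indices, the length of the subset annotation always equals the number of columns in the subset matrix, which is the ncol(sc_gep) requirement described above.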

On Mon, Apr 24, 2023 at 5:01 PM GitDog @.***> wrote:

Thanks for your quick reply! I've spent a good part of the day trying to figure out (based on the vignette) what exactly celltypeassign is expecting. In this case, and in the case of reference_gep, it seems very unlikely that someone would know what kind of public data to look for, what structure that data should have and how to read it into CDSeq.

sc_gep = sc_gep, # PBMC single cell data

sc_annotation = sc_annotation,# PBMC single data annotations

— Reply to this email directly, view it on GitHub https://github.com/kkang7/CDSeq_R_Package/issues/19#issuecomment-1520819500, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEDIUAJKPOCYO4XDJFRCJADXC3SZLANCNFSM6AAAAAAWP57BJY . You are receiving this because you commented.Message ID: @.***>

kkang7 / CDSeq_R_Package, issue #19