xuranw / MuSiC

Multi-subject Single Cell Deconvolution
https://github.com/xuranw/MuSiC
GNU General Public License v3.0

How to build custom single cell dataset #2

Open cartal opened 6 years ago

cartal commented 6 years ago

Hi,

I find this method really cool and promising, but I am having trouble applying it to my data.

Can you provide a vignette (or a section of one) describing how to go from an expression matrix to the input files MuSiC needs? Or perhaps you could construct the single cell reference files from the Tabula Muris or MCA datasets?

xuranw commented 6 years ago

Hi,

Thanks for reaching out. The single cell data are stored as an ExpressionSet. Please see https://www.rdocumentation.org/packages/Biobase/versions/2.32.0/topics/ExpressionSet for details.

Suppose gene_exprs.matrix is your gene expression matrix (genes by cells), and pheno.matrix is a data frame of phenotype annotations whose row names match the column names of gene_exprs.matrix. Suppose pheno.matrix has 4 columns: sampleID, SubjectName, cellTypeID, and cellType.

```r
library(Biobase)
# varMetadata: one row per column of pheno.matrix, describing that column
metadata <- data.frame(labelDescription = c("Sample ID", "Subject Name", "Cell Type ID", "Cell Type Name"), row.names = c("sampleID", "SubjectName", "cellTypeID", "cellType"))
SC.eset <- ExpressionSet(assayData = data.matrix(gene_exprs.matrix), phenoData = new("AnnotatedDataFrame", data = pheno.matrix, varMetadata = metadata))
```

SC.eset is the single cell data in the form of an ExpressionSet.
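A self-contained toy version of the snippet above can help check input shapes before moving to real data. The matrix values, cell names, subjects and cell types below are made up for illustration; only the four phenotype column names follow the example in this comment. The explicit stopifnot() guards the requirement that the phenotype row names equal the matrix column names.

```r
library(Biobase)

# Toy inputs (hypothetical): 5 genes x 6 cells, 2 subjects, 2 cell types
set.seed(1)
cells <- paste0("cell", 1:6)
gene_exprs.matrix <- matrix(rpois(30, lambda = 5), nrow = 5,
                            dimnames = list(paste0("gene", 1:5), cells))
pheno.matrix <- data.frame(
  sampleID    = cells,
  SubjectName = rep(c("subj1", "subj2"), each = 3),
  cellTypeID  = rep(1:2, times = 3),
  cellType    = rep(c("typeA", "typeB"), times = 3),
  row.names   = cells
)

# Row names of the phenotype table should equal the matrix column names, in order;
# checking here gives a clearer message than a failed validity check later
stopifnot(identical(rownames(pheno.matrix), colnames(gene_exprs.matrix)))

# One row per phenotype column, describing that column
metadata <- data.frame(
  labelDescription = c("Sample ID", "Subject Name", "Cell Type ID", "Cell Type Name"),
  row.names = c("sampleID", "SubjectName", "cellTypeID", "cellType")
)
SC.eset <- ExpressionSet(
  assayData = data.matrix(gene_exprs.matrix),
  phenoData = new("AnnotatedDataFrame", data = pheno.matrix, varMetadata = metadata)
)
SC.eset
```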

cartal commented 6 years ago

Thank you very much!!

Jiumeizhu commented 5 years ago

Hi xuranw, your method is really cool and I am trying to apply it to my data, but I ran into problems when setting up the ExpressionSet. I followed your method, but for GSE107585 I could not find the phenodata. I downloaded your ExpressionSet and found detailed phenodata in it. How did you get that phenodata? Could you share that information? I tried different methods but failed.

Thank you very much!

Jiumeizhu commented 5 years ago

I could not find the phenodata.

xuranw commented 5 years ago

Hi Jiumeizhu,

Thanks for using MuSiC.

As you mentioned, you are using the data from GSE107585. The phenodata should include at least the subject name and cell type label for each cell. Have you checked the annotation file for the dataset? Are there any annotation instructions in their Science paper? If not, maybe you should email the authors who are responsible for this dataset.

Once you have the annotation for each cell, it is not hard to construct a cell-by-phenotype data frame to use as the phenoData of the ExpressionSet.

Hope this helps.

Best, Xuran
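When the per-cell annotation comes as a separate table (for example, a file listing barcode, subject and cell type for each cell), building the phenotype data frame is mostly a matter of reordering that table to follow the columns of the expression matrix. A base-R sketch, in which the barcodes, donor names, cell-type labels and the column names barcode/subject/celltype are all hypothetical:

```r
# Hypothetical per-cell annotation, e.g. read from a dataset's annotation file
annot <- data.frame(
  barcode  = c("AAAC", "AAAG", "AACT", "AAGT"),
  subject  = c("donor2", "donor1", "donor1", "donor2"),
  celltype = c("PT", "PT", "LOH", "DCT"),
  stringsAsFactors = FALSE
)
# Column names of the expression matrix (cell barcodes), in a different order
cell_ids <- c("AAAG", "AAAC", "AAGT", "AACT")

# Reorder the annotation so that row i describes column i of the matrix
idx <- match(cell_ids, annot$barcode)
stopifnot(!anyNA(idx))  # every cell in the matrix needs an annotation row
pheno.matrix <- data.frame(
  sampleID    = cell_ids,
  SubjectName = annot$subject[idx],
  cellType    = annot$celltype[idx],
  row.names   = cell_ids,
  stringsAsFactors = FALSE
)
pheno.matrix
```

If this table is passed together with a varMetadata data frame, that data frame needs one row per column here (sampleID, SubjectName, cellType), so the four-row metadata from the July 2018 example would need its cellTypeID row dropped.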

Jiumeizhu commented 5 years ago

Hi Xuran, thanks for your quick reply. I downloaded the whole GSE107585 dataset, but I couldn't find the phenodata in the annotation file either. Since GSE107585 is the kidney scRNA-seq dataset used in your paper, did you get the subject name and cell type of each cell from the authors?

Thank you very much! Best wishes, Honglin


ectopicaPKC commented 5 years ago

Do you have a list of steps to follow, from start to finish, when starting from your own single cell and bulk datasets? I'm trying to figure out how to scale my single cell data to my bulk data.

Thanks in advance.
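One prerequisite when combining your own single cell and bulk data is that both use the same gene identifiers (for example, both gene symbols rather than symbols in one and Ensembl IDs in the other). The base-R sketch below, with made-up matrices, only checks and aligns the gene overlap. It does not rescale any values and does not answer the scaling question, which depends on what MuSiC does internally.

```r
# Hypothetical inputs: bulk is genes x bulk samples, sc is genes x cells
bulk.mtx <- matrix(c(10, 0, 5, 7, 3, 12), nrow = 3,
                   dimnames = list(c("Ins", "Gcg", "Sst"), c("bulk1", "bulk2")))
sc.mtx <- matrix(1:8, nrow = 4,
                 dimnames = list(c("Gcg", "Ins", "Ppy", "Sst"), c("cell1", "cell2")))

# Genes measured in both; intersect() keeps the order of its first argument
common <- intersect(rownames(bulk.mtx), rownames(sc.mtx))
message(length(common), " of ", nrow(bulk.mtx),
        " bulk genes are present in the single cell data")

# Subset both matrices to the shared genes, in the same row order
bulk.sub <- bulk.mtx[common, , drop = FALSE]
sc.sub   <- sc.mtx[common, , drop = FALSE]
```

A very small overlap usually points to mismatched identifier types or species rather than to a real biological difference.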

isabellagrassucci commented 3 years ago

Hello Xuran! I tried to build my own single-cell ExpressionSet from the Zeisel dataset (GSE60361), downloaded through the scRNAseq R package. I built it and ran MuSiC using the cell type as clusters and the cell IDs as samples.

My question is: is it right to use the cell IDs as samples, or should I use the subject ID, which is something different? In the tutorial it seems that you use the sample IDs, which in single cell data would be the cell IDs. In that case, how does MuSiC find the subject information it needs to calculate cross-subject consistency?

Also, in my case I do not have subject information (only sex, age, etc.), nothing like a subject ID or subject name; that's why I asked whether it's okay to use the cell IDs.
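The example earlier in this thread keeps a per-cell sampleID next to a separate SubjectName column, and the phenodata is described as needing at least the subject name. A quick tabulation shows whether a candidate column actually groups cells by subject or just relabels each cell. The cell IDs, mouse names and cell types below are hypothetical.

```r
# Hypothetical phenotype table for 6 cells
pheno <- data.frame(
  cellID      = paste0("cell", 1:6),
  SubjectName = rep(c("mouse1", "mouse2"), each = 3),
  cellType    = c("astrocytes", "neurons", "neurons",
                  "astrocytes", "neurons", "neurons")
)

# Grouping by cell ID: every group holds exactly one cell
table(table(pheno$cellID))

# Grouping by subject: each subject contributes cells of several types
with(pheno, table(SubjectName, cellType))
```

If every group has size 1, the column carries no subject-level grouping, so anything computed across "subjects" would be computed across single cells, which is unlikely to be what a multi-subject method expects.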

bio-visualisation commented 3 years ago

Thank you very much! Hi, could you explain how you prepared the input files for the MuSiC analysis? It is confusing to me. The file explained in the tutorial does not work, and I cannot tell which input files are required. @xuranw @cartal

ehoreth commented 2 years ago

@xuranw

Hello, I am a relatively new PhD student trying to learn how to use MuSiC. The tutorial is very clear with regard to using the previously processed data. However, I do not understand how to generate the ExpressionSet objects from my own bulk and single cell datasets. Could you please provide a step-by-step explanation of how you prepared your ExpressionSet objects so that I (and others who have requested a similar tutorial) can follow along? The response from July 2018 is too vague for me to understand. I appreciate the time and effort you have put into this project, and I hope you have time to help me. Thank you.
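A stepwise sketch of the same recipe as the July 2018 snippet, wrapped in a small helper that works for either bulk data (columns are samples) or single cell data (columns are cells). make_eset is hypothetical and not part of MuSiC or Biobase; it reorders the phenotype rows to follow the matrix columns, then calls Biobase's ExpressionSet() with AnnotatedDataFrame(). The gene names, donors and conditions are toy values.

```r
library(Biobase)

# Hypothetical helper: genes x samples (or cells) matrix + one phenotype row per column
make_eset <- function(counts, pheno) {
  counts <- as.matrix(counts)
  if (is.null(rownames(counts)) || is.null(colnames(counts)))
    stop("counts needs gene row names and sample/cell column names")
  idx <- match(colnames(counts), rownames(pheno))
  if (anyNA(idx)) stop("some columns of counts have no row in pheno")
  pheno <- pheno[idx, , drop = FALSE]  # reorder rows to follow the matrix columns
  ExpressionSet(assayData = counts, phenoData = AnnotatedDataFrame(pheno))
}

# Step 1: a bulk matrix, genes x samples (toy numbers)
bulk.counts <- matrix(c(100, 20, 5, 80, 30, 9), nrow = 3,
                      dimnames = list(c("Ins", "Gcg", "Sst"), c("donorA", "donorB")))
# Step 2: per-sample annotation, deliberately in a different row order
bulk.pheno <- data.frame(condition = c("T2D", "control"),
                         row.names = c("donorB", "donorA"))
# Step 3: combine them
bulk.eset <- make_eset(bulk.counts, bulk.pheno)
pData(bulk.eset)
```

The same call with a genes x cells matrix and a per-cell table (subject and cell type columns) gives the single cell ExpressionSet.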

chloefeng1 commented 1 year ago

Hello xuranw,

I really like your method and want to apply it to my dataset. I generated ExpressionSet(exprData) as bulk.mtx and ExpressionSet(scData) as SC.eset, but running music_prop() fails with: Error in rowMeans(bulk.mtx) : 'x' must be an array and must have at least two dimensions. Could you help me fix this problem? I hope you have time to help me, thank you.
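The error message suggests rowMeans() received something that is not a matrix, which is what happens if an ExpressionSet is passed where a plain matrix is expected. Assuming a MuSiC version whose music_prop() takes a bulk matrix (as the argument name bulk.mtx suggests), a sketch of extracting that matrix with Biobase's exprs(), using toy numbers:

```r
library(Biobase)

# Toy bulk ExpressionSet (hypothetical numbers): 3 genes x 2 samples
counts <- matrix(c(100, 20, 5, 80, 30, 9), nrow = 3,
                 dimnames = list(c("Ins", "Gcg", "Sst"), c("donorA", "donorB")))
bulk.eset <- ExpressionSet(assayData = counts)

# exprs() returns the plain genes x samples matrix stored in the ExpressionSet;
# rowMeans() works on this matrix
bulk.mtx <- exprs(bulk.eset)
rowMeans(bulk.mtx)

# Assumed call shape; check args(music_prop) for the installed version:
# est <- music_prop(bulk.mtx = bulk.mtx, sc.sce = sc.sce,
#                   clusters = "cellType", samples = "SubjectName")
```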

shuaizh117 commented 1 year ago

Hello, has anyone figured this out? I generated the ExpressionSet for my single-cell data based on the advice above; however, running music_prop gave an error. Do I have to convert the ExpressionSet to a SingleCellExperiment? In the example, EMTAB.sce is a SingleCellExperiment object. Please advise. Thank you!
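That the example uses EMTAB.sce suggests the installed MuSiC version expects the single cell reference as a SingleCellExperiment rather than an ExpressionSet. Assuming that, an existing ExpressionSet can be converted with the SingleCellExperiment constructor. eset_to_sce is a hypothetical helper, the toy data are made up, and naming the assay "counts" is an assumption about how MuSiC reads the counts; the commented call at the end is an assumed call shape, so check args(music_prop) for the installed version.

```r
library(Biobase)
library(SingleCellExperiment)

# Hypothetical helper: copy counts and per-cell annotation from an ExpressionSet
# into a SingleCellExperiment (assay named "counts" by assumption)
eset_to_sce <- function(eset) {
  SingleCellExperiment(
    assays  = list(counts = exprs(eset)),
    colData = S4Vectors::DataFrame(pData(eset))
  )
}

# Toy ExpressionSet: 4 genes x 4 cells from 2 subjects
m <- matrix(1:16, nrow = 4,
            dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4)))
ph <- data.frame(SubjectName = c("s1", "s1", "s2", "s2"),
                 cellType    = c("A", "B", "A", "B"),
                 row.names   = colnames(m))
SC.eset <- ExpressionSet(assayData = m, phenoData = AnnotatedDataFrame(ph))

sc.sce <- eset_to_sce(SC.eset)
sc.sce

# Assumed call shape:
# est <- music_prop(bulk.mtx = bulk.mtx, sc.sce = sc.sce,
#                   clusters = "cellType", samples = "SubjectName")
```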