MiguelCos closed this issue 3 years ago
Hi Miguel,
bMIND2 does not need a 3D array. The example data happened to come as a 3D array, which I converted to a matrix for convenience. bMIND2 only needs a matrix of bulk data (gene x sample) and a matrix of fractions (sample x cell type). Please let me know if you have further questions about the input format.
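As a rough illustration of the two shapes described above, here is a minimal base-R sketch. All object names, dimensions, and values are made up for illustration; nothing here calls the MIND package. It only shows how one slice of a 3D array becomes a plain gene x sample matrix, and how a sample x cell type fraction matrix lines up with it.

```r
# Toy labels; every name below is hypothetical and only for illustration.
genes <- paste0("gene", 1:4)
samples <- paste0("sample", 1:3)
cell_types <- c("tumor_cell", "fibroblast")

# A 3D array (gene x sample x region), similar in spirit to the arrays in the question.
bulk_3d <- array(
  seq_len(4 * 3 * 2),
  dim = c(4, 3, 2),
  dimnames = list(genes, samples, c("human", "mouse"))
)

# Selecting one region drops the third dimension and leaves a gene x sample matrix.
bulk_human <- bulk_3d[, , "human"]

# Fractions: one row per sample, one column per cell type (made-up values).
frac_human <- matrix(
  c(0.7, 0.3,
    0.6, 0.4,
    0.8, 0.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(samples, cell_types)
)

# Orientation checks: bulk columns and fraction rows should be the same samples.
stopifnot(
  is.matrix(bulk_human),
  is.matrix(frac_human),
  identical(colnames(bulk_human), rownames(frac_human))
)

dim(bulk_human)  # 4 3
dim(frac_human)  # 3 2
```

Checking that the sample names agree between the two matrices before running the model is a cheap way to catch a transposed input.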
Are you focusing on protein expression data, even for single-cell proteomics? It seems that there is RNA gene expression on the protein atlas website as well.
Hello randel,
Many thanks for your answer.
I will try setting up a simple matrix for each tissue region separately, then.
> Are you focusing on protein expression data, even for single-cell proteomics? It seems that there is RNA gene expression on the protein atlas website as well.
I am not sure I understand your question. We only have two matrices of normalized protein expression data, arising from mass spectrometry of the whole tissues (I would call this bulk data).
We don't have any single-cell proteomics data of our own, but we are using the compiled single-cell expression data from the Human Protein Atlas (which I think is RNA-based) to generate our signature matrix and then using `est_frac` to estimate the fraction of cells per sample based on our protein expression data.
I understand the correlation between RNA levels and protein levels is about 0.6, but I couldn't find any summarized resource of single-cell proteomics data.
Would you have any argument against this kind of approach for use with the MIND package?
Yes, you can analyze each tissue region separately. If you'd like to do the estimation with multiple tissue regions together, you can set the `sample_id` option with subject IDs, e.g., 1, 1, 2, denoting two samples from subject 1 and one sample from subject 2. bMIND will then do the estimation and testing at the subject level. But this does not seem to apply to your data, since you have two matrices from mouse and human?
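To make the `1, 1, 2` example concrete, here is a small base-R sketch of such a subject ID vector. The sample names are hypothetical, and the assumption that the vector has one entry per bulk sample, in the same order as the columns of the bulk matrix, is mine; please check it against the package documentation.

```r
# Hypothetical layout: three bulk samples, two from subject 1 and one from subject 2.
samples <- c("subj1_regionA", "subj1_regionB", "subj2_regionA")

# Subject IDs, assumed to follow the column order of the bulk matrix.
sample_id <- c(1, 1, 2)

# Count samples per subject as a quick check of the grouping.
samples_per_subject <- table(sample_id)
print(samples_per_subject)

# The ID vector should have exactly one entry per sample.
stopifnot(length(sample_id) == length(samples))
```

A vector like this is what would be supplied as the `sample_id` option mentioned above.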
For your second question, do you mean that you can use scRNA-seq data as a reference/signature matrix to deconvolve protein expression data? I have not seen people doing it this way, but it may work if RNA-seq data is proportional to protein expression. Please let me know if the estimated cell-type fractions make sense to you.
Hello,
I find this package awesome. I would like to ask your opinion on its potential application to our specific problem, and also report some difficulties I am having setting up the input data.
I am testing this approach to identify protein expression signatures from cancer xenografts using mass-spec proteomics. We want to identify signature proteins from tumors or stroma and potentially identify expression patterns within tumors or stroma.
We have bulk mass-spec data from the whole tumor+stromal region, but we can distinguish both by identifying proteins that are either specific to human (tumor) or mouse (stroma). Therefore we have two expression matrices that can each be associated with a specific tissue region.
We don't have single-cell data, but I managed to generate a signature matrix by mining the Human Protein Atlas for single-cell-specific expression patterns of cells that I consider likely to be found in stroma or tumor tissue.
With this, I can generate cell fraction matrices using `est_frac` for both human (tumor) and mouse (stroma).
Then I am generating two 3D arrays: a `bulk` input and a `frac` input.
Bulk input:
Frac input:
I am getting an error when executing `bMIND2`, which I understand is related to the way I set up my arrays:
I am comparing my arrays with the example arrays, and it seems evident that my arrays need some tuning, but I am now having problems setting them up.
Two questions:
Many thanks for taking the time to read!
Best wishes, Miguel