zhengrongbin / MEBOCOST

A python-based package and software to predict metabolite mediated cell-cell communications by single-cell RNA-seq data
BSD 3-Clause "New" or "Revised" License

Interpretation of Single-Cell Metabolite Estimation Results Using MEBOCOST #18

Closed looppy95 closed 5 months ago

looppy95 commented 7 months ago

Hello! I am currently using MEBOCOST to estimate metabolite abundance in single cells. I have successfully run the following code snippet:

    # only estimate metabolite abundance for cells using expression data
    # two steps include loading config and run estimator
    mebo_obj._load_config_()
    mebo_obj.estimator()

    # check the metabolite estimation result
    met_mat = pd.DataFrame(mebo_obj.met_mat.toarray(),
                           index=mebo_obj.met_mat_indexer,
                           columns=mebo_obj.met_mat_columns)
    met_mat.head()

I obtained a DataFrame in which rows represent metabolites and columns represent individual cells. I then summed each row and stored the totals in a new column; the totals include both positive and negative values. I am now seeking guidance on how to interpret the presence of metabolites based on these results.

Could you provide insights or recommendations on how to assess the significance or presence of metabolites in the context of the calculated sums, considering the positive and negative values obtained? Your guidance on this matter would be greatly appreciated. Thank you!

zhengrongbin commented 6 months ago

Hi, the data frame contains aggregated enzyme gene expression values for each metabolite, computed directly from the scRNA-seq gene expression data. As you mentioned, it is a metabolite-by-cell matrix. Why do you want to take the sum of each row (metabolite)?

looppy95 commented 6 months ago

Thank you for your response. I am interested in using MEBOCOST to assess the presence of metabolites in tissue. In your article, you describe that "the relative presence of a given metabolite in a cell can be estimated by the average expression of enzymes in reactions that take the metabolite as a product after subtracting the average expression of enzymes in reactions that take the metabolite as a substrate." May I assume that when the values in the dataframe are positive, there is a higher likelihood of the presence of metabolites? Additionally, if it is not suitable to assess the presence of metabolites by summation, can I conduct a differential analysis on the numerical values of the dataframe for each cell under different conditions (e.g., disease states, where I ran MEBOCOST for each cell with conditions specified as adata.obs['label'] = adata.obs['condition']) to identify metabolites that may exhibit differences in "metabolite presence"? Thank you for your guidance.
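The definition quoted above can be illustrated with a minimal sketch on toy data. This is not MEBOCOST's implementation: the gene names, enzyme sets, expression values, and the helper `relative_presence` are all hypothetical and only show why the score can be positive or negative.

```python
import pandas as pd

# Toy expression matrix (genes x cells); all names and values are made up.
expr = pd.DataFrame(
    {"cell_1": [4.0, 2.0, 1.0, 1.0],
     "cell_2": [0.0, 1.0, 3.0, 2.0]},
    index=["ENZ_P1", "ENZ_P2", "ENZ_S1", "ENZ_S2"],
)

# Hypothetical enzyme sets for a single metabolite.
product_enzymes = ["ENZ_P1", "ENZ_P2"]    # reactions that produce the metabolite
substrate_enzymes = ["ENZ_S1", "ENZ_S2"]  # reactions that consume the metabolite


def relative_presence(expr, producers, consumers):
    """Mean producer-enzyme expression minus mean consumer-enzyme expression, per cell."""
    return expr.loc[producers].mean(axis=0) - expr.loc[consumers].mean(axis=0)


score = relative_presence(expr, product_enzymes, substrate_enzymes)
print(score)
```

In this toy case `cell_1` scores 2.0 (producing enzymes dominate) and `cell_2` scores -2.0 (consuming enzymes dominate), so the sign reflects the balance between the two enzyme sets rather than a measured concentration.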

zhengrongbin commented 6 months ago

Hi, "May I assume that when the values in the dataframe are positive, there is a higher likelihood of the presence of metabolites?" Yes, you can interpret it that way.

If I understand correctly, you have tissue-level single-cell RNA-seq data and would like to calculate the overall metabolite presence in that tissue. I haven't tried such an analysis, so I cannot say for certain whether summing over all cells will work. You should explore whether the sum, median, or mean works, judged against a criterion that makes sense in your biological context.

For differential analysis, yes, I think you can do that.
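One possible way to run such a comparison is a per-metabolite permutation test on the difference of mean scores between two conditions. The sketch below uses simulated data and a hypothetical helper, `permutation_test`; the choice of test is an assumption for illustration, not something MEBOCOST prescribes.

```python
import numpy as np
import pandas as pd

# Simulated metabolite-by-cell scores: 20 "control" and 20 "disease" cells.
rng = np.random.default_rng(0)
values = rng.normal(0.0, 1.0, size=(2, 40))
values[0, 20:] += 3.0  # built-in shift: met_A is higher in "disease" cells

cells = [f"cell_{i}" for i in range(40)]
met_mat = pd.DataFrame(values, index=["met_A", "met_B"], columns=cells)
condition = pd.Series(["control"] * 20 + ["disease"] * 20, index=cells)


def permutation_test(scores, is_case, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    perm_rng = np.random.default_rng(seed)
    observed = scores[is_case].mean() - scores[~is_case].mean()
    hits = 0
    for _ in range(n_perm):
        shuffled = perm_rng.permutation(is_case)  # shuffle the condition labels
        diff = scores[shuffled].mean() - scores[~shuffled].mean()
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


is_disease = (condition == "disease").to_numpy()
rows = {}
for met in met_mat.index:
    diff, p_value = permutation_test(met_mat.loc[met].to_numpy(), is_disease)
    rows[met] = {"mean_diff": diff, "p_value": p_value}

results = pd.DataFrame.from_dict(rows, orient="index")
print(results)
```

With real data, `condition` would come from the cell annotation (for example `adata.obs['condition']`), and testing many metabolites at once would call for a multiple-testing correction. Cells from the same sample are not independent, so cell-level p-values may be optimistic.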

looppy95 commented 6 months ago

Thank you for your response. Based on your advice, I will use MEBOCOST to explore the values in the data frames for each cell under different conditions, aiming to identify metabolites that may exhibit differences in terms of "metabolite presence." I would greatly appreciate any additional suggestions or considerations you may have.