Questions about data-processing of metabolomic data

renzhezhu677 commented 6 years ago

Hi, I have two questions about data processing of metabolomic data:

1. Can we compare the intensities of different features within the same sample? Are the intensities of different features comparable? If not, can this be solved by Z-score or another intensity normalization method?
2. Can the features from positive and negative ion modes be combined during data processing (from missing value imputation and normalization through to downstream analysis)? If not, is there any method to bring the features from the two ion modes to the same level?

Thanks a lot!

wenbostar commented 6 years ago

Hi @renzhezhu677 ,

1. Can we compare the intensities of different features within the same sample? Are the intensities of different features comparable? If not, can this be solved by Z-score or another intensity normalization method?

In general, the intensities of different features from the same sample are not comparable. I don't think Z-score transformation can solve this problem. I don't know whether there is any normalization method that can do this.
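To illustrate why a per-feature Z-score does not help here, the following is a minimal sketch in base R with made-up intensities (the matrix and the feature names are hypothetical and are not taken from metaX). The Z-score is computed for each feature across samples, so it discards each feature's absolute level and only shows how a sample compares with the other samples for that same feature.

```r
# Toy intensity matrix: rows = features, columns = samples (values are made up).
x <- matrix(c(1e6, 2e6, 3e6,
              10,  20,  30),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("featA", "featB"), c("s1", "s2", "s3")))

# Z-score each feature across samples: (value - feature mean) / feature SD.
z <- t(scale(t(x)))

# Both features end up with the same Z-scores in every sample, although their
# raw intensities differ by five orders of magnitude.
print(z)
```

After the transformation the two rows are indistinguishable, so the Z-scores carry no information about which feature has the higher intensity within a given sample.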

2. Can the features from positive and negative ion modes be combined during data processing (from missing value imputation and normalization through to downstream analysis)? If not, is there any method to bring the features from the two ion modes to the same level?

I recommend performing missing value imputation and normalization on the positive and negative feature sets separately, because the two sets are generated separately and may have very different missing value and intensity distributions. You can then combine the results from the two feature sets for pathway analysis and biomarker discovery.
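As a rough sketch of this workflow in base R (not metaX code): the helpers `impute_half_min` and `normalize_median` below are hypothetical stand-ins for whichever imputation and normalization methods you actually use, and the intensities are made up. The point is only that each ion mode is processed on its own before the feature tables are stacked.

```r
# Hypothetical helper: replace NAs in each feature (row) with half of that
# feature's minimum observed value. Only a stand-in imputation method.
impute_half_min <- function(m) {
  t(apply(m, 1, function(v) {
    v[is.na(v)] <- min(v, na.rm = TRUE) / 2
    v
  }))
}

# Hypothetical helper: divide each sample (column) by its median intensity.
# A simple stand-in for a real normalization method such as PQN.
normalize_median <- function(m) {
  sweep(m, 2, apply(m, 2, median), "/")
}

# Made-up intensities: rows = features, columns = the same samples in both modes.
pos <- matrix(c(100, NA, 300, 400, 500, 600), nrow = 2, byrow = TRUE,
              dimnames = list(c("pos_1", "pos_2"), c("s1", "s2", "s3")))
neg <- matrix(c(10, 20, NA, 40, 50, 60), nrow = 2, byrow = TRUE,
              dimnames = list(c("neg_1", "neg_2"), c("s1", "s2", "s3")))

# Process each ion mode on its own, then stack the features for downstream use.
pos_done <- normalize_median(impute_half_min(pos))
neg_done <- normalize_median(impute_half_min(neg))
combined <- rbind(pos_done, neg_done)
print(combined)
```

This assumes both matrices list the same samples in the same column order; if they do not, reorder the columns before calling `rbind`.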

Bo

renzhezhu677 commented 6 years ago

Thanks for your quick reply! @wenbostar I have two more questions for you:

Does metaX automatically perform scaling and transformation when drawing the PCA plot and heatmap in the QA report?
If so, are the scaling and transformation methods the ones specified by the arguments 't' and 'scale' of 'metaXpipe()'? I used metaXpipe() for data processing and some statistical analysis. Here is my call to metaXpipe():
p <- metaXpipe(para, plsdaPara = plsdaPara, missValueRatioQC = 0.5, missValueRatioSample = 0.8, cvFilter = 0.3, remveOutlier = TRUE, nor.order = 1, doQA = TRUE, doROC = TRUE, qcsc = 1, pclean = FALSE, t = 1, scale = "pareto", nor.method = "pqn", outTol = 1.2)

wenbostar commented 6 years ago

@renzhezhu677 , you're welcome.

Does metaX automatically perform scaling and transformation when drawing the PCA plot and heatmap in the QA report?

In metaX, plotPCA is used for PCA analysis and plotHeatMap is used for heatmap analysis. You can use ?plotPCA and ?plotHeatMap to find the usage of the two functions. For PCA analysis, parameter scale is used for specifying the scaling method. If you want to do log transformation, you can use function preProcess in metaX. For heatmap analysis, the parameter log is used to control whether or not to do log2 transformation. You can also use function preProcess to do that.

If so, are the scaling and transformation methods the ones specified by the arguments 't' and 'scale' of 'metaXpipe()'? I used metaXpipe() for data processing and some statistical analysis. Here is my call to metaXpipe():
p <- metaXpipe(para, plsdaPara = plsdaPara, missValueRatioQC = 0.5, missValueRatioSample = 0.8, cvFilter = 0.3, remveOutlier = TRUE, nor.order = 1, doQA = TRUE, doROC = TRUE, qcsc = 1, pclean = FALSE, t = 1, scale = "pareto", nor.method = "pqn", outTol = 1.2)

For PCA analysis, both parameters are used. For heatmap analysis, however, metaXpipe calls plotHeatMap as follows:
fig <- plotHeatMap(pp, valueID = "valueNorm", log = TRUE, rmQC = FALSE,
                   scale = "row",
                   clustering_distance_rows = "euclidean",
                   clustering_distance_cols = "euclidean",
                   clustering_method = "ward.D2",
                   show_colnames = FALSE)
By default, log2 transformation and row-wise scaling (scale="row", which is a parameter of the function pheatmap) are performed. The parameters "t" and "scale" are not used for heatmap analysis in metaXpipe. If you don't like the default settings, you can first process the data with preProcess and then call plotHeatMap on the processed data.

renzhezhu677 commented 6 years ago

Thank you for your patience! @wenbostar Here's what I understood from your reply:

It seems that a 'log' option in any parameter (of any function) refers to log2 rather than log10 or any other base.
If I perform heatmap analysis with plotHeatMap, the values of the parameter scale are limited to row, column and none, as inherited from pheatmap. What remains unclear to me is which scaling method (pareto, uv, vector or another method) row actually refers to.

And here's another question for you: I assume the confidence ellipses in the PCA plots are drawn by the R package ellipse. If so, which default values does metaX use for its arguments?

wenbostar commented 6 years ago

It seems that a 'log' option in any parameter (of any function) refers to log2 rather than log10 or any other base.

Yes. For heatmap analysis, I don't think you will see a difference if you use log10 or another base.
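The reason the base does not matter can be checked with a short base R sketch, assuming row-wise scaling is applied after the log transformation (as with scale="row"). The data and the helper `row_scale` are made up for illustration and are not part of metaX.

```r
set.seed(1)
# Made-up positive intensities: 5 features (rows) x 6 samples (columns).
x <- matrix(rlnorm(30, meanlog = 10, sdlog = 1), nrow = 5)

# Row-wise scaling: centre each row and divide by its standard deviation.
row_scale <- function(m) t(scale(t(m)))

scaled_log2  <- row_scale(log2(x))
scaled_log10 <- row_scale(log10(x))

# log10(x) equals log2(x) times a positive constant, and row scaling removes
# that constant, so the two matrices agree up to floating-point error.
same <- all.equal(scaled_log2, scaled_log10, check.attributes = FALSE)
print(same)
```

Without row scaling the values would differ by that constant factor, which changes the colour legend of a heatmap but not the relative pattern.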

If I perform heatmap analysis with plotHeatMap, the values of the parameter scale are limited to row, column and none, as inherited from pheatmap. What remains unclear to me is which scaling method (pareto, uv, vector or another method) row actually refers to.

It's "auto".
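For comparison, here is a small base R sketch of the textbook definitions of autoscaling and Pareto scaling applied to one made-up feature. It is an illustration under those standard definitions, not code from metaX or pheatmap.

```r
# One feature measured in five samples (made-up values).
v <- c(2, 4, 6, 8, 10)

auto_scaled   <- (v - mean(v)) / sd(v)        # autoscaling (unit variance)
pareto_scaled <- (v - mean(v)) / sqrt(sd(v))  # Pareto scaling

# Centre and divide by the SD with base R's scale(), which is what row-wise
# scaling amounts to for a single row.
row_scaled <- as.numeric(scale(v))

print(all.equal(auto_scaled, row_scaled))  # autoscaling matches row scaling
print(sd(auto_scaled))                     # unit standard deviation
print(sd(pareto_scaled))                   # square root of the original SD
```

Pareto scaling shrinks large-variance features less aggressively than autoscaling, so a heatmap of Pareto-scaled data would need the scaling done beforehand and scale set to none.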

I assume the confidence ellipses in the PCA plots are drawn by the R package ellipse. If so, which default values does metaX use for its arguments?

It's not the R package ellipse. You can take a look at the code of plotPCA for the details.

renzhezhu677 commented 6 years ago

Thank you again! @wenbostar On combining positive and negative ion modes, I agree that features from the two modes cannot be combined during data processing, and that the results can be combined for pathway analysis and biomarker discovery. However, after scaling the positive and negative feature sets separately, can we perform hierarchical clustering on the combined set of the two ion modes to get a global profile of all features? The main reason I insist on combining the two feature sets is that it seems more efficient and avoids the trouble of integrating the descriptions of two separate sets of downstream results.

wenbostar commented 6 years ago

As I said before, you can try to combine the two datasets for PCA and heatmap analysis, after you do missing value imputation and normalization for the two datasets separately.
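A minimal base R sketch of that idea, assuming both matrices have already been imputed and normalized separately and share the same samples in the same column order. The data are made up, and the distance and linkage settings simply mirror the euclidean and ward.D2 values in the plotHeatMap call quoted earlier in this thread.

```r
set.seed(42)
# Made-up, already imputed and normalized matrices (rows = features).
pos <- matrix(rlnorm(24, 12, 1), nrow = 4,
              dimnames = list(paste0("pos_", 1:4), paste0("s", 1:6)))
neg <- matrix(rlnorm(18, 8, 1), nrow = 3,
              dimnames = list(paste0("neg_", 1:3), paste0("s", 1:6)))

# Log-transform and autoscale every feature within its own ion mode.
row_scale <- function(m) t(scale(t(m)))
combined <- rbind(row_scale(log2(pos)), row_scale(log2(neg)))

# Cluster samples on the combined feature set (Euclidean distance, Ward linkage).
hc <- hclust(dist(t(combined), method = "euclidean"), method = "ward.D2")
print(hc$labels)
```

To cluster features instead of samples, pass `dist(combined)` to `hclust`; `plot(hc)` draws the dendrogram.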

renzhezhu677 commented 6 years ago

Hi again! Another question for you: how can I export the results of intermediate steps from metaX? My case is quite complicated because of a cross-species analysis requirement, so I need to export the ion intensities after missing value filtering and imputation but before normalization. Unfortunately, I couldn't find a function for this in the R documentation of metaX. Thanks!

wenbostar commented 6 years ago

Please find my answer in #6. Thanks.

wenbostar / metaX

Questions about data-processing of metabolomic data #5