l-magnificence / Mime

Machine learning-based integration model with elegant performance

The issue of risk score calculation for each model #56

Open Se-Any opened 1 month ago

Se-Any commented 1 month ago


After running the models, I calculated the risk scores and found that, for some models, the scores differ by orders of magnitude across datasets, most notably for the RSF + StepCox[forward] model. Before running the code, I made sure that the expression matrices of all datasets were log-transformed as log(TPM+1), so differences this large are unexpected. In addition, the coefficient of each gene in my StepCox[forward] model is between 0.1 and 0.9. If the Cox risk score were simply the sum of each gene's expression multiplied by its coefficient, scores above 1000 would be implausible. How can this discrepancy be explained and resolved?

l-magnificence commented 1 month ago

Thank you for your feedback. The risk scores of the models are calculated with the predict() function, so they are not equal to the sum of each gene's expression multiplied by its coefficient. The two values are log-linearly related.
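To illustrate how a log-linear relationship can turn small coefficients into large scores, here is a minimal Python sketch. It is not the package's code. It assumes that the predicted score is the exponential of the Cox linear predictor, which is what `predict()` returns for a `coxph` fit in R's `survival` package when `type = "risk"`; the thread does not state which prediction type the package uses. The gene names, coefficients, and expression values are made up for illustration.

```python
import math

# Hypothetical coefficients (all below 1) and hypothetical
# log-transformed expression values for a single sample.
coefs = {"GENE_A": 0.9, "GENE_B": 0.8, "GENE_C": 0.7, "GENE_D": 0.6}
expr = {"GENE_A": 3.0, "GENE_B": 2.5, "GENE_C": 2.0, "GENE_D": 2.0}


def linear_predictor(coefs, expr):
    """Sum of coefficient * expression over the model's genes."""
    return sum(coefs[gene] * expr[gene] for gene in coefs)


def exp_risk(coefs, expr):
    """Assumed predict()-style score: exp of the linear predictor."""
    return math.exp(linear_predictor(coefs, expr))


lp = linear_predictor(coefs, expr)  # about 7.3
risk = exp_risk(coefs, expr)        # exp(7.3), which is above 1000

print(f"linear predictor: {lp:.2f}")
print(f"exp(linear predictor): {risk:.1f}")
print(f"log of the score recovers the linear predictor: {math.log(risk):.2f}")
```

Under this assumption, a shift of a few units in the linear predictor between datasets changes the score by orders of magnitude, and taking the logarithm of the predicted score puts it back on the scale of the manual calculation.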