kordk / torch-ecpg

(GPU accelerated) eCpG mapper
BSD 3-Clause "New" or "Revised" License

mlr output name clarification #29

Closed rituroy closed 1 year ago

rituroy commented 1 year ago

The first two columns are misnamed: the first column is the gene probe ID and the second is the methylation CpG ID. Also, why are the rest of the columns prefixed "mt_"?

tecpg run mlr
head output/out.csv
mt_site,gt_site,mt_est,mt_err,mt_t,mt_p
ILMN_2056551,cg07946583,0.24169591,1.23346,0.19594954,0.39125076
ILMN_2056551,cg21854617,-0.16335967,0.7613679,-0.21456075,0.3897576
ILMN_2056551,cg00048743,-0.7795826,0.2851519,-2.7339206,0.009598621
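The mismatch can be checked from the sample rows alone. The following is a minimal sketch, not part of tecpg: it assumes that IDs starting with "ILMN_" are gene expression probe IDs and IDs starting with "cg" are methylation CpG IDs, as described above. The names classify_id and column_id_kinds are hypothetical helpers introduced only for illustration.

```python
import csv
import io

# Sample rows copied from the output shown in this issue.
SAMPLE = """mt_site,gt_site,mt_est,mt_err,mt_t,mt_p
ILMN_2056551,cg07946583,0.24169591,1.23346,0.19594954,0.39125076
ILMN_2056551,cg21854617,-0.16335967,0.7613679,-0.21456075,0.3897576
ILMN_2056551,cg00048743,-0.7795826,0.2851519,-2.7339206,0.009598621
"""


def classify_id(value):
    """Guess the kind of ID from its prefix (assumption: ILMN_ = gene probe, cg = CpG)."""
    if value.startswith("ILMN_"):
        return "gene expression probe"
    if value.startswith("cg"):
        return "methylation CpG"
    return "unknown"


def column_id_kinds(csv_text, columns=("mt_site", "gt_site")):
    """Return, for each named column, the set of ID kinds found in it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kinds = {column: set() for column in columns}
    for row in reader:
        for column in columns:
            kinds[column].add(classify_id(row[column]))
    return kinds


kinds = column_id_kinds(SAMPLE)
for column, found in kinds.items():
    print(column, "contains:", ", ".join(sorted(found)))
```

On the sample rows, the column labelled mt_site holds only gene probe IDs and the column labelled gt_site holds only CpG IDs, which is the swap reported here.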

kordk commented 1 year ago

Please check that the column names are accurate, and provide a dictionary of the column names in the main documentation.

E.g.,

mt_site   methylation loci name
gt_site   gene name
mt_est    methylation coefficient
...

liamgd commented 1 year ago

The mt_ prefix was chosen in accordance with comments such as https://github.com/kordk/torch-ecpg/issues/16#issue-1472034565 and https://github.com/kordk/torch-ecpg/issues/7#issuecomment-1309311476. It stands for methylation: in the MLR output, it indicates that those regression results correspond to methylation rather than to the intercept/constant or the covariates.

Would mt_id and gt_id be preferred instead of mt_site and gt_site, or do you have something else in mind?
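For anyone who wants to try the proposed names on an existing output file before any change lands, here is a minimal sketch that rewrites only the header row. It is not part of tecpg: rename_header and RENAMES are hypothetical names, and the mapping simply mirrors the mt_id/gt_id suggestion above. It does not address whether the two labels are attached to the right columns.

```python
import csv
import io

# Hypothetical mapping that mirrors the rename proposed in this comment.
RENAMES = {"mt_site": "mt_id", "gt_site": "gt_id"}


def rename_header(csv_text, renames=RENAMES):
    """Return csv_text with header names replaced according to renames."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Only the first row (the header) is changed; data rows pass through as-is.
    rows[0] = [renames.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


original = (
    "mt_site,gt_site,mt_est,mt_err,mt_t,mt_p\n"
    "ILMN_2056551,cg07946583,0.24169591,1.23346,0.19594954,0.39125076\n"
)
renamed = rename_header(original)
print(renamed.splitlines()[0])  # mt_id,gt_id,mt_est,mt_err,mt_t,mt_p
```

Columns not listed in the mapping, such as mt_est and mt_p, keep their current names.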

liamgd commented 1 year ago