stanstrup / PredRet

Shiny app for retention time prediction
GNU General Public License v2.0

add structure based prediction #42

Closed: stanstrup closed this issue 9 years ago

michaelwitting commented 9 years ago

Dear Jan,

I just found your webpage PredRet and it is very interesting.

What exactly do you plan for structure-based prediction?

I'm working on retention time prediction to filter out false-positive annotations in LC-MS-based metabolomics.

Best regards

Michael

stanstrup commented 9 years ago

Dear Michael,

Well, I have not looked into that yet. I don't plan to attempt anything very advanced. My idea was to see whether calculating a weighted average RT, based on structural similarity to compounds with known RT, would give anything sensible. My guess is that it won't work very well unless there is some kind of similarity measure that can focus on functional groups and is not dominated by the carbon skeleton.
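A minimal sketch of that weighted-average idea, under stated assumptions: fingerprints are represented as plain Python sets of "on" bits, similarity is Tanimoto, and the similarity itself is used as the weight. The function names and the `min_similarity` cutoff are hypothetical and not part of PredRet.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_rt(query_fp, known, min_similarity=0.0):
    """Similarity-weighted average RT over compounds with known RT.

    known: list of (fingerprint_set, rt) pairs.
    min_similarity: hypothetical cutoff; compounds at or below it are ignored.
    Returns None when no known compound is similar enough.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for fp, rt in known:
        w = tanimoto(query_fp, fp)
        if w <= min_similarity:
            continue
        weighted_sum += w * rt
        weight_total += w
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total
```

A fingerprint that encodes functional groups rather than the whole skeleton could be swapped in without changing `predict_rt`, since only the bit sets differ.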

I don't plan to repeat other people's attempts at building models from different compound descriptors, but I think the database would surely be a nice training set for such efforts. My feeling at this point, from what I have seen, is that the very complicated models people have made don't really achieve better accuracy than a simple (predicted) logP-to-RT model (http://link.springer.com/article/10.1007/s00216-013-6954-6). That is simply because logP also drives the more complicated models, and the accuracy of logP prediction has its limits.
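For illustration, a simple logP-to-RT model could be as small as a one-variable least-squares line. This is a sketch that assumes a linear relationship; it is not necessarily the model used in the linked paper, and the function names are hypothetical.

```python
def fit_line(logp, rt):
    """Ordinary least-squares fit of rt = slope * logp + intercept."""
    n = len(logp)
    if n < 2 or n != len(rt):
        raise ValueError("need at least two paired (logP, RT) values")
    mean_x = sum(logp) / n
    mean_y = sum(rt) / n
    sxx = sum((x - mean_x) ** 2 for x in logp)
    if sxx == 0.0:
        raise ValueError("all logP values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(logp, rt))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_rt_from_logp(logp_value, slope, intercept):
    """Predict RT for one compound from its (predicted) logP."""
    return slope * logp_value + intercept
```

Any error in the predicted logP passes straight through `slope` into the predicted RT, which is the limit described above.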

I don't know if you noticed, but there is an R package at https://github.com/stanstrup/PredRet/tree/master/PredRetR that lets you pull the database directly into R, if that is of use to your efforts. If I can help you in any way, let me know.

Best regards, Jan.

stanstrup commented 9 years ago

Tried it now. It works OK on average, but the "tail" of the error distribution is too heavy. I could not find a way to detect when a prediction will go wrong.

Median errors in the table:

| system | N | Error_abs | Error_rel |
|---|---|---|---|
| 1290SQ | 10 | 11.0924013 | 0.58997267 |
| Cao_HILIC | 195 | 1.3528281 | 0.12095178 |
| Eawag_XBridgeC18 | 534 | 1.6421808 | 0.23108563 |
| FEM_lipids | 73 | 1.7822823 | 0.19845276 |
| FEM_long | 595 | 1.9695022 | 0.14438202 |
| FEM_orbitrap_plasma | 423 | 1.1097892 | 0.13144312 |
| FEM_orbitrap_urine | 234 | 1.64 | 0.13166722 |
| FEM_short | 240 | 1.9400734 | 0.14749188 |
| IPB_Halle | 195 | 0.6485194 | 0.20959254 |
| LIFE_new | 605 | 0.2224328 | 0.11446352 |
| LIFE_old | 573 | 0.2167849 | 0.14128269 |
| MPI_Symmetry | 41 | 2.2356667 | 0.2402386 |
| MTBLS17 | 26 | 0.2067823 | 0.05036747 |
| MTBLS19 | 28 | 0.6038375 | 0.10815334 |
| MTBLS20 | 447 | 0.8170619 | 0.23202983 |
| MTBLS36 | 130 | 0.745657 | 0.34552816 |
| MTBLS38 | 287 | 0.6936902 | 0.13919051 |
| MTBLS39 | 46 | 1.4660216 | 0.11457672 |
| MTBLS4 | 34 | 0.1512292 | 0.02147493 |
| MTBLS52 | 29 | 0.9597707 | 0.13122015 |
| MTBLS87 | 210 | 2.299983 | 0.19805566 |
| RIKEN | 751 | 0.092352 | 0.13155876 |
| UFZ_Phenomenex | 660 | 2.4255275 | 0.11509876 |
| UniToyama_Atlantis | 166 | 1.7197485 | 0.10799252 |
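
Per-system medians like those above could be computed along these lines. This is a sketch only: it assumes the relative error is the absolute error divided by the observed RT, which the thread does not state, and the record layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_errors(records):
    """Summarise prediction errors per chromatographic system.

    records: iterable of (system, observed_rt, predicted_rt) tuples.
    Assumption: relative error = |predicted - observed| / observed.
    """
    by_system = defaultdict(list)
    for system, observed, predicted in records:
        by_system[system].append((observed, predicted))

    summary = {}
    for system, pairs in by_system.items():
        abs_err = [abs(pred - obs) for obs, pred in pairs]
        rel_err = [abs(pred - obs) / obs for obs, pred in pairs]
        summary[system] = {
            "N": len(pairs),
            "Error_abs": median(abs_err),
            "Error_rel": median(rel_err),
        }
    return summary
```

Because the median is insensitive to outliers, a summary like this hides exactly the heavy error tail mentioned above; reporting an upper quantile alongside it would make that tail visible.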