stanstrup closed 9 years ago
Dear Michael,
Well, I have not looked into that yet. I don't plan to attempt anything very advanced. My idea was to see whether calculating a weighted average RT based on structural similarity to compounds with known RT would give anything sensible. My guess is that it won't work very well unless there is some kind of similarity measure that is able to focus on functional groups and is not dominated by the carbon skeleton.
I don't plan to repeat other people's attempts at building models from different compound descriptors, but I think the database would surely be a nice training set for such efforts. My feeling at this point, from what I have seen, is that the very complicated models people have made don't really achieve better accuracy than a simple (predicted) logP to RT model (http://link.springer.com/article/10.1007/s00216-013-6954-6). That is simply because logP also drives the more complicated models, and the accuracy of logP prediction has its limits.
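To make the idea concrete, here is a minimal sketch of a similarity-weighted average RT. It is only an illustration under stated assumptions, not what PredRet does: fingerprints are represented as plain Python sets of feature IDs, similarity is Tanimoto, and the names `tanimoto`, `weighted_rt` and `min_sim` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def weighted_rt(query, known, min_sim=0.0):
    """Similarity-weighted average RT for a query compound.

    query   -- set of fingerprint features for the compound of interest
    known   -- list of (fingerprint_set, rt) pairs with measured RTs
    min_sim -- assumed cutoff: neighbours at or below it are ignored
    Returns None when no neighbour is similar enough to contribute.
    """
    scored = [(tanimoto(query, fp), rt) for fp, rt in known]
    scored = [(s, rt) for s, rt in scored if s > min_sim]
    total = sum(s for s, _ in scored)
    if total == 0:
        return None
    return sum(s * rt for s, rt in scored) / total


if __name__ == "__main__":
    # Toy data, made up purely for illustration.
    known = [({1, 2, 3}, 5.0), ({1, 2}, 2.0), ({9}, 10.0)]
    print(weighted_rt({1, 2, 3}, known))
```

A fingerprint that encodes functional groups rather than general substructure paths could be swapped in without changing the averaging step, which is where the concern about the carbon skeleton would come into play.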
I don't know if you noticed, but there is an R package at https://github.com/stanstrup/PredRet/tree/master/PredRetR that allows you to pull the database directly into R, if that is of use to your efforts. If I can help you in any way, let me know.
Best regards, Jan.
Tried it now. It works OK on average, but the "tail" of the errors is too large. I could not find a way to detect when it will go wrong.
Median errors in the table:
system | N | Error_abs | Error_rel |
---|---|---|---|
1290SQ | 10 | 11.0924013 | 0.58997267 |
Cao_HILIC | 195 | 1.3528281 | 0.12095178 |
Eawag_XBridgeC18 | 534 | 1.6421808 | 0.23108563 |
FEM_lipids | 73 | 1.7822823 | 0.19845276 |
FEM_long | 595 | 1.9695022 | 0.14438202 |
FEM_orbitrap_plasma | 423 | 1.1097892 | 0.13144312 |
FEM_orbitrap_urine | 234 | 1.64 | 0.13166722 |
FEM_short | 240 | 1.9400734 | 0.14749188 |
IPB_Halle | 195 | 0.6485194 | 0.20959254 |
LIFE_new | 605 | 0.2224328 | 0.11446352 |
LIFE_old | 573 | 0.2167849 | 0.14128269 |
MPI_Symmetry | 41 | 2.2356667 | 0.2402386 |
MTBLS17 | 26 | 0.2067823 | 0.05036747 |
MTBLS19 | 28 | 0.6038375 | 0.10815334 |
MTBLS20 | 447 | 0.8170619 | 0.23202983 |
MTBLS36 | 130 | 0.745657 | 0.34552816 |
MTBLS38 | 287 | 0.6936902 | 0.13919051 |
MTBLS39 | 46 | 1.4660216 | 0.11457672 |
MTBLS4 | 34 | 0.1512292 | 0.02147493 |
MTBLS52 | 29 | 0.9597707 | 0.13122015 |
MTBLS87 | 210 | 2.299983 | 0.19805566 |
RIKEN | 751 | 0.092352 | 0.13155876 |
UFZ_Phenomenex | 660 | 2.4255275 | 0.11509876 |
UniToyama_Atlantis | 166 | 1.7197485 | 0.10799252 |
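For anyone wanting to build a comparable summary, a sketch of per-system median errors could look like the following. The definitions are assumptions, since the thread does not state them: absolute error is taken as |predicted - observed| and relative error as that value divided by the observed RT. The function name `median_errors` and the row layout are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_errors(rows):
    """Median absolute and relative RT error per chromatographic system.

    rows -- iterable of (system, observed_rt, predicted_rt)
    Assumed definitions: abs = |pred - obs|, rel = |pred - obs| / obs.
    """
    by_system = defaultdict(list)
    for system, observed, predicted in rows:
        by_system[system].append((observed, predicted))

    summary = {}
    for system, pairs in by_system.items():
        abs_err = [abs(p - o) for o, p in pairs]
        # Skip non-positive observed RTs to avoid dividing by zero.
        rel_err = [abs(p - o) / o for o, p in pairs if o > 0]
        summary[system] = {
            "N": len(pairs),
            "Error_abs": median(abs_err),
            "Error_rel": median(rel_err) if rel_err else None,
        }
    return summary


if __name__ == "__main__":
    # Toy rows, made up purely for illustration.
    rows = [("A", 10.0, 11.0), ("A", 20.0, 18.0), ("B", 2.0, 2.5)]
    for system, stats in median_errors(rows).items():
        print(system, stats)
```

Medians hide exactly the tail behaviour mentioned above, so an upper quantile of the absolute error per system would be a natural extra column.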
Dear Jan,
I just found your PredRet webpage and it is very interesting.
What exactly do you plan for structure-based prediction?
I'm working on retention time prediction to filter false positive annotations in LC-MS based metabolomics.
Best regards
Michael