add structure based prediction

michaelwitting commented 9 years ago

Dear Jan,

I just found your webpage PredRet and it is very interesting.

What exactly do you plan to do with structure-based prediction?

I'm working on retention time prediction to filter out false-positive annotations in LC-MS-based metabolomics.

Best regards

Michael

stanstrup commented 9 years ago

Dear Michael,

Well, I have not looked into that yet, and I don't plan to attempt anything very advanced. My idea is to see whether a weighted average RT, with weights based on structural similarity to compounds with known RT, gives anything sensible. My guess is that it won't work very well unless there is a similarity measure that focuses on functional groups rather than being dominated by the carbon skeleton.
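A minimal sketch of the weighted-average idea, only to make it concrete. It assumes each compound is described by a set of structural features and uses Tanimoto similarity as the weight. The feature names, the `predict_rt` helper and the RT values are all hypothetical and are not taken from PredRet.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_rt(query, known, min_sim=0.0):
    """Similarity-weighted average RT over compounds with known RT.

    known is a list of (feature_set, rt) pairs.
    Returns None when no known compound is more similar than min_sim.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for features, rt in known:
        sim = tanimoto(query, features)
        if sim > min_sim:
            weighted_sum += sim * rt
            weight_total += sim
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Made-up feature sets and retention times, for illustration only.
known = [
    ({"OH", "COOH", "ring"}, 2.0),
    ({"OH", "ring"}, 4.0),
    ({"NH2", "chain"}, 8.0),
]
query = {"OH", "COOH", "ring"}
predicted = predict_rt(query, known)
```

Describing compounds by functional-group features instead of a general-purpose fingerprint is one way the similarity could be kept from being dominated by the carbon skeleton, but that is an assumption of this sketch, not something that has been tested here.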

I don't plan to repeat other people's attempts at building models from different compound descriptors, but I think the database would surely be a nice training set for such efforts. From what I have seen so far, the very complicated models that people have made don't really achieve better accuracy than a simple (predicted) logP-to-RT model (http://link.springer.com/article/10.1007/s00216-013-6954-6). I suspect this is because logP also drives the more complicated models, and the accuracy of logP prediction has its limits.
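For reference, the simplest form a logP-to-RT model could take is a straight line fitted by least squares. This is only a sketch under that assumption: the linked paper may use a different functional form, and the `fit_line` helper and the numbers below are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up (predicted logP, observed RT) pairs for one chromatographic system.
logp = [0.0, 1.0, 2.0, 3.0]
rt = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(logp, rt)

# Predict the RT of a new compound from its predicted logP.
rt_new = slope * 1.5 + intercept
```

Any error in the predicted logP passes straight through such a model, scaled by the slope, which is the limit referred to above.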

I don't know if you noticed, but there is an R package at https://github.com/stanstrup/PredRet/tree/master/PredRetR that lets you pull the database directly into R, if that is of use to your efforts. If I can help you in any way, let me know.

Best regards, Jan.

stanstrup commented 9 years ago

Tried it now. It works OK on average, but the "tail" of the errors is too heavy. I could not find a way to detect when it will go wrong.

Median errors per system are in the table:

system	N	Error_abs	Error_rel
1290SQ	10	11.0924013	0.58997267
Cao_HILIC	195	1.3528281	0.12095178
Eawag_XBridgeC18	534	1.6421808	0.23108563
FEM_lipids	73	1.7822823	0.19845276
FEM_long	595	1.9695022	0.14438202
FEM_orbitrap_plasma	423	1.1097892	0.13144312
FEM_orbitrap_urine	234	1.64	0.13166722
FEM_short	240	1.9400734	0.14749188
IPB_Halle	195	0.6485194	0.20959254
LIFE_new	605	0.2224328	0.11446352
LIFE_old	573	0.2167849	0.14128269
MPI_Symmetry	41	2.2356667	0.2402386
MTBLS17	26	0.2067823	0.05036747
MTBLS19	28	0.6038375	0.10815334
MTBLS20	447	0.8170619	0.23202983
MTBLS36	130	0.745657	0.34552816
MTBLS38	287	0.6936902	0.13919051
MTBLS39	46	1.4660216	0.11457672
MTBLS4	34	0.1512292	0.02147493
MTBLS52	29	0.9597707	0.13122015
MTBLS87	210	2.299983	0.19805566
RIKEN	751	0.092352	0.13155876
UFZ_Phenomenex	660	2.4255275	0.11509876
UniToyama_Atlantis	166	1.7197485	0.10799252
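A sketch of how per-system median errors like these could be computed. It assumes that Error_abs is |predicted - observed| and that Error_rel is that absolute error divided by the observed RT. The definitions actually used for the table are not stated in this thread, and the values below are made up.

```python
from statistics import median


def median_errors(observed, predicted):
    """Return (median absolute error, median relative error) for one system."""
    abs_err = [abs(p - o) for o, p in zip(observed, predicted)]
    # Relative error is taken against the observed RT (an assumption).
    rel_err = [e / o for e, o in zip(abs_err, observed)]
    return median(abs_err), median(rel_err)


# Made-up observed and predicted RTs for a single system.
observed = [2.0, 4.0, 10.0]
predicted = [2.5, 3.0, 10.0]

err_abs, err_rel = median_errors(observed, predicted)
```

Medians hide the tail, so a high percentile of `abs_err` (for example the 95th) would have to be reported alongside them to show how bad the worst predictions are.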

stanstrup / PredRet

add structure based prediction #42