Closed bet-gregori closed 5 years ago
(flame) etoxws-v2:~/soft/flame_ws/mols # flame -c predict -e BSEP_upf -f minicaco_0_std.sdf
INFO - Starting prediction with model BSEP_upf version 0 for file minicaco_0_std.sdf
INFO - Running with input type: molecule
Traceback (most recent call last):
File "/opt/anaconda2/envs/flame/bin/flame", line 11, in
I tried adding a dummy 'activity' field to the query SD file, but then the prediction in output.tsv (the ymatrix column) is just the dummy activity value I entered in the query SD file.
I think there is a problem in:
utils.py: here, if you don't define the SDFile_activity parameter, the code raises an error and does not continue.
def get_sdf_activity_value(mol, parameters: dict) -> float:
    """ Checks if activity prop is the same in parameters and SDF input file
    Returns activity value as float if possible
    """
    if mol.HasProp(parameters['SDFile_activity']):
        # get sdf activity field value
        activity_str = mol.GetProp(parameters['SDFile_activity'])
        try:
            # cast val to float to be sure it is num
            activity_num = float(activity_str)
        except Exception as e:
            LOG.error('while casting activity to'
                      f' float an exception has ocurred: {e}')
            activity_num = None
    # defence when prop is not in parameter file
    else:  # SDF doesn't have param prop name
        raise ValueError(f"SDFile_activity parameter '{parameters['SDFile_activity']}'"
                         " not found in input SDF."
                         "Change SDFile_activity param in parameter.yml"
                         " to match the target prop in SDF")
    return activity_num
idata.py: here, if you don't define the SDFile_experimental parameter, the code crashes:
if mol.HasProp(self.parameters['SDFile_experimental']):
    exp = mol.GetProp(self.parameters['SDFile_experimental'])
    LOG.debug('Found experimental results in SDF')
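One way to make both lookups tolerate an undefined parameter or a missing SD field is sketched below. This is not flame's actual code: `read_optional_float_prop` and `FakeMol` are hypothetical names introduced for illustration, and `FakeMol` only mimics the `HasProp`/`GetProp` methods of an RDKit molecule so the example runs without RDKit installed.

```python
from typing import Optional


class FakeMol:
    """Hypothetical stand-in exposing the two RDKit Mol methods used here."""

    def __init__(self, props: dict):
        self._props = props

    def HasProp(self, name: str) -> bool:
        return name in self._props

    def GetProp(self, name: str) -> str:
        return self._props[name]


def read_optional_float_prop(mol, parameters: dict, key: str) -> Optional[float]:
    """Return the SD field named by parameters[key] as a float, or None.

    None is returned when the parameter is undefined, when the molecule
    lacks the field, or when the field value is not numeric.
    """
    prop_name = parameters.get(key)  # no KeyError if the parameter is undefined
    if not prop_name or not mol.HasProp(prop_name):
        return None
    try:
        return float(mol.GetProp(prop_name))
    except ValueError:
        return None


query = FakeMol({"name": "compound_1"})   # query compound, no activity field
training = FakeMol({"activity": "6.3"})   # annotated compound
params = {"SDFile_activity": "activity"}

no_activity = read_optional_float_prop(query, params, "SDFile_activity")
with_activity = read_optional_float_prop(training, params, "SDFile_activity")
undefined_param = read_optional_float_prop(training, {}, "SDFile_experimental")
```

With this approach a query file without activities yields `None` for every compound instead of raising, and the caller can decide whether a missing value is an error (training) or expected (prediction).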
Well, actually the prediction workflow is a bit of a mess:
Predict() uses Apply to "apply" the prediction computation.
Apply uses the model pickle, loading it inside a function called run_internal():
estimatorr.project(X, self.results)
run_internal runs external_validation (??)
self.results (????) has a ymatrix (???)
Given this comment just below:
# TODO: implement this for every prediction
Does flame run this external_validation every s i n g l e time it has to do a predict?
What is external validation??
If it means testing the model with an unseen, labelled dataset, it should be placed in the learning module, not in the predict module.
The predict workflow should be clarified, cleaned up and documented.
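As a rough illustration of the separation being asked for, the sketch below runs validation only when observed values are supplied for every query compound, and otherwise returns predictions alone. All names here (`predict_with_optional_validation`, `toy_model`, the `results` keys) are hypothetical and do not correspond to flame's API; RMSE is used purely as an example metric.

```python
import math
from typing import Callable, Optional, Sequence


def predict_with_optional_validation(
    predict_fn: Callable[[Sequence[float]], float],
    X: Sequence[Sequence[float]],
    y_observed: Optional[Sequence[Optional[float]]] = None,
) -> dict:
    """Predict every row of X; validate only if observed values are complete."""
    results = {"values": [predict_fn(row) for row in X]}

    # External validation is an optional extra step, not part of predicting.
    can_validate = (
        y_observed is not None
        and len(X) > 0
        and len(y_observed) == len(X)
        and all(y is not None for y in y_observed)
    )
    if can_validate:
        squared = [(p - y) ** 2 for p, y in zip(results["values"], y_observed)]
        results["external_validation"] = {
            "rmse": math.sqrt(sum(squared) / len(squared))
        }
    return results


def toy_model(row: Sequence[float]) -> float:
    """Hypothetical model: the prediction is the sum of the descriptors."""
    return sum(row)


X = [[1.0, 2.0], [0.5, 0.5]]
prediction_only = predict_with_optional_validation(toy_model, X)
validated = predict_with_optional_validation(toy_model, X, [3.0, 2.0])
```

Under this design a query file without activities takes the first path, so no dummy activity field is needed and the predicted values are never overwritten by observed ones.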
Terminal output, predict external validation:
/home/kpinto/miniconda3/envs/kpi36/lib/python3.6/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
from numpy.core.umath_tests import inner1d
(88, 111)
(88, 111)
INFO - Prediction finished.
flame predict : True
I realized that:
If external validation is performed:
This was never a bug, but a series of misunderstandings about the program's behaviour:
When predicting with the command line tool, flame looks for the activity field in the SD file. However, since activity is precisely what I'm trying to predict, my query compounds do not have an activity field.