phi-grib / flame

Modeling framework for eTRANSAFE project
GNU General Public License v3.0

output.tsv conformal #101

Closed kpinto-gil closed 5 years ago

kpinto-gil commented 5 years ago

When building a model with conformal, the output does not print c0 and c1; it only prints ymatrix. By the way, where are the yadjusted values? I would also add another column that says "out of domain" when a compound cannot be predicted.
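The requested column could look something like the sketch below. It is only an illustration, not flame code: it assumes that c0 and c1 are boolean flags saying whether class 0 and class 1 belong to the conformal prediction set, and it treats an empty set as "out of domain", which is one possible reading of the request. The function name `write_conformal_tsv` and the `status` column are hypothetical.

```python
import csv
import io


def write_conformal_tsv(rows, handle):
    """Write (name, c0, c1) rows as TSV, adding a human-readable status column.

    Assumption: c0 / c1 are True when class 0 / class 1 is in the prediction set.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["name", "c0", "c1", "status"])
    for name, c0, c1 in rows:
        if c0 and c1:
            status = "uncertain (both classes)"
        elif c1:
            status = "class 1"
        elif c0:
            status = "class 0"
        else:
            # Assumed meaning of "out of domain": no class could be assigned.
            status = "out of domain"
        writer.writerow([name, int(c0), int(c1), status])


# Hypothetical compounds, for illustration only.
example_rows = [
    ("mol_a", True, False),
    ("mol_b", True, True),
    ("mol_c", False, False),
]
buffer = io.StringIO()
write_conformal_tsv(example_rows, buffer)
print(buffer.getvalue())
```

Keeping c0 and c1 next to the status column means the raw flags remain available to anyone who prefers a different definition of "out of domain".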

kpinto-gil commented 5 years ago

terminal output format:

build:

 nobj  ( number of objects ) :  584.0
 nvarx  ( number of predictor variables ) :  111.0
 model  ( model type ) :  RF qualitative (optimized)
 model  ( model type ) :  conformal RF qualitative
 TP  ( True positives in cross-validation ) :  11.0
 TN  ( True negatives in cross-validation ) :  10.0
 FP  ( False positives in cross-validation ) :  0.0
 FN  ( False negatives in cross-validation ) :  1.0
 Sensitivity  ( Sensitivity in cross-validation ) :  0.9167
 Specificity  ( Specificity in cross-validation ) :  1.0
 MCC  ( Matthews Correlation Coefficient in cross-validation ) :  0.9129

>> Conformal_coverage  ( Conformal coverage ) :  0.7097
 Conformal_accuracy  ( Conformal accuracy ) :  0.9545

predict:

 obj_num  ( number of objects ) :  147.0
 TP  ( True positives in external-validation ) :  8.0
 TN  ( True negatives in external-validation ) :  64.0
 FP  ( False positives in external-validation ) :  17.0
 FN  ( False negatives in external-validation ) :  1.0

>> Coverage  ( Conformal coverage in external-validation ) :  0.6122
 Sensitivity  ( Sensitivity in external-validation ) :  0.8889
 Specificity  ( Specificity in external-validation ) :  0.7901
 MCC  ( Mattews Correlation Coefficient in external-validation ) :  0.4548
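As a sanity check on the numbers above, the external-validation figures can be reproduced from the four confusion-matrix counts with the usual definitions. The sketch below is not flame's implementation. It assumes that the confusion matrix only counts compounds that received a single-class prediction, and that coverage is that count divided by the total number of objects; under those assumptions it gives the reported 0.6122, 0.8889, 0.7901 and 0.4548. The function name `classification_metrics` is hypothetical.

```python
import math


def classification_metrics(tp, tn, fp, fn, n_objects):
    """Quality parameters from a confusion matrix of confident predictions.

    Assumption: tp + tn + fp + fn counts only the objects for which the
    conformal predictor returned exactly one class.
    """
    confident = tp + tn + fp + fn
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Coverage": confident / n_objects,
        "Accuracy": (tp + tn) / confident,
        "Sensitivity": tp / (tp + fn),
        "Specificity": tn / (tn + fp),
        # MCC is undefined when a row or column of the matrix is empty.
        "MCC": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Counts taken from the "predict" output in this comment.
external = classification_metrics(tp=8, tn=64, fp=17, fn=1, n_objects=147)
for name, value in external.items():
    print(f"{name} : {value:.4f}")
```

The same formulas applied to the cross-validation counts in the "build" output (TP 11, TN 10, FP 0, FN 1) give a sensitivity of 0.9167, an MCC of 0.9129 and an accuracy of 0.9545, matching the values printed there.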

manuelpastor commented 5 years ago

We removed the angles (>>>), and normalized and ordered the list of quality parameters.

Y adjusted are the Y values fitted (adjusted) by the model.

We are also working on including adjusted/predicted results for conformal models in the model building results, but the peculiarities of this method make this more complex. A potential workaround is to use the model to predict the training series. In this case, the resulting "external prediction" must be interpreted as fitting results.

kpinto-gil commented 5 years ago

As a suggestion: wouldn't it be better to use a pandas DataFrame instead of print statements in the code to show the final result in the terminal? I would also appreciate it if we could save this DataFrame in pickle, TSV, or a similar format.

manuelpastor commented 5 years ago

Results are already written to a pickle file and are fully accessible as JSON. The output can also be dumped to a TSV file. I think pandas DataFrames were not designed to show output; please provide more information if you don't agree.
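For readers unfamiliar with these three formats, a plain dictionary of results can be written to each of them with the standard library alone. This is a generic sketch, not the code flame uses: the function name `save_results` and the file names `results.pkl`, `results.json` and `results.tsv` are made up for the example.

```python
import csv
import json
import pickle
import tempfile
from pathlib import Path


def save_results(results, out_dir):
    """Write a flat dict of results as pickle, JSON and TSV (hypothetical file names)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    with open(out / "results.pkl", "wb") as handle:
        pickle.dump(results, handle)

    with open(out / "results.json", "w") as handle:
        json.dump(results, handle, indent=2)

    with open(out / "results.tsv", "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["parameter", "value"])
        writer.writerows(results.items())

    return out


# Values copied from the "predict" output earlier in this thread.
quality = {"TP": 8, "TN": 64, "FP": 17, "FN": 1, "Coverage": 0.6122}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_results(quality, tmp)
    print(sorted(path.name for path in saved.iterdir()))
    print((saved / "results.tsv").read_text())
```

The pickle keeps the Python types exactly, JSON is readable from other languages, and the TSV opens directly in a spreadsheet.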

BielStela commented 5 years ago

A DataFrame can be printed without any problem:

# print_df.py
import pandas as pd

df = pd.DataFrame({'A':[1,2,3,4], 'B':[5,6,7,8]})
print(df)

This gives the following output in the terminal:

foo@bar:~$ python print_df.py
   A  B
0  1  5
1  2  6
2  3  7
3  4  8

manuelpastor commented 5 years ago

Yes, they can be printed like any other Python type. What is the advantage? Showing the labels? That is useful for matrix data, but in this case we have a dictionary! I still think there is no advantage over printing each key and value (formatted as needed).
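The key-and-value printing described here can be sketched in a few lines of plain Python. This is an illustration of the idea rather than flame's actual output code: the helper name `format_quality` and the four-decimal formatting of floats are assumptions chosen to resemble the terminal output shown earlier in the thread.

```python
def format_quality(results):
    """Return one aligned 'key : value' line per entry of a results dict."""
    width = max(len(key) for key in results)
    lines = []
    for key, value in results.items():
        # Floats get four decimals; everything else is shown as-is.
        shown = f"{value:.4f}" if isinstance(value, float) else str(value)
        lines.append(f"{key:<{width}} : {shown}")
    return "\n".join(lines)


# Values copied from the "predict" output earlier in this thread.
quality = {"TP": 8, "TN": 64, "FP": 17, "FN": 1, "Coverage": 0.6122, "MCC": 0.4548}
print(format_quality(quality))
```

Because the function returns a string instead of printing directly, the same text can be sent to the terminal, a log file, or a test assertion.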