metrumresearchgroup / bbi

Next generation modeling platform
https://metrumresearchgroup.github.io/bbi/docs

refine output summary #8

Open dpastoor opened 5 years ago

dpastoor commented 5 years ago

Summary

Parsing a model's results should surface at least the following:

To exercise summary, from github.com/metrumresearchgroup/babylon/testdata/example-models/nonmem/BQL run: bbi summary --json 2

Tests

dpastoor commented 5 years ago
 N*LOG(2PI) CONSTANT TO OBJECTIVE FUNCTION:    817.85529455215863     
 OBJECTIVE FUNCTION VALUE WITHOUT CONSTANT:    723.60167739548945     
 OBJECTIVE FUNCTION VALUE WITH CONSTANT:       1541.4569719476481     
 REPORTED OBJECTIVE FUNCTION DOES NOT CONTAIN CONSTANT
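A minimal sketch of how these four lines could be picked out of a .lst file. The `OfvDetails` layout, the `ReportedWithoutConstant` flag, and the `parseOfv` helper are hypothetical names used for illustration, not bbi's actual parser:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// OfvDetails holds the objective function values printed in the .lst file.
// Hypothetical layout for illustration only.
type OfvDetails struct {
	Constant                float64 // N*LOG(2PI) CONSTANT TO OBJECTIVE FUNCTION
	OFVNoConstant           float64 // OBJECTIVE FUNCTION VALUE WITHOUT CONSTANT
	OFVWithConstant         float64 // OBJECTIVE FUNCTION VALUE WITH CONSTANT
	ReportedWithoutConstant bool    // set when the "DOES NOT CONTAIN CONSTANT" line is seen
}

// parseOfv scans lst text line by line and fills OfvDetails from the
// "LABEL: value" lines it recognizes. Unrecognized lines are skipped.
func parseOfv(lst string) (OfvDetails, error) {
	var d OfvDetails
	sc := bufio.NewScanner(strings.NewReader(lst))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "REPORTED OBJECTIVE FUNCTION DOES NOT CONTAIN CONSTANT" {
			d.ReportedWithoutConstant = true
			continue
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		var target *float64
		switch strings.TrimSpace(parts[0]) {
		case "N*LOG(2PI) CONSTANT TO OBJECTIVE FUNCTION":
			target = &d.Constant
		case "OBJECTIVE FUNCTION VALUE WITHOUT CONSTANT":
			target = &d.OFVNoConstant
		case "OBJECTIVE FUNCTION VALUE WITH CONSTANT":
			target = &d.OFVWithConstant
		default:
			continue
		}
		v, err := strconv.ParseFloat(strings.TrimSpace(parts[1]), 64)
		if err != nil {
			return d, fmt.Errorf("parsing %q: %w", line, err)
		}
		*target = v
	}
	return d, sc.Err()
}

func main() {
	lst := ` N*LOG(2PI) CONSTANT TO OBJECTIVE FUNCTION:    817.85529455215863
 OBJECTIVE FUNCTION VALUE WITHOUT CONSTANT:    723.60167739548945
 OBJECTIVE FUNCTION VALUE WITH CONSTANT:       1541.4569719476481
 REPORTED OBJECTIVE FUNCTION DOES NOT CONTAIN CONSTANT`
	d, err := parseOfv(lst)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", d)
}
```

Matching on the full label rather than a prefix keeps "WITH CONSTANT" and "WITHOUT CONSTANT" from being confused.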
dpastoor commented 5 years ago

Existing tree for a simple model looks like:

{
    "RunDetails": {
        "NMversion": "7.4.3",
        "RunStart": "",
        "RunEnd": "",
        "EstimationTime": 0,
        "CovarianceTime": 0,
        "FunctionEvaluations": 176,
        "SignificantDigits": 3.2
    },
    "FinalParameterEstimates": {
        "Theta": [
            1.82,
            12.9,
            0.383
        ],
        "Omega": [
            0.0595,
            0,
            0.0266,
            0,
            0,
            0.302
        ],
        "Sigma": [
            0.0365,
            0,
            0.116
        ],
        "OmegaCorr": [
            0.244,
            0,
            0.163,
            0,
            0,
            0.55
        ],
        "SigmaCorr": [
            0.191,
            0,
            0.341
        ]
    },
    "FinalParameterStdErr": {
        "Theta": [
            0.0692,
            0.652,
            0.0361
        ],
        "Omega": [
            0.0138,
            0,
            0.0212,
            0,
            0,
            0.0724
        ],
        "Sigma": [
            0.00419,
            0,
            0.0337
        ],
        "OmegaCorr": [
            0.0284,
            0,
            0.0651,
            0,
            0,
            0.0659
        ],
        "SigmaCorr": [
            0.011,
            0,
            0.0494
        ]
    },
    "ParameterStructures": {
        "Theta": 3,
        "Omega": [
            1,
            0,
            1,
            0,
            0,
            1
        ],
        "Sigma": [
            1,
            0,
            1
        ]
    },
    "ParameterNames": {
        "Theta": [
            "CL 1.8",
            "V  12.7",
            "KA"
        ],
        "Omega": [],
        "Sigma": []
    },
    "OFV": 723.602
}

As you can see, there are definitely some "bugs", or at least inconsistencies, in name parsing.

david-lyder commented 5 years ago

| Data | Token used to find value in lst file | Proposed Data structure | Category |
| --- | --- | --- | --- |
| Estimation method | MAP (ETAHAT) ESTIMATION METHOD (OPTMAP): 0 ETA HESSIAN EVALUATION METHOD (ETADER): 0 | Struct{ String String } | Add to RunDetails |
| Whether/which Multiple methods | Same as above? | []string | Add to RunDetails |
| Output tables | From tab file, or data at: 0ITERATION NO.: 0 | Struct or array | ????? |
| SEE | ? | ? | |
| shrinkage | ETASHRINK ETASHRINKSD(%) 9.0297E+00 2.0180E+01 1.7126E+01 | Struct{ Name Value Value Value } | completion |
| OFV | N*LOG(2PI) CONSTANT TO OBJECTIVE FUNCTION: 817.85529455215863 OBJECTIVE FUNCTION VALUE WITHOUT CONSTANT: 821.70599097758145 OBJECTIVE FUNCTION VALUE WITH CONSTANT: 1639.5612855297400 | Struct{ Name Value Value Value } | completion |
| Dataset used | $DATA | string | Add to RunDetails |
| Number of patients | TOT. NO. OF INDIVIDUALS: | int | Add to RunDetails |
| Number of observations | TOT. NO. OF OBS RECS: | int | Add to RunDetails |
| Completion messages or problems | WARNINGS AND ERRORS (IF ANY) FOR PROBLEM | string | Completion |
| COV step completed ? | 0COVARIANCE STEP OMITTED: Elapsed covariance time in seconds | Bool string | completion |
| 0 gradient | GRADIENT: Starting: 0ITERATION NO.: 0 Ending: 0ITERATION NO.: N | Struct{ Start End} Array[8] strings/decimal | completion |
| Other signaling ...etc. Problem text, Path to the mod file | NO. OF DATA RECS IN DATA SET | string | Add to RunDetails |
| Fixes: "RunStart": "", "RunEnd": "", "EstimationTime": 0, "CovarianceTime": 0, | Elapsed finaloutput time in seconds Stop Time: Elapsed covariance time in seconds: 0.79 Elapsed postprocess time in seconds: 0.02 | RunStart string RunEnd string EstimationTime float64 CovarianceTime float64 | RunDetails |

david-lyder commented 5 years ago

Data fields from above were added to the existing RunDetails structure, or to the new CompletionDetails structure. All that is left is the "tab" data, which needs clarification.

// RunDetails contains key information about logistics of the model run
type RunDetails struct {
    NMversion string
    RunStart string
    RunEnd string
    EstimationTime float64
    CovarianceTime float64
    FunctionEvaluations int64
    SignificantDigits float64
    // new
    ProblemText string
    ModFile string
    EstimationMethod []string
    DataSet string
    NumberOfPatients int32
    NumberOfObs int32
    NumberOfDataRecords int32
}

// CompletionDetails ...
type CompletionDetails struct {
    Shrinkage []ShrinkageDetails
    Ofv OfvDetails
    Gradient GradientDetails
    CovStepComplete bool
    Message string
}

// OfvDetails ...
type OfvDetails struct {
    NlogObvValule string
    NoConstantValue string
    ConstantValue string
}

// ShrinkageDetails ...
type ShrinkageDetails struct {
    Name string
    Value1 float64
    Value2 float64
    Value3 float64
}

type GradientDetails struct {
    Zero bool
    Start string
    End float64
}

dpastoor commented 5 years ago

Shrinkage and gradient details should probably live in a broader parameter struct, or follow the same pattern we have right now of being arrays.

dpastoor commented 5 years ago

Example output from PsN sumo. We should capture all the metadata needed to recreate such an output:

-----------------------------------------------------------------------
Successful minimization [ OK ]
No rounding errors [ OK ]
No zero gradients [ OK ]
No final zero gradients [ OK ]
Hessian not reset [ OK ]
Parameter(s) near boundary [ WARNING ]
OMEGA(2,2) 0.000112 0
Covariance step [ OK ]
Large condition number [ WARNING ]
Correlations [ OK ]
0MINIMIZATION SUCCESSFUL
NO. OF FUNCTION EVALUATIONS USED: 172
NO. OF SIG. DIGITS IN FINAL EST.: 3.4
ETABAR IS THE ARITHMETIC MEAN OF THE ETA-ESTIMATES,
AND THE P-VALUE IS GIVEN FOR THE NULL HYPOTHESIS THAT THE TRUE MEAN IS 0.
ETABAR: -0.41E-01 0.59E-03 0.30E-01
SE: 0.35E+00 0.25E-02 0.18E+00
P VAL.: 0.91E+00 0.82E+00 0.87E+00
Objective function value: 116.966
Condition number: 1.215e+004
THETA OMEGA SIGMA
THETA1 1.87 (-0.2503) OMEGA(1,1) 1.277 (-1.086) SIGMA(1,1) 0.478 (-0.2762)
THETA2 0.0862 (-0.05139) OMEGA(2,2) 0.01058 (-1.018)
THETA3 0.0398 (-0.08266) OMEGA(3,2) 0.9187 (-0.848)
OMEGA(3,3) 0.6768 (-0.548)
-----------------------------------------------------------------------
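As a rough illustration of what "recreating such an output" from summary metadata could look like, here is a small sketch that renders the OK/WARNING checklist part of the report. The `Check` type and `renderChecks` function are hypothetical, not part of bbi or PsN:

```go
package main

import (
	"fmt"
	"strings"
)

// Check is one line of a sumo-style checklist.
// Hypothetical type for illustration only.
type Check struct {
	Label   string
	OK      bool
	Details []string // extra lines printed under the check, e.g. the offending parameter
}

// renderChecks formats each check as "<label> [ OK ]" or "<label> [ WARNING ]",
// followed by any detail lines.
func renderChecks(checks []Check) string {
	var b strings.Builder
	for _, c := range checks {
		status := "[ OK ]"
		if !c.OK {
			status = "[ WARNING ]"
		}
		fmt.Fprintf(&b, "%s %s\n", c.Label, status)
		for _, d := range c.Details {
			fmt.Fprintf(&b, "%s\n", d)
		}
	}
	return b.String()
}

func main() {
	checks := []Check{
		{Label: "Successful minimization", OK: true},
		{Label: "No final zero gradients", OK: true},
		{Label: "Parameter(s) near boundary", OK: false, Details: []string{"OMEGA(2,2) 0.000112 0"}},
		{Label: "Covariance step", OK: true},
		{Label: "Large condition number", OK: false},
	}
	fmt.Print(renderChecks(checks))
}
```

The point is only that each checklist line needs a boolean plus optional details in the summary structure; the exact fields are open.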
dpastoor commented 5 years ago

type CovarianceStep struct {
    Attempted bool
    MatrixType string
    OK bool
    Errors []string
}

dpastoor commented 5 years ago

// ShrinkageDetails ...
type ShrinkageDetails struct {
    Name string
    Value1 float64
    Value2 float64
    Value3 float64
}

Instead, I think this just needs to be an array of values, similar to how parameter estimates are an array.

EtaShrinkage {
    SD: []float64
    VR: []float64
}
EbvShrinkage {
    SD: []float64
    VR: []float64
}
EpsShrinkage {
    SD: []float64
    VR: []float64
}
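A sketch of the array-based shape as compilable Go, with one shared `Shrinkage` type reused for eta, ebv and eps. The type names, the lowercase JSON keys, and the `toJSON` helper are assumptions for illustration; the sample numbers are the shrinkage values from the BQL example output in this thread:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Shrinkage holds one value per random effect, in parameter order.
// SD and VR are the standard-deviation and variance based shrinkage (%).
type Shrinkage struct {
	SD []float64 `json:"sd"`
	VR []float64 `json:"vr"`
}

// ShrinkageDetails groups the three shrinkage kinds (hypothetical layout).
type ShrinkageDetails struct {
	Eta Shrinkage `json:"eta"`
	Ebv Shrinkage `json:"ebv"`
	Eps Shrinkage `json:"eps"`
}

// toJSON serializes the details to compact JSON.
func toJSON(d ShrinkageDetails) (string, error) {
	out, err := json.Marshal(d)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	d := ShrinkageDetails{
		Eta: Shrinkage{SD: []float64{0.40632, 2.0606, 18.484}, VR: []float64{0.81099, 4.0787, 33.551}},
		Ebv: Shrinkage{SD: []float64{0.49256, 2.1487, 18.703}, VR: []float64{0.98269, 4.2512, 33.908}},
		Eps: Shrinkage{SD: []float64{9.7026}, VR: []float64{18.464}},
	}
	s, err := toJSON(d)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Using slices means the struct does not need to change when a model has more or fewer ETAs than the three assumed by Value1..Value3.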

dpastoor commented 5 years ago
type OfvDetails struct {
OFV float64
OFVNoConstant float64
OFVWithConstant float64
}

We need to follow up with Mike on whether the line "REPORTED OBJECTIVE FUNCTION DOES NOT CONTAIN CONSTANT" matters; if so, we can add another bool value to the details struct.

dpastoor commented 5 years ago
// CompletionDetails ...
type CompletionDetails struct {
Shrinkage []ShrinkageDetails
Ofv OfvDetails
+ZeroGradientDetected bool
+ FinalZeroGradientDetected bool
-Gradient GradientDetails // this should be its own struct
CovStepComplete bool
-Message    string
+Messages []string
}

Gradient details can move out to a separate struct because we should capture more information around the gradient values.

For the gradients, we really only care about which parameters had a zero gradient somewhere. Users can always get the actual gradient values from the EXT file if they really care.
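A minimal sketch of that check, assuming the gradients have already been read into one row per iteration with one column per parameter. The function names and the sample values in `main` are made up for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

// zeroGradientParams returns the 1-based indices of parameters whose
// gradient was exactly zero in at least one iteration.
func zeroGradientParams(gradients [][]float64) []int {
	seen := map[int]bool{}
	var idx []int
	for _, row := range gradients {
		for i, g := range row {
			if g == 0 && !seen[i] {
				seen[i] = true
				idx = append(idx, i+1)
			}
		}
	}
	sort.Ints(idx)
	return idx
}

// hasFinalZeroGradient reports whether any parameter has a zero gradient
// in the last iteration.
func hasFinalZeroGradient(gradients [][]float64) bool {
	if len(gradients) == 0 {
		return false
	}
	for _, g := range gradients[len(gradients)-1] {
		if g == 0 {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative values only: 3 iterations, 4 parameters.
	gradients := [][]float64{
		{12.1, 0, -3.4, 0.8},
		{1.2, 0.5, -0.3, 0.1},
		{0.01, 0.02, -0.01, 0},
	}
	fmt.Println("zero gradient seen for parameters:", zeroGradientParams(gradients))
	fmt.Println("final zero gradient:", hasFinalZeroGradient(gradients))
}
```

The two results map onto the ZeroGradientDetected and FinalZeroGradientDetected flags proposed above, with the index list carrying the "which parameter" detail.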

david-lyder commented 5 years ago

As discussed, we can use multiple branches to implement this work. This table lists the "easy to implement" changes for the next feature/branch:

| Value | Location of data in lst file |
| --- | --- |
| Name of single Estimation method | $ESTIMATION MAXEVAL=9999 PRINT=5 METH=1 INT MSF=./1.msf |
| Name of output table file | $TABLE NOPRINT ONEHEADER FILE=./1.tab |
| The dataset used ($DATA) | $DATA ../../derived/mock1.csv IGNORE=C |
| Number of patients | TOT. NO. OF INDIVIDUALS: |
| Number of observations | TOT. NO. OF OBS RECS: |
| Completion message | WARNINGS AND ERRORS (IF ANY) FOR PROBLEM |
| Run start value | First line of file: Fri Jul 12 10:47:39 EDT 2019 |
| Run end value | Stop Time: |
| Estimation Time value | Elapsed finaloutput time in seconds: Or #CPUT: Total CPU Time in Seconds Or Elapsed estimation time in seconds: |
| Covariance Time value | Elapsed covariance time in seconds: |

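A sketch of the token-based lookup this list implies, covering a few of the rows. The struct is a cut-down, hypothetical `RunDetails`, the helper names are made up, and the counts in the sample text are illustrative rather than taken from a real run:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// RunDetails is a reduced, hypothetical version of the struct discussed here.
type RunDetails struct {
	RunStart         string
	NumberOfPatients int64
	NumberOfObs      int64
	CovarianceTime   float64
}

// valueAfter returns the first whitespace-separated field that follows
// token on the line, if the token is present.
func valueAfter(line, token string) (string, bool) {
	i := strings.Index(line, token)
	if i < 0 {
		return "", false
	}
	fields := strings.Fields(line[i+len(token):])
	if len(fields) == 0 {
		return "", false
	}
	return fields[0], true
}

// parseRunDetails takes the first line as the run start and then looks for
// known tokens. Runs of spaces are collapsed first, because the lst file
// pads labels (for example "Elapsed covariance  time").
func parseRunDetails(lst string) RunDetails {
	var rd RunDetails
	sc := bufio.NewScanner(strings.NewReader(lst))
	first := true
	for sc.Scan() {
		line := strings.Join(strings.Fields(sc.Text()), " ")
		if first {
			rd.RunStart = line
			first = false
			continue
		}
		if v, ok := valueAfter(line, "TOT. NO. OF INDIVIDUALS:"); ok {
			if n, err := strconv.ParseInt(v, 10, 64); err == nil {
				rd.NumberOfPatients = n
			}
		} else if v, ok := valueAfter(line, "TOT. NO. OF OBS RECS:"); ok {
			if n, err := strconv.ParseInt(v, 10, 64); err == nil {
				rd.NumberOfObs = n
			}
		} else if v, ok := valueAfter(line, "Elapsed covariance time in seconds:"); ok {
			if f, err := strconv.ParseFloat(v, 64); err == nil {
				rd.CovarianceTime = f
			}
		}
	}
	return rd
}

func main() {
	lst := `Fri Jul 12 10:47:39 EDT 2019
 TOT. NO. OF OBS RECS:     2702
 TOT. NO. OF INDIVIDUALS:      193
 Elapsed covariance  time in seconds:     0.79`
	fmt.Printf("%+v\n", parseRunDetails(lst))
}
```

Values that fail to parse are silently left at zero in this sketch; a real parser would probably want to report them.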
david-lyder commented 4 years ago

After changes:

{
    "run_details": {
        "version": "7.4.3",
        "run_start": "Wed Sep 11 10:19:21 EDT 2019",
        "run_end": "Wed Sep 11 10:19:35 EDT 2019",
        "estimation_time": 4.04,
        "covariance_time": 0.06,
        "function_evaluations": 189,
        "significant_digits": 3.4,
        "problem_text": "LEM RUN# 2 - 2cmpt model - no BQLs",
        "estimation_method": [
            "First Order Conditional Estimation with Interaction"
        ],
        "data_set": "../nobqldata.csv",
        "number_of_patients": 193,
        "number_of_obs": 2702,
        "number_of_data_records": 2895,
        "output_files_used": [
            "/Users/davidl/go/src/github.com/metrumresearchgroup/babylon/testdata/example-models/nonmem/BQL/2.lst",
            "/Users/davidl/go/src/github.com/metrumresearchgroup/babylon/testdata/example-models/nonmem/BQL/2.ext",
            "/Users/davidl/go/src/github.com/metrumresearchgroup/babylon/testdata/example-models/nonmem/BQL/2.grd"
        ]
    },
    "run_heuristics": {
        "large_condition_number": "HeuristicFalse",
        "correlations_ok": "HeuristicTrue",
        "has_final_zero_gradient": "HeuristicFalse",
        "minimization_successful": "HeuristicTrue"
    },
    "parameters_data": [
        {
            "method": "TABLE NO. 1: First Order Conditional Estimation with Interaction: Goal Function=MINIMUM VALUE OF OBJECTIVE FUNCTION: Problem=1 Subproblem=0 Superproblem1=0 Iteration1=0 Superproblem2=0 Iteration2=0",
            "estimates": {
                "theta": [ 26.4905, 282.616, 297.043, 58.749, 1.5095, 0.75, 1, 1 ],
                "omega": [ 0.100611, 0, 0.035999, 0, 0, 0.0111726 ]
            },
            "std_err": {
                "theta": [ 0.60908, 4.40586, 2.32933, 1.01296, 0.0212776, 10000000000, 10000000000, 10000000000 ],
                "omega": [ 0.00959481, 10000000000, 0.00353287, 10000000000, 10000000000, 0.00186447 ]
            },
            "random_effect_sd": {
                "omega": [ 0.317192, 0, 0.189734, 0, 0, 0.1057 ]
            },
            "random_effect_sdse": {
                "omega": [ 0.0151246, 10000000000, 0.00931006, 10000000000, 10000000000, 0.00881961 ]
            },
            "fixed": {
                "theta": [ 0, 0, 0, 0, 0, 1, 1, 1 ],
                "omega": [ 0, 1, 0, 1, 1, 0 ]
            }
        }
    ],
    "parameter_structures": {
        "Theta": 9,
        "Omega": [ 1, 0, 1, 0, 0, 1 ],
        "Sigma": [ 1 ]
    },
    "parameter_names": {
        "theta": [ "1 CLF", "2 V2F", "3 V3F", "4 QF", "5 KA", "6 POW_CL", "7 POW_V2", "8 POW_V3", "9 POW_Q" ]
    },
    "ofv": {
        "ofv": 4965.943833438051,
        "ofv_no_constant": -14346.006,
        "ofv_with_constant": -9380.062196247563
    },
    "shrinkage_details": {
        "eta": {
            "sd": [ 0.40632, 2.0606, 18.484 ],
            "vr": [ 0.81099, 4.0787, 33.551 ]
        },
        "ebv": {
            "sd": [ 0.49256, 2.1487, 18.703 ],
            "vr": [ 0.98269, 4.2512, 33.908 ]
        },
        "eps": {
            "sd": [ 9.7026 ],
            "vr": [ 18.464 ]
        }
    },
    "covariance_theta": [
        {
            "Values": [
                0.370979, -0.214341, -0.150899, -0.0241113, 0.00104312, 0, 0, 0, 0,
                -0.214341, 19.4116, -3.53392, -1.9927, 0.0364529, 0, 0, 0, 0,
                -0.150899, -3.53392, 5.42576, 1.44045, -0.0283908, 0, 0, 0, 0,
                -0.0241113, -1.9927, 1.44045, 1.02609, -0.0157357, 0, 0, 0, 0,
                0.00104312, 0.0364529, -0.0283908, -0.0157357, 0.000452737, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0
            ],
            "Dim": 9
        }
    ],
    "correlation_theta": [
        {
            "Values": [
                0.60908, -0.079873, -0.10636, -0.0390799, 0.0804893, 0, 0, 0, 0,
                -0.079873, 4.40586, -0.344346, -0.446497, 0.388847, 0, 0, 0, 0,
                -0.10636, -0.344346, 2.32933, 0.610485, -0.572828, 0, 0, 0, 0,
                -0.0390799, -0.446497, 0.610485, 1.01296, -0.730081, 0, 0, 0, 0,
                0.0804893, 0.388847, -0.572828, -0.730081, 0.0212776, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0
            ],
            "Dim": 9
        }
    ]
}
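For anyone consuming the covariance_theta and correlation_theta entries, here is a sketch of turning the flat "Values" array back into a Dim x Dim matrix. The `FlatMatrix` type and its methods are hypothetical names; the row-major reading is an assumption, although for these symmetric matrices the ordering makes no difference:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FlatMatrix mirrors one element of covariance_theta / correlation_theta.
// Hypothetical consumer-side type for illustration.
type FlatMatrix struct {
	Values []float64 `json:"Values"`
	Dim    int       `json:"Dim"`
}

// Rows splits Values into Dim rows of Dim entries each (row-major).
func (m FlatMatrix) Rows() ([][]float64, error) {
	if m.Dim <= 0 || len(m.Values) != m.Dim*m.Dim {
		return nil, fmt.Errorf("expected %d values for Dim=%d, got %d", m.Dim*m.Dim, m.Dim, len(m.Values))
	}
	rows := make([][]float64, m.Dim)
	for i := range rows {
		rows[i] = m.Values[i*m.Dim : (i+1)*m.Dim]
	}
	return rows, nil
}

// decodeMatrix parses a single {"Values": [...], "Dim": n} JSON object.
func decodeMatrix(raw string) (FlatMatrix, error) {
	var m FlatMatrix
	err := json.Unmarshal([]byte(raw), &m)
	return m, err
}

func main() {
	// Top-left 2x2 corner of the covariance_theta block above.
	m, err := decodeMatrix(`{"Values": [0.370979, -0.214341, -0.214341, 19.4116], "Dim": 2}`)
	if err != nil {
		panic(err)
	}
	rows, err := m.Rows()
	if err != nil {
		panic(err)
	}
	for _, r := range rows {
		fmt.Println(r)
	}
}
```

The length check catches a mismatch between Dim and the number of values before any indexing happens.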

david-lyder commented 4 years ago

The above output summary is considered complete for version 2.0.0. Further enhancements will be addressed based on feedback from the scientists. Items listed above that are explicitly not addressed in 2.0.0 are:

[] Output tables
[] SEE
[] Completion messages or problems
[] Other signaling ...etc