idglover / Johnes


Relax definition of regular testing in next corrected titre ML model #4

Open idglover opened 1 year ago

idglover commented 1 year ago

Require only one recent test for predicting next titre

Increases the rows of data available about fivefold (to approx. 55k rows in 20k animals)
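A rough sketch of why requiring fewer recent tests yields more rows: each training row pairs the last `n_recent` titres with the next one, so a shorter history window fits into a test sequence more often. The function name, the parameter `n_recent` and the titre values are illustrative assumptions, and the sketch ignores the inter-test interval rules entirely.

```python
def next_titre_rows(titres, n_recent=1):
    """Build (recent_titres, next_titre) rows from one animal's ordered titres.

    Hypothetical helper: a row exists only when n_recent earlier tests are available.
    """
    rows = []
    for i in range(n_recent, len(titres)):
        recent = tuple(titres[i - n_recent:i])
        rows.append((recent, titres[i]))
    return rows


# Made-up titres for a single animal, oldest first.
titres = [5, 8, 12, 30, 25]
print(len(next_titre_rows(titres, n_recent=1)))  # rows when one recent test is required
print(len(next_titre_rows(titres, n_recent=4)))  # rows when four recent tests are required
```

With five tests, one required recent test gives four rows and four required recent tests give one. The real gain depends on how many tests each animal has.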

Distribution of titres remains the same:

Image

NNET model:

Image

Image

Image

This model actually has a lower RMSE than the model including four recent corrected titres.

Results in slightly improved availability of data for the cluster model, but not sufficient.

Increase permitted inter-test interval (ITI) in next-corrected titre ML model

Try ITI 1 to 6 months (in next corrected titre predictions and in clustering models)

This is somewhat successful (notwithstanding a poorer NNET CV-RMSE): it lengthens the segments available for clustering. However, a problem arises when no residual is available for a test because one or more of its features is an outlier according to the titre-correction model. Segments are therefore not always available for clustering, even if the inter-test interval is <= 6 months.
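A minimal sketch of how a permitted ITI breaks an animal's test history into segments, assuming a segment is a run of consecutive tests with no gap above the limit. The helper names, the 30.44-day average month and the dates are assumptions for illustration. The lower bound of one month and the missing-residual problem described above are not modelled.

```python
from datetime import date


def months_between(earlier, later):
    """Approximate gap in months, assuming an average month of 30.44 days."""
    return (later - earlier).days / 30.44


def split_into_segments(test_dates, max_iti_months=6.0):
    """Split sorted test dates into runs whose consecutive gaps are <= max_iti_months."""
    segments = []
    current = []
    for d in test_dates:
        if current and months_between(current[-1], d) > max_iti_months:
            segments.append(current)  # gap too long: close the running segment
            current = []
        current.append(d)
    if current:
        segments.append(current)
    return segments


# Made-up test dates for one animal, with one long break.
dates = [date(2020, 1, 10), date(2020, 4, 12), date(2020, 7, 15),
         date(2021, 6, 1), date(2021, 9, 3)]
print([len(s) for s in split_into_segments(dates, 6.0)])   # [3, 2]
print([len(s) for s in split_into_segments(dates, 12.0)])  # [5]
```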

Try modifying data points with outlier feature values, clamping them to the min or max for that feature in the titre-correction model data

Requires resetting data to their true values after making predictions, in order that cows can be displayed on cow plots etc. DONE
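The clamp-then-restore step could look roughly like this. It is a sketch only: the feature names (`days_in_milk`, `milk_yield_kg`) and their bounds are hypothetical, not taken from the titre-correction model.

```python
def clamp_features(row, bounds):
    """Clamp each bounded feature into [lo, hi].

    Returns the clamped row plus a dict of the original values that were changed,
    so the true values can be put back after prediction.
    """
    clamped = dict(row)
    originals = {}
    for name, (lo, hi) in bounds.items():
        value = row[name]
        limited = min(max(value, lo), hi)
        if limited != value:
            originals[name] = value
            clamped[name] = limited
    return clamped, originals


def restore_features(clamped, originals):
    """Put the true values back, e.g. before drawing cow plots."""
    restored = dict(clamped)
    restored.update(originals)
    return restored


# Hypothetical feature ranges observed in the titre-correction training data.
bounds = {"days_in_milk": (5, 450), "milk_yield_kg": (2.0, 60.0)}
row = {"days_in_milk": 720, "milk_yield_kg": 30.0}

clamped, originals = clamp_features(row, bounds)
print(clamped)                               # days_in_milk is pulled back to 450
print(restore_features(clamped, originals))  # identical to the original row
```

Keeping only the changed values in `originals` makes the restore step cheap and leaves untouched features alone.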

This yields scarcely any more decent-sized clusters. The problem is still that, if an animal has a break of more than six months between tests, clusters cannot be assigned again until five tests later (1 for predicting the next corrected titre + a cluster length of 4).

Try changing permitted ITI to 9 months (for predicting next titre; expect further deterioration in RMSE of this model; and for assembling clusters)

NNET model best.

RMSE has deteriorated to about 11.9.

BUT does allow more cows to be assigned to clusters. Still only a small proportion of cows in each herd is clustered at any one timepoint. This is partly due to the lag before clusters are assigned (5 tests required before cluster assignment: 1 for predicting the first next corrected titre, then a string of four tests for clustering), but also because the string of subsequent tests is broken each time an animal has an inter-test interval > 9 months.
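The lag arithmetic can be made explicit with a small sketch. It assumes a cluster is assigned at the test that completes the window, so an unbroken segment of `n` tests gives `n - (n_recent + cluster_length) + 1` assignable timepoints. The function name and parameters are illustrative.

```python
def clusterable_timepoints(segment_length, n_recent=1, cluster_length=4):
    """Count tests in an unbroken segment at which a cluster could be assigned.

    Assumes n_recent tests are consumed to predict the first next corrected titre,
    then cluster_length residuals are needed before the first assignment.
    """
    tests_needed = n_recent + cluster_length
    return max(0, segment_length - tests_needed + 1)


# One animal with eight tests: unbroken, versus split 4 + 4 by a long gap.
print(clusterable_timepoints(8))                              # 4
print(clusterable_timepoints(4) + clusterable_timepoints(4))  # 0
```

A single long gap in the middle of eight tests removes every assignable timepoint, which is consistent with only a small proportion of cows being clustered at any one time.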

Try increasing ITI to infinity

NNET TUNED RMSE = 12.3

Try adding in extra features to clustering

Try clustering on subsegments of three tests
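As a sketch of what shorter subsegments buy, the snippet below slides a fixed-length window over one segment's residuals. The function name and residual values are made up for illustration.

```python
def subsegments(residuals, length=3):
    """Return every run of `length` consecutive residuals as a tuple."""
    return [tuple(residuals[i:i + length])
            for i in range(len(residuals) - length + 1)]


# Made-up residuals from one unbroken segment.
residuals = [1.5, -0.5, 2.0, 0.0]
print(subsegments(residuals, length=3))  # two windows
print(subsegments(residuals, length=4))  # one window
```

Dropping from four to three residuals per window adds one extra window per segment and lets segments with only three residuals contribute at all.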

idglover commented 1 year ago

Need to increase the availability of data for the next corrected titre and clustering models.

4 strategies:

1) Reduce number of recent results required for predicting next titre (look at variable importance in models for recent test results)

2) Increase permissible inter-test interval for recent tests in next corrected titre model. BUT WILL NEED TO EXAMINE MODEL PREDICTIONS ACROSS DIFFERENT ITIs IN ORDER TO CHECK MODEL AND INFORM WHETHER ITI IS REQUIRED AS ML FEATURE!

3) Increase permissible inter-test interval in segments in clustering data - could do in conjunction with 2 above.

4) Cluster on smaller subsegments, e.g. only three residuals instead of four.
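For the check flagged under strategy 2, one simple option is to compute RMSE separately per ITI band and see whether the error grows with the interval. This is a sketch under assumptions: the bin edges, the record layout `(iti_months, observed, predicted)` and the numbers are invented for illustration, not model output.

```python
import math
from collections import defaultdict


def bin_label(iti_months, edges):
    """Label the ITI band a value falls into, e.g. '3-6' or '>9'."""
    lower = 0
    for upper in edges:
        if iti_months <= upper:
            return f"{lower}-{upper}"
        lower = upper
    return f">{lower}"


def rmse_by_iti_bin(records, edges=(3, 6, 9)):
    """Group squared errors by ITI band and return the RMSE for each band."""
    squared_errors = defaultdict(list)
    for iti_months, observed, predicted in records:
        squared_errors[bin_label(iti_months, edges)].append((observed - predicted) ** 2)
    return {label: math.sqrt(sum(errs) / len(errs))
            for label, errs in squared_errors.items()}


# Invented (iti_months, observed titre, predicted titre) records.
records = [(2, 10, 13), (3, 20, 16), (5, 30, 30), (12, 40, 50)]
print(rmse_by_iti_bin(records))
```

If the RMSE climbs steadily across bands, that would suggest including ITI as a feature in the next corrected titre model.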