Open idglover opened 1 year ago
Try various ML algorithms to predict the next corrected titre from recent corrected titres. Ensure cross-validation is folded by animal (GroupKFold) to prevent data leakage. Have confirmed that 'calfeartag' is already unique to each animal.
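A minimal sketch of folding by animal, assuming scikit-learn and using synthetic stand-in data (only `calfeartag` is a name from this issue; the features and target are placeholders):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

# Synthetic stand-in data: 12 tests spread over 4 animals.
calfeartag = np.repeat(["A1", "A2", "A3", "A4"], 3)  # group label per test
X = rng.normal(size=(12, 3))                         # placeholder features
y = rng.normal(size=12)                              # placeholder target

cv = GroupKFold(n_splits=4)
folds = list(cv.split(X, y, groups=calfeartag))

for train_idx, test_idx in folds:
    train_animals = set(calfeartag[train_idx])
    test_animals = set(calfeartag[test_idx])
    # No animal may appear on both sides of a split.
    assert train_animals.isdisjoint(test_animals)
```

Because every test from one animal lands in the same fold, the model is never scored on an animal it has already seen during training.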
Limited the dataset to animals with 4 previous corrected titres and an interval of 2-4 months between each test.
This leaves:
11k tests in 5k animals
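One way this filter could be expressed in pandas is sketched below. It is not the code used for the numbers above. The column names `testdate` and `corrected_titre`, the 60 to 123 day window standing in for "2-4 months", and the choice to also apply the window to the gap before the test being predicted are all assumptions.

```python
import pandas as pd

def eligible_tests(df, min_days=60, max_days=123, n_prev=4):
    """Keep tests preceded by n_prev tests, with every gap inside the window."""
    df = df.sort_values(["calfeartag", "testdate"]).copy()
    # Days since the same animal's previous test (NaN for its first test).
    gap = df.groupby("calfeartag")["testdate"].diff().dt.days
    ok = gap.between(min_days, max_days).astype(int)  # NaN counts as not ok
    # Number of acceptable gaps among the last n_prev gaps, per animal.
    run = ok.groupby(df["calfeartag"]).transform(
        lambda s: s.rolling(n_prev).sum()
    )
    return df[run == n_prev]

start = pd.Timestamp("2020-01-01")

def make(tag, offsets):
    # Hypothetical helper that builds one animal's test history.
    return pd.DataFrame({
        "calfeartag": tag,
        "testdate": [start + pd.Timedelta(days=d) for d in offsets],
        "corrected_titre": [float(i) for i in range(len(offsets))],
    })

tests = pd.concat(
    [
        make("A", [0, 90, 180, 270, 360]),  # regular spacing: last test kept
        make("B", [0, 90, 290, 380, 470]),  # one 200-day gap: nothing kept
        make("C", [0, 90, 180]),            # too few tests: nothing kept
    ],
    ignore_index=True,
)
eligible = eligible_tests(tests)
print(eligible)
```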
Offered to models:
Current age, parity, dim, yield, cellcount, protein, butterfat and MTNC, previous four corrected titres, and ratios and absolute differences between last four corrected titres.
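The ratio and difference features could be built roughly as follows. This is a sketch under assumptions: the column names `titre_1` to `titre_4` (most recent first) are hypothetical, every pair of the four titres is used, and "absolute difference" is read as |a - b|; a signed difference would be a one-line change.

```python
from itertools import combinations

import numpy as np
import pandas as pd

def add_pairwise_features(df, cols):
    """Add |a - b| and a / b for every pair of the given titre columns."""
    out = df.copy()
    for a, b in combinations(cols, 2):
        out[f"{a}_absdiff_{b}"] = (out[a] - out[b]).abs()
        # A zero denominator becomes NaN rather than inf.
        out[f"{a}_ratio_{b}"] = out[a] / out[b].replace(0, np.nan)
    return out

lags = ["titre_1", "titre_2", "titre_3", "titre_4"]
demo = pd.DataFrame(
    {"titre_1": [40.0], "titre_2": [20.0], "titre_3": [10.0], "titre_4": [0.0]}
)
features = add_pairwise_features(demo, lags)
print(features.T)
```

Four titres give six pairs, so this adds twelve columns. Any NaN ratios would need imputing or dropping before fitting a linear model.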
LinReg and ENET look okay, but NNET and MARS are liable to overfit. (GBM and RF are not shown here, but overfit badly.)
Taking ENET forward.
Tuned ENET ->
Tuning has improved performance slightly, and still not overfit.
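A hedged sketch of how the tuning and the overfitting check could be done with scikit-learn follows. The grid values, the scaling step, the R² scoring, and the synthetic data are all assumptions; the issue does not state the actual tuning setup.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 animals with 2 tests each and 5 features.
groups = np.repeat(np.arange(200), 2)
X = rng.normal(size=(groups.size, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=groups.size)

param_grid = {
    "elasticnet__alpha": [0.001, 0.01, 0.1],   # overall penalty strength
    "elasticnet__l1_ratio": [0.1, 0.5, 0.9],   # mix of L1 and L2 penalty
}
search = GridSearchCV(
    make_pipeline(StandardScaler(), ElasticNet(max_iter=10_000)),
    param_grid,
    cv=GroupKFold(n_splits=5),     # folds split by animal
    scoring="r2",
    return_train_score=True,       # needed to compare train vs CV scores
)
search.fit(X, y, groups=groups)

best = search.best_index_
train_r2 = search.cv_results_["mean_train_score"][best]
cv_r2 = search.cv_results_["mean_test_score"][best]
overfit_gap = train_r2 - cv_r2     # a large positive gap suggests overfitting
print(search.best_params_, round(train_r2, 3), round(cv_r2, 3))
```

Comparing the mean training score with the mean cross-validated score at the chosen hyperparameters is one simple way to back up a "still not overfit" claim.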
Variable importance (permutation):
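For reference, permutation importance can be computed along these lines with scikit-learn. This is a sketch on synthetic data: the feature names are illustrative, and the fixed `alpha` stands in for whatever the tuned model uses.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

feature_names = ["prev_titre_1", "prev_titre_2", "age", "yield"]  # illustrative
groups = np.repeat(np.arange(150), 2)  # 150 animals, 2 tests each
X = rng.normal(size=(groups.size, len(feature_names)))
# Only the first two features carry signal in this toy target.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=groups.size)

# Hold out whole animals so importance is measured on unseen animals.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

model = ElasticNet(alpha=0.01).fit(X[train_idx], y[train_idx])

# Shuffle one column at a time and record the drop in held-out score.
result = permutation_importance(
    model, X[test_idx], y[test_idx], n_repeats=20, random_state=0
)
ranking = [feature_names[i] for i in np.argsort(result.importances_mean)[::-1]]
print(ranking)
```

With strongly correlated inputs, such as consecutive titres and their ratios, permutation importance tends to be shared between the correlated columns, so individual rankings should be read with care.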