idglover / Johnes


Train ML Models for predicting next corrected titre #1

Open idglover opened 1 year ago

idglover commented 1 year ago

Limited the dataset to animals with four previous corrected titres and an interval of 2-4 months between consecutive tests.

This leaves:

11k tests in 5k animals
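A minimal sketch of how a filter like this could be written in pandas. The column name `testdate`, the helper `eligible_tests`, the 60 to 122 day window used to stand in for "2-4 months", and the toy data are all assumptions for illustration, not the code actually used here. It also assumes the gap up to the target test must fall in the window, as well as the gaps between the previous tests.

```python
import pandas as pd

def eligible_tests(df, n_prev=4, min_days=60, max_days=122):
    """Keep tests preceded by n_prev tests, with every gap in min_days..max_days."""
    df = df.sort_values(["calfeartag", "testdate"]).reset_index(drop=True)
    # Days since the same animal's previous test (NaN for its first test).
    gap = df.groupby("calfeartag")["testdate"].diff().dt.days
    in_range = gap.between(min_days, max_days).astype(int)
    # Count in-range gaps over the last n_prev gaps, within each animal.
    run = (
        in_range.groupby(df["calfeartag"])
        .rolling(n_prev)
        .sum()
        .reset_index(level=0, drop=True)
    )
    return df[run.reindex(df.index) == n_prev]

# Toy data (hypothetical): A and C are tested every 90 days, B has one long gap.
tests = pd.concat([
    pd.DataFrame({"calfeartag": "A",
                  "testdate": pd.date_range("2020-01-01", periods=5, freq="90D")}),
    pd.DataFrame({"calfeartag": "B",
                  "testdate": pd.to_datetime(["2020-01-01", "2020-04-01", "2020-07-01",
                                              "2021-02-01", "2021-05-01"])}),
    pd.DataFrame({"calfeartag": "C",
                  "testdate": pd.date_range("2020-01-01", periods=6, freq="90D")}),
])
kept = eligible_tests(tests)
print(kept)
```

An animal with more than five evenly spaced tests contributes more than one row, which is consistent with there being more tests than animals.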

Offered to models:

Current age, parity, DIM (days in milk), yield, cell count, protein, butterfat and MTNC; the previous four corrected titres; and the ratios and absolute differences between those four corrected titres.
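For illustration, the ratio and absolute-difference features could be derived along these lines. The column names `titre_1` to `titre_4`, the helper `add_titre_features`, and the choice to compute every pairwise combination (rather than only consecutive pairs) are assumptions; the post does not say which pairs were used.

```python
from itertools import combinations

import numpy as np
import pandas as pd

def add_titre_features(df, cols=("titre_1", "titre_2", "titre_3", "titre_4")):
    """Add pairwise absolute differences and ratios between the titre columns."""
    out = df.copy()
    for a, b in combinations(cols, 2):
        out[f"absdiff_{a}_{b}"] = (out[a] - out[b]).abs()
        # A zero denominator becomes NaN rather than inf.
        out[f"ratio_{a}_{b}"] = out[a] / out[b].replace(0, np.nan)
    return out

# Hypothetical single animal with its last four corrected titres.
example = pd.DataFrame(
    {"titre_1": [10.0], "titre_2": [5.0], "titre_3": [20.0], "titre_4": [0.0]}
)
feats = add_titre_features(example)
print(feats.T)
```

Four titres give six pairs, so this adds twelve columns. Rows with NaN ratios would need imputing or dropping before being passed to models that cannot handle missing values.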

Image

LinReg and ENET look okay, but NNET and MARS are liable to overfit. (GBM and RF are not shown here, but both overfit badly.)

Took ENET forward.

Tuned ENET:

Image

Tuning has improved performance slightly, and the model is still not overfit.
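A rough sketch of what tuning an elastic net can look like with scikit-learn, assuming a grid search over `alpha` and `l1_ratio` with folds grouped by animal. The grid values, the synthetic data and the scaling step are illustrative assumptions, not the settings used for the results above.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 60 animals with 3 tests each, 8 features.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(60), 3)
X = rng.normal(size=(groups.size, 8))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=groups.size)

# Scale features so the penalty treats them comparably.
pipe = make_pipeline(StandardScaler(), ElasticNet(max_iter=10_000))
param_grid = {
    "elasticnet__alpha": [0.001, 0.01, 0.1, 1.0],
    "elasticnet__l1_ratio": [0.1, 0.5, 0.9],
}
search = GridSearchCV(
    pipe,
    param_grid,
    cv=GroupKFold(n_splits=5),
    scoring="neg_root_mean_squared_error",
)
# groups keeps every test from one animal inside a single fold.
search.fit(X, y, groups=groups)

print(search.best_params_)
print("CV RMSE:", -search.best_score_)
```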

Variable importance (permutation):

Image

Image

Image

Image
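For reference, permutation importance can be computed with `sklearn.inspection.permutation_importance`. The sketch below uses synthetic data, fixed ElasticNet settings and a single grouped hold-out split, all of which are assumptions for illustration only.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in data: feature 0 drives the target most strongly.
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(100), 3)
X = rng.normal(size=(groups.size, 5))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=groups.size)

# Hold out whole animals so importance is measured on unseen animals.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=1)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

model = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(X[train_idx], y[train_idx])

# Shuffle each feature in turn and record how much the score drops.
result = permutation_importance(
    model,
    X[test_idx],
    y[test_idx],
    n_repeats=20,
    random_state=1,
    scoring="neg_root_mean_squared_error",
)
ranking = np.argsort(result.importances_mean)[::-1]
for i in ranking:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```

Correlated predictors (such as successive titres and the ratios derived from them) can share importance between them, so individual values are best read with that in mind.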

idglover commented 1 year ago

Try various ML algorithms to predict the next corrected titre from recent corrected titres. Ensure cross-validation is folded by animal (GroupKFold) to prevent data leakage. Have confirmed that 'calfeartag' is already unique to each animal.
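A small sketch of animal-grouped cross-validation with `GroupKFold`, using `calfeartag` as the grouping key. The feature and target column names (`prev_titre`, `next_titre`), the toy data and the choice of `LinearRegression` are placeholders. Comparing train and test scores is one simple way to spot the kind of overfitting discussed above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GroupKFold, cross_validate

# Toy data (hypothetical): 50 animals with 2 rows each.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "calfeartag": np.repeat([f"UK{i:04d}" for i in range(50)], 2),
    "prev_titre": rng.normal(size=100),
})
df["next_titre"] = 0.8 * df["prev_titre"] + rng.normal(scale=0.3, size=100)

X, y, groups = df[["prev_titre"]], df["next_titre"], df["calfeartag"]
cv = GroupKFold(n_splits=5)

# Sanity check: no animal appears on both sides of any split.
leaks = [
    set(groups.iloc[train]) & set(groups.iloc[test])
    for train, test in cv.split(X, y, groups)
]

scores = cross_validate(
    LinearRegression(), X, y,
    groups=groups, cv=cv, scoring="r2", return_train_score=True,
)
# A large positive gap between train and test scores suggests overfitting.
gap = scores["train_score"].mean() - scores["test_score"].mean()
print("animals leaked across folds:", sum(len(s) for s in leaks))
print(f"train R2 {scores['train_score'].mean():.3f}, "
      f"test R2 {scores['test_score'].mean():.3f}, gap {gap:.3f}")
```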