idglover / Johnes


Cluster Johne's Data #2

Open idglover opened 1 year ago

idglover commented 1 year ago

Initial attempt:

An ML model predicts a corrected titre from the four most recent corrected titres, plus current age, yield, cell count, etc.

The residuals can then be clustered (model-based clustering with the mclust package in R).

Initial attempt to cluster subsegments of four timepoints. Features used in clustering include the residuals themselves, accumulated residuals (to give information about the direction of travel/trajectory of a cow), and the standard deviation of the residuals.
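A minimal sketch of how the feature rows for this clustering might be built, for illustration only. It assumes overlapping sliding windows of four residuals and takes "accumulated residuals" to mean a running sum within each window; neither is confirmed above. The function name `window_features` and the example values are hypothetical, and the clustering itself (mclust in R) is not reproduced here.

```python
from itertools import accumulate
from statistics import stdev


def window_features(residuals, width=4):
    """Build one feature row per window of `width` consecutive residuals.

    Each row holds the residuals, their running sum within the window
    (assumed meaning of "accumulated residuals"), and their sample
    standard deviation.
    """
    rows = []
    # Overlapping windows are an assumption; non-overlapping subsegments
    # would step by `width` instead of 1.
    for start in range(len(residuals) - width + 1):
        window = list(residuals[start:start + width])
        cumulative = list(accumulate(window))
        rows.append(window + cumulative + [stdev(window)])
    return rows


# Hypothetical residuals for one cow (made-up numbers, not real data).
example_residuals = [0.5, -0.25, 1.0, 2.0, 3.5]
features = window_features(example_residuals)
```

Each row has nine values (four residuals, four cumulative sums, one standard deviation) and could be written out as a table for clustering.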

Some success with three- and four-group models using this strategy. See here for some interesting results:

Y:\Ian\PhD\SupervisorMeetings\23_04_27\SelectedClusteringResults.docx

idglover commented 1 year ago

Possible strategies:

1) Cluster residuals from ML model
2) Cluster some kind of cumulative residual
3) Cluster corrected titres themselves

a) Cluster entire sequences, restricted to sufficiently long ones (a large age range rather than a large number of tests!), with e.g. Dynamic Time Warping. These clusters could then be used to inform:
b) Cluster short subsequences (e.g. 3 or 5 consecutive residuals/corrected titres). This will be the most useful option, as it will allow online clustering of cows.
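For the whole-sequence option, a rough sketch of a Dynamic Time Warping distance between two sequences of unequal length is given below. It is the textbook dynamic-programming form with absolute difference as the local cost and no window constraint. The function name `dtw_distance` and the example sequences are hypothetical, and this is not necessarily the variant that would be used on the real data.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    cost[i][j] is the cheapest alignment cost of a[:i] with b[:j].
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical sequences for three cows (made-up numbers, not real data).
sequences = [[1, 2, 3], [1, 2, 2, 3], [3, 2, 1]]

# Pairwise distance matrix, which a distance-based clustering method could use.
distances = [[dtw_distance(s, t) for t in sequences] for s in sequences]
```

The first two sequences differ only by a repeated value, so their DTW distance is zero even though their lengths differ. That tolerance is what makes DTW a candidate for cows tested at irregular intervals.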