My understanding of how the model was trained in this project is the following. Let's consider a dataset of 100 subjects.
model 1: trained on subjects (0:19)
applied model 1 to subjects (20:99)
note: ground truths for (0:19) were not updated
selected the best inferences, let's say subjects (20:49)
manually corrected the GT for (20:49)
model 2a: fine-tuning: model 1 --> model 2a using ONLY subjects (20:49)
model 2b: trained using subjects (0:49)
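To make the subject partitioning above explicit, here is a minimal sketch in plain Python. It only manipulates subject indices (the actual training/fine-tuning calls are out of scope); `split_rounds` and its parameter names are hypothetical, not part of the project's code.

```python
def split_rounds(n_subjects=100, n_initial=20, n_selected=30):
    """Reproduce the subject splits described above, as index lists."""
    round1 = list(range(0, n_initial))            # subjects 0..19: GT made from scratch, used for model 1
    pool = list(range(n_initial, n_subjects))     # subjects 20..99: inferred by model 1
    round2 = pool[:n_selected]                    # best inferences, subjects 20..49: GT manually corrected
    remaining = pool[n_selected:]                 # subjects 50..99: not yet corrected
    return round1, round2, remaining

r1, r2, rest = split_rounds()
# model 2a: fine-tune model 1 on r2 only        -> data (20:49)
# model 2b: train from scratch on r1 + r2       -> data (0:49)
```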
According to @rohanbanerjee, model 2a performs better than model 2b (as discussed in issue #36).
However, one risk is model drift towards images of 'bad' quality: as the number of training rounds increases, the data quality shifts towards 'bad' cases (i.e. the 'good' cases were used for model 1 and might now be forgotten). We need a validation strategy to ensure this does not happen @rohanbanerjee
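One possible validation strategy (a sketch, not the project's actual pipeline): keep a fixed held-out validation set drawn from the 'good'-quality round-1 subjects, score each new model on it, and flag any round whose mean score drops below the best score seen so far. The metric values below are made-up placeholders (e.g. Dice).

```python
def drift_check(scores_by_round, tolerance=0.02):
    """Flag rounds whose mean score on the fixed 'good' validation set
    falls more than `tolerance` below the best mean seen so far."""
    best = float("-inf")
    flagged = []
    for rnd, scores in enumerate(scores_by_round, start=1):
        mean = sum(scores) / len(scores)
        if mean < best - tolerance:
            flagged.append(rnd)  # quality on 'good' cases regressed: possible drift
        best = max(best, mean)
    return flagged

# hypothetical per-subject Dice on the fixed validation set after each round:
flagged = drift_check([[0.90, 0.88], [0.91, 0.89], [0.80, 0.82]])
# -> round 3 is flagged
```

The key design choice is that the validation set never changes across rounds, so a score drop can only come from the model, not from shifting data.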
My suggestion (Julien writing here):
model 2c: fine tuning: model 1 --> model 2c using subjects (0:49)
Sources: