waldo-seg / waldo

image-segmentation and text-localization
Apache License 2.0

train/valid probs #63

Open danpovey opened 6 years ago

danpovey commented 6 years ago

Guys, we should have a mechanism to compute either accuracies on a subset of the training data or objective-function values on validation data. (Are we doing this already?) This would show whether our model is underfitting or overfitting, which right now I have no idea about.

YiwenShaoStephen commented 6 years ago

train.py will give you the BCE loss on train and val data. If you also want to see the segmentation results on train and val, you need to run segment.py on them and then use scoring.py to get the MAP (mean average precision).
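For readers unfamiliar with the quantity being discussed, here is a minimal sketch of a mean binary cross-entropy computed over a list of predicted probabilities and 0/1 targets, plus a helper that draws a fixed subset of training examples so the train-side number stays comparable between epochs. This is not the code in train.py; the function names `bce_loss` and `sample_subset`, the clamping constant, and the seed are all assumptions made for illustration.

```python
import math
import random


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over paired probabilities and 0/1 targets.

    Hypothetical helper for illustration; not taken from the waldo repository.
    """
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and the same length")
    total = 0.0
    for p, t in zip(probs, targets):
        # Clamp the probability away from 0 and 1 so log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def sample_subset(examples, size, seed=0):
    """Pick a reproducible random subset of the training examples.

    Using a fixed seed means the same subset is evaluated every epoch.
    """
    rng = random.Random(seed)
    return rng.sample(examples, min(size, len(examples)))
```

Evaluating `bce_loss` on the sampled training subset and on the validation set with the same model gives the two numbers being compared in this thread.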

hhadian commented 6 years ago

But the segmentation algorithm also produces a final logprob for each image. I guess it would be helpful to write that to disk too (and maybe also its average over a whole test set).

danpovey commented 6 years ago

OK. How different are the BCE losses for train and val, in the DBS2018 setup?

YiwenShaoStephen commented 6 years ago

Almost the same. So there is no overfitting yet. I think heavier image augmentation will help since the dataset is relatively small.
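As a rough illustration of the comparison being made here, the sketch below flags possible overfitting when the validation loss exceeds the training loss by more than a relative tolerance. The function names and the 10% default tolerance are assumptions chosen for the example, not values used in this project, and a small gap alone does not rule out underfitting.

```python
def relative_gap(train_loss, val_loss):
    """Relative amount by which the validation loss exceeds the training loss."""
    return (val_loss - train_loss) / max(abs(train_loss), 1e-12)


def diagnose(train_loss, val_loss, gap_tol=0.1):
    """Classify the train/val gap; gap_tol is an arbitrary illustrative threshold."""
    if relative_gap(train_loss, val_loss) > gap_tol:
        return "possible overfitting"
    return "no clear overfitting"
```

With losses that are "almost the same", `diagnose` returns "no clear overfitting", which matches the reading above that the model has room to grow.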

danpovey commented 6 years ago

Hossein, that's a good point. Perhaps we could produce a log line per file, summarizing various stats relating to the segmentation. I assume the stderr (and probably stdout too) of the segmenter code gets put in a log file.
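A minimal sketch of what such per-file logging could look like, assuming each segmented image yields an id, a final logprob, and an object count. The line format, the `segment-stats` prefix, and the function names are hypothetical and not part of the existing segmenter; the lines go to stderr by default so they would land in the log file if stderr is captured.

```python
import sys


def format_stats_line(image_id, logprob, num_objects):
    """Build one summary line for a single segmented image (hypothetical format)."""
    return "segment-stats image={} logprob={:.4f} num-objects={}".format(
        image_id, logprob, num_objects)


def report(results, stream=sys.stderr):
    """Write one line per image plus the average logprob; return the average.

    `results` is a list of (image_id, logprob, num_objects) tuples.
    """
    if not results:
        raise ValueError("no segmentation results to report")
    for image_id, logprob, num_objects in results:
        print(format_stats_line(image_id, logprob, num_objects), file=stream)
    avg = sum(logprob for _, logprob, _ in results) / len(results)
    print("segment-stats average-logprob={:.4f} num-images={}".format(
        avg, len(results)), file=stream)
    return avg
```

Keeping a fixed prefix on every line makes the stats easy to grep out of a larger log, and the final line gives the average over the whole set.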

Yiwen, regarding the train/valid objective values: I think you can safely increase the number of parameters in the model until you see overfitting, and at that point start to worry about image augmentation. (I see image augmentation as primarily a way to reduce overfitting by artificially expanding the amount of training data).

YiwenShaoStephen commented 6 years ago

OK, will try it.

hhadian commented 6 years ago

Will do it.
