yuzhimanhua / Multi-BioNER

Cross-type Biomedical Named Entity Recognition with Deep Multi-task Learning (Bioinformatics'19)
https://arxiv.org/abs/1801.09851
Apache License 2.0

Early Stopping(Patience) and Convergence #17

Closed hantingge closed 3 years ago

hantingge commented 4 years ago

Hi,

When you train multiple datasets with different entity types, if one dataset has not improved for several epochs while the others are still improving, your code in train.py seems to keep updating the CRF layer of the non-improving dataset as well. What's the rationale behind that? Is it just because the shared character- and word-level BiLSTM layers have been updated?

I tried running your model on two datasets with slightly different language styles and formats. Compared to training a separate single-task model for each, the F1 of one dataset improved while the F1 of the other decreased. Is there a way to fix this? When trained separately, one dataset converges at a much later epoch than the other.
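One possible workaround (a minimal sketch, not from train.py; the class and method names here are hypothetical) is to track patience per dataset and skip parameter updates for any dataset whose dev F1 has stopped improving, while the other datasets continue training on the shared layers:

```python
# Hypothetical per-dataset early stopping for multi-task training.
# PatienceTracker, update, and should_train are illustrative names,
# not part of the Multi-BioNER codebase.

class PatienceTracker:
    """Tracks dev F1 per dataset; a dataset is frozen once it fails
    to improve for `patience` consecutive epochs."""

    def __init__(self, dataset_names, patience=5):
        self.patience = patience
        self.best_f1 = {name: float("-inf") for name in dataset_names}
        self.bad_epochs = {name: 0 for name in dataset_names}

    def update(self, name, dev_f1):
        # Reset the counter on improvement; otherwise count a bad epoch.
        if dev_f1 > self.best_f1[name]:
            self.best_f1[name] = dev_f1
            self.bad_epochs[name] = 0
        else:
            self.bad_epochs[name] += 1

    def should_train(self, name):
        # In the training loop, skip both the CRF update and the shared
        # BiLSTM update for batches drawn from an exhausted dataset.
        return self.bad_epochs[name] < self.patience
```

In the epoch loop, one would then check `tracker.should_train(name)` before back-propagating a batch from that dataset, so a fast-converging dataset stops pulling the shared representation once its own dev score plateaus.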

Thanks