Open jiweiqi opened 3 years ago
Alternatively, we could use weight pruning to encourage sparsity, although this assumes that all of the important w_ij have large absolute values.
Example at
https://github.com/DENG-MIT/CRNN/blob/a94b20604fce305a55854a9e34c45fa2b28de8a8/case1/case1_hardthreshhold.jl#L76
All great!
I think pruning is a good idea. Does Julia have a prune-training function similar to the ones in PyTorch/Keras?
I don't think there is one in Julia; normally I do it manually. Actually, I don't retrain the model after pruning, since I use a very tight threshold and the performance is almost unchanged after pruning.
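The manual hard-threshold pruning described above (and in the linked `case1_hardthreshhold.jl` example) can be sketched in a few lines. This is a hedged illustration, not the repository's actual code; the function name `hard_threshold` and the threshold value `tau` are hypothetical choices.

```python
import numpy as np

def hard_threshold(w, tau=1e-3):
    """Zero out weights whose absolute value is below tau.

    Mirrors the manual pruning described in the thread: no retraining,
    just a tight threshold, so performance is almost unchanged.
    `tau` is a hypothetical value; pick it from the weight histogram.
    """
    w_pruned = np.where(np.abs(w) < tau, 0.0, w)
    sparsity = float(np.mean(w_pruned == 0.0))  # fraction of zeroed entries
    return w_pruned, sparsity

# Demo with synthetic weights: a mix of large and near-zero entries.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(10, 10))
w[np.abs(w) < 0.05] *= 1e-5  # push the small entries toward zero
w_pruned, sparsity = hard_threshold(w, tau=1e-3)
```

Because the threshold is tight, only entries that were already negligible are removed, which is why retraining afterwards is usually unnecessary.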
Another observation: for smaller datasets, too small a learning rate can trap the model in a local optimum and lead to overfitting.
Start a thread on the potential issue of early stopping on parameter inference (identifying sparsity)
Currently, the early-stopping rule we implemented stops training when the validation loss reaches a plateau. This is justified for deep learning aimed at data fitting: the loss landscape is assumed to be flat near the (sub-)global minima, so further training is unnecessary once the parameters have fallen into such a good valley.
Here, however, we additionally want a sparse model. By definition, whether the model is sparse has little effect on the fit, so early stopping is very likely to halt training while the model is still far from sparse.
Instead, we should train the model for a very long time.
Other tips I am not sure about: