Open isaacgerg opened 5 years ago
Hi, @isaacgerg thanks for the suggestion. Could you open a PR?
Unfortunately, I cannot due to my network settings.
Hey, I've seen this implementation in a lot of projects, and I don't think it is right to set the alpha parameter that way. Alpha as a scalar makes sense in the binary case, since it is the weight of the positive samples, i.e. loss = - gt * alpha * ((1 - pr)^gamma) * log(pr) - (1 - gt) * (1 - alpha) * (pr^gamma) * log(1 - pr). Positive samples are weighted by alpha, negative samples by (1 - alpha); all good. With multiple classes, however, there is no distinguished 'negative' class: you output a softmax vector of class probabilities. So, following the same logic as in the binary case, each class should be weighted separately. If you use a single value, you are basically just scaling the whole loss by that factor; no class weighting is done. I might be missing something here, so please correct me if I'm wrong.
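To make the binary formula above concrete, here is a minimal NumPy sketch (not the repository's code; the function name and defaults are mine): positives are weighted by alpha, negatives by (1 - alpha), and the (1 - pr)^gamma / pr^gamma factors down-weight easy examples.

```python
import numpy as np

def binary_focal_loss(gt, pr, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: positives weighted by alpha, negatives by (1 - alpha)."""
    pr = np.clip(pr, eps, 1.0 - eps)  # avoid log(0)
    pos = -gt * alpha * (1.0 - pr) ** gamma * np.log(pr)
    neg = -(1.0 - gt) * (1.0 - alpha) * pr ** gamma * np.log(1.0 - pr)
    return np.mean(pos + neg)

# A confidently correct prediction contributes almost nothing to the loss;
# a confidently wrong one dominates it.
gt = np.array([1.0, 1.0, 0.0])
pr = np.array([0.9, 0.1, 0.2])
print(binary_focal_loss(gt, pr))
```

Note that with alpha = 0.5 the positive and negative terms are weighted equally, which is why a scalar alpha only makes sense when there are exactly two roles to balance.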
Edit: I've checked the function more carefully, and you can pass alpha as a vector, so that was my misunderstanding. I would still add a better description of alpha to the function, and change the default alpha to something more appropriate for the multiclass case, for example alpha=1, so all classes are weighted equally and no scaling is done. alpha=0.25 doesn't make much sense for a multiclass example.
Hey @dmonkoff, you are right, this is an older implementation. The issue has already been solved here. As I've already written, in the latest version you need to specify α as an array whose size matches the number of categories, each entry representing the weight of the corresponding category.
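The per-class α array described above can be sketched like this (a NumPy illustration under my own naming, not the library's actual implementation): with one-hot targets, multiplying by gt keeps only the true-class term, and broadcasting the alpha vector weights each class separately.

```python
import numpy as np

def categorical_focal_loss(gt, pr, alpha, gamma=2.0, eps=1e-7):
    """Multiclass focal loss. alpha is a per-class weight vector whose
    length must match the number of categories; gt is one-hot,
    both gt and pr have shape [batch, classes]."""
    alpha = np.asarray(alpha)
    pr = np.clip(pr, eps, 1.0 - eps)
    # gt is one-hot, so only the true class contributes per sample
    loss = -gt * alpha * (1.0 - pr) ** gamma * np.log(pr)
    return np.mean(np.sum(loss, axis=-1))

gt = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
pr = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2]])
# alpha = [1, 1, 1] weights all classes equally (no scaling),
# matching the default suggested in the thread
print(categorical_focal_loss(gt, pr, alpha=[1.0, 1.0, 1.0]))
```

Raising one entry of alpha (e.g. alpha=[2.0, 1.0, 1.0]) scales up only the loss contribution of samples whose true label is that class, which is the per-class weighting the original comment was asking about.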
I'll take into account your suggestions, thank you very much!
With this pattern, I don't need dill when using load_model.