Open lwb2099 opened 11 months ago
I have a few questions about this loss: 1. The threshold appears to have no effect; why is that? 2. If two attention maps are the same, the loss is -1, but shouldn't the loss be optimized from something positive toward zero?
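To make the second point concrete, here is a minimal sketch. It assumes the loss is a negative cosine similarity between flattened attention maps, which is only my guess at what produces -1 for identical maps; the repository's actual loss may differ. The names `neg_cosine_loss` and `shifted_loss` are hypothetical and not taken from the code base.

```python
import math

def neg_cosine_loss(a, b):
    """Hypothetical loss: negative cosine similarity of two flattened maps."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return -dot / (norm_a * norm_b)

def shifted_loss(a, b):
    """Same loss plus a constant 1, so the minimum sits at 0 instead of -1."""
    return 1.0 + neg_cosine_loss(a, b)

# An assumed toy attention map, flattened to a list.
attn = [0.1, 0.2, 0.3, 0.4]

print(neg_cosine_loss(attn, attn))  # approximately -1 for identical maps
print(shifted_loss(attn, attn))     # approximately 0 for identical maps
```

Under that assumption, the two versions differ only by a constant, so their gradients are identical and the optimizer behaves the same either way; the shift only changes the number that gets logged.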
same question