Why average loss value by `batch_size` when using `batch_hard` method?

Hi Olivier,

Thanks for your work, it helped me a lot! Just one question:

In the `batch_hard_triplet_loss` method, you compute the final loss with `tf.reduce_mean`, which is equivalent to `tf.reduce_sum(tensor) / len(tensor)`. As a result, the loss is averaged over `batch_size`, including anchors whose loss is zero. Shouldn't the output be the average over the number of non-zero losses instead? For example, if `triplet_loss` at line 218 is `[1.4, 0, 0, 0, 1.6]`, the function returns 0.6 (3.0 / 5) according to the code, but shouldn't it be 1.5 (3.0 / 2)?
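To make the two options concrete, here is a minimal sketch in plain Python. It is not the repository's code: the helper names `mean_over_batch` and `mean_over_nonzero` and the `eps` threshold are my own assumptions for illustration, and plain lists stand in for the TensorFlow tensor.

```python
def mean_over_batch(losses):
    """Average over every element, like tf.reduce_mean on a 1-D tensor."""
    return sum(losses) / len(losses)


def mean_over_nonzero(losses, eps=1e-16):
    """Average over the strictly positive losses only (hypothetical alternative).

    `eps` is an assumed threshold for deciding that a loss counts as non-zero.
    """
    active = [x for x in losses if x > eps]
    if not active:
        # No active triplets: return 0.0 rather than dividing by zero.
        return 0.0
    return sum(active) / len(active)


# Example values from the question above.
triplet_loss = [1.4, 0.0, 0.0, 0.0, 1.6]

batch_mean = mean_over_batch(triplet_loss)      # divides the sum by 5
nonzero_mean = mean_over_nonzero(triplet_loss)  # divides the sum by 2

print(batch_mean, nonzero_mean)
```

The two results differ only by the factor `len(losses) / number_of_nonzero_losses`, so the second variant effectively rescales the loss (and its gradient) as the fraction of active anchors changes during training.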

I saw a similar discussion under your triplet loss post, but the code doesn't seem to reflect its conclusion.

omoindrot / tensorflow-triplet-loss
