Thank you so much for releasing this code!
I am wondering how to train the model on images that have multiple labels. As I understand it, Equation (2) in the paper, $p_{\theta,\phi}(x_t \mid x_{t+1}, y) = Z\, p_\theta(x_t \mid x_{t+1})\, p_\phi(y \mid x_t)$, incorporates only a single label $y$ into the conditional reverse noising process. If an image has $k$ labels, should $y$ be the mean of those labels, or should $p_\phi(y \mid x_t)$ be replaced by the product $\prod_{i=1}^{k} p_\phi(y_i \mid x_t)$?
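To make the second option concrete, here is a minimal sketch of what I have in mind. It assumes the classifier has one sigmoid output per label and that the labels are conditionally independent given x_t, so the log of the product becomes a sum of per-label log-probabilities. The function names and the example logits are hypothetical and are not taken from the released code.

```python
import math


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def multilabel_log_prob(logits, labels):
    """Return log prod_i p(y_i | x_t), assuming independent sigmoid heads."""
    total = 0.0
    for z, y in zip(logits, labels):
        # p(y_i = 1 | x_t) = sigmoid(z_i); p(y_i = 0 | x_t) = sigmoid(-z_i)
        total += log_sigmoid(z if y == 1 else -z)
    return total


# Hypothetical classifier outputs on a noisy image x_t, for k = 3 labels.
logits = [2.0, -1.0, 0.5]
labels = [1, 0, 1]
print(multilabel_log_prob(logits, labels))
```

If this factorization is reasonable, I would expect the guidance term to be the gradient of this summed log-probability with respect to x_t, in place of the gradient of the single-label log p(y|x_t).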
Thanks again.