pmorerio / minimal-entropy-correlation-alignment

Code for the paper "Minimal-Entropy Correlation Alignment for Unsupervised Deep Domain Adaptation", ICLR 2018
MIT License

log_coral_loss question #5

Closed bjajoh closed 2 years ago

bjajoh commented 2 years ago

Hi @pmorerio ,

I have three questions regarding the log_coral_loss: https://github.com/pmorerio/minimal-entropy-correlation-alignment/blob/3b261683fbb24eca2d2efb6dff5c25ec4301b7fc/svhn2mnist/model.py#L59

1. It doesn't work with batch size 0. Why is there no workaround? Is that part of the method?
2. Is the cov result supposed to be complex?
3. Have you tried applying this method to latent spaces? Or are you aware of anyone doing it?

Best, Bjarne

pmorerio commented 2 years ago

Hi,

> It doesn't work with batch size 0. Why is there no workaround? Is that part of the method?

I assume you mean batch_size=1; there is no such thing as batch_size=0. Anyway, since the covariance is computed batch-wise, there can be no covariance matrix for a single element. In fact, the bigger the batch, the better, since a larger batch better approximates the statistics of the source and target domains. Also, batches significantly smaller than the dimension of the latent space may lead to numerical issues.
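A minimal NumPy sketch of this point (illustrative only, not the repo's TensorFlow code): with one sample the `1/(batch_size - 1)` factor divides by zero, and with a batch smaller than the latent dimension the covariance is singular, so its matrix logarithm is undefined without regularization.

```python
import numpy as np

def batch_cov(feats):
    # feats: (batch_size, d) latent features; covariance computed over the batch
    b = feats.shape[0]
    centered = feats - feats.mean(axis=0, keepdims=True)
    return centered.T @ centered / (b - 1)   # divides by zero when b == 1

rng = np.random.default_rng(0)
d = 64
C_small = batch_cov(rng.normal(size=(16, d)))    # batch << d
C_large = batch_cov(rng.normal(size=(256, d)))   # batch >> d

# Centering removes one degree of freedom, so rank <= batch_size - 1:
# 16 samples in a 64-d space give a singular covariance matrix.
assert np.linalg.matrix_rank(C_small) <= 15
assert np.linalg.matrix_rank(C_large) == d
```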

> Is the cov result supposed to be complex?

The covariance matrix is symmetric and real, and thus guaranteed to have real eigenvalues by the spectral theorem (I hope I am interpreting your question correctly).
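A small NumPy illustration of this (not the repo's implementation): explicitly symmetrizing the covariance against floating-point noise and using an eigensolver specialized for symmetric matrices keeps the eigenvalues, and hence the matrix logarithm, real.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))          # batch of 256 features of size 8
C = np.cov(X, rowvar=False)            # real, symmetric covariance

# Enforce exact symmetry, then use eigh, which is specialized for
# symmetric matrices and always returns real eigenvalues.
C = (C + C.T) / 2.0
w, V = np.linalg.eigh(C)
log_C = V @ np.diag(np.log(w)) @ V.T   # real matrix logarithm

assert not np.iscomplexobj(log_C)
```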

> Have you tried applying this method to latent spaces? Or are you aware of anyone doing it?

The method is indeed applied to the latent space spanned by the penultimate feature layer of the considered CNN.

I hope my answer helps; glad to clarify further if you need it. Best, P.

bjajoh commented 2 years ago

Thanks for the quick reply @pmorerio !

My bad, I was counting like a computer ;) Of course I mean batch_size=1. I was just wondering because (1. / (batch_size - 1)) results in a division by zero for batch size 1. Is there any recommended minimum batch size?

Sorry, I was referring to log_cov, which becomes complex in my case.

Can you point me to anyone who has used it on latent spaces? I have seen it used by many, but only on output logits.

Thanks for your help! Bjarne

pmorerio commented 2 years ago

Hi @bjajoh,

> Is there any recommended minimum batch size?

Empirically, I would recommend making it at least equal to the size of the latent space (hidden_size). In the example provided, hidden_size=64 and batch_size=256, i.e. four times the latent dimension.

> Sorry, I was referring to log_cov, which becomes complex in my case.

It may become complex because of numerical issues arising precisely from a small batch size. You can try larger batches and/or regularize the covariance matrices by adding small values on the diagonal (uncomment the last part of https://github.com/pmorerio/minimal-entropy-correlation-alignment/blob/3b261683fbb24eca2d2efb6dff5c25ec4301b7fc/svhn2mnist/model.py#L65).
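A hedged NumPy sketch of that diagonal regularization (variable names here are illustrative, not those in model.py): adding a small epsilon to the diagonal pushes the near-zero eigenvalues of a rank-deficient covariance away from zero, so the matrix logarithm stays real and finite.

```python
import numpy as np

rng = np.random.default_rng(2)
d, b = 64, 32                      # latent size larger than the batch
X = rng.normal(size=(b, d))
C = np.cov(X, rowvar=False)        # rank-deficient: some eigenvalues are ~0

eps = 1e-6
C_reg = C + eps * np.eye(d)        # small values added on the diagonal
w = np.linalg.eigvalsh(C_reg)
assert w.min() > 0                 # positive-definite: log(w) is now finite
```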

> Can you point me to anyone who has used it on latent spaces? I have seen it used by many, but only on output logits.

As you can see, I am actually applying it to the latent space spanned by the feature layer before the logits layer. In principle you can apply it to any layer of the network: https://github.com/pmorerio/minimal-entropy-correlation-alignment/blob/3b261683fbb24eca2d2efb6dff5c25ec4301b7fc/svhn2mnist/model.py#L109.

Let me know if this clarifies things! Best, P.

bjajoh commented 2 years ago

Hey @pmorerio ,

thanks for the clarification!

My latent space is, for example, 28x28x64; currently I am averaging the 3rd axis down to a 2D grid. This results in a stable loss even with smaller batch sizes. Is this a viable approach, or does it break the underlying meaning of the method?

Thanks for your help!

pmorerio commented 2 years ago

Hi, what the loss expects as input are matrices of size (batch_size, hidden_space_size). Your 2D grid adds an extra dimension, so you should vectorize it; however, this will result in a high-dimensional vector (784, which could be too much, but you can try). Alternatively, you can average along the spatial dimensions to get a vector of length 64.
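The two options can be sketched like this in NumPy (shapes assume your 28x28x64 feature map with a hypothetical batch of 32):

```python
import numpy as np

feats = np.zeros((32, 28, 28, 64))       # (batch, H, W, channels)

# Option 1: average the channel axis, then flatten the 28x28 grid -> 784-d
flat = feats.mean(axis=3).reshape(feats.shape[0], -1)
print(flat.shape)    # (32, 784)

# Option 2: global average pooling over the spatial axes -> 64-d
pooled = feats.mean(axis=(1, 2))
print(pooled.shape)  # (32, 64)
```

Either way the loss then receives a 2D (batch_size, features) matrix as expected.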

A stable loss could also simply mean that the weight of the loss term is very low.

Hope this helps. P.

bjajoh commented 2 years ago

Hi @pmorerio ,

thank you sooo much! This is extremely helpful!

Best, Bjarne