shimo-lab / Universal-Geometry-with-ICA

Discovering Universal Geometry in Embeddings with ICA
https://aclanthology.org/2023.emnlp-main.283/

Why ica.mixing_? #2

Closed RyanLiut closed 5 months ago

RyanLiut commented 5 months ago

https://github.com/shimo-lab/Universal-Geometry-with-ICA/blob/92a1c4fd628f2c9457df710b461370fa1ecdcc65/universal/src/crosslingual_save_pca_and_ica_embeddings.py#L254C1-L257C30

ica = FastICA(**ica_params)
ica.fit(pca_embed)
R = ica.mixing_
ica_embed = pca_embed @ R

Hi, authors. Thank you for your work. I would like to ask why "ica.mixing_" is used here rather than "ica.components_". I think pca_embed @ ica.components_ transforms the original data into a more independent space, and that it is equivalent to "ica.fit_transform". I am not sure if I am right; I was reading this sklearn guidance.
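For reference, the equivalence being asked about can be checked on toy data. This is a minimal sketch (hypothetical random sources, not the repo's actual embeddings): with whiten=False, sklearn's FastICA.fit_transform neither centers nor rescales, so its output is exactly the input times components_.T.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Hypothetical stand-in for the repo's embeddings:
# mix non-Gaussian sources, then PCA-whiten.
rng = np.random.RandomState(0)
S_true = rng.uniform(-1.0, 1.0, size=(2000, 5))   # non-Gaussian sources
A = rng.standard_normal((5, 5))                   # random mixing matrix
X = S_true @ A.T
pca_embed = PCA(n_components=5, whiten=True, random_state=0).fit_transform(X)

# whiten=False: fit_transform returns pca_embed @ components_.T exactly.
ica = FastICA(n_components=5, whiten=False, random_state=0, max_iter=1000)
S = ica.fit_transform(pca_embed)
print(np.allclose(S, pca_embed @ ica.components_.T))  # True
```

So in this setting, pca_embed @ ica.components_.T (note the transpose) reproduces fit_transform.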

RyanLiut commented 5 months ago

I calculated ica_embed in two ways: first as ica_embed = pca_embed @ ica.mixing_ (as the code shows), and second as ica_embed = pca_embed @ ica.components_.T. Then I compared the ica_embed for one word (say, "play", at index 300). The first 10 dimensions of the first result are: [ 0.25566859, -0.05167013, 1.98034625, 0.1228991 , -0.16191801, 4.73313955, -0.75972423, 0.15035432, -6.32458429, -0.61922542]

The second: [ 0.2897501 , -0.04556104, 1.97500891, 0.15995669, -0.15892177, 4.70651733, -0.72284176, 0.12453925, -6.35039183, -0.65202694]

The result provided by the authors on Google Drive is: [ 0.27389067, -0.07529904, 1.96008468, 0.13952179, -0.16024148, 4.72475989, -0.72279924, 0.12549091, -6.38518098, -0.64888658]

Well, I found:

  1. The first and the second give similar results, even though ica.mixing_ and ica.components_ have quite different meanings. This looks strange.
  2. Both differ from the embeddings provided by the authors (denoted gt), though each is close to gt, and ica.components_.T is closer. Could this be due to different random seed choices?
ymgw55 commented 5 months ago

@RyanLiut Thank you for using our code. In this paper, we consider only the case where n_components = n_features. Then components_ is an orthogonal matrix, so components_.T = mixing_. Note that although either components_.T or mixing_ can be used, as you point out, components_.T may make more sense in general.
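The orthogonality claim can be verified numerically. This is a sketch on hypothetical toy data, assuming FastICA is run with whiten=False on already-whitened input (in that mode, sklearn's symmetric decorrelation keeps the unmixing matrix components_ orthogonal, so its pseudo-inverse mixing_ equals its transpose).

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Hypothetical data: mix non-Gaussian sources, then PCA-whiten.
rng = np.random.RandomState(0)
S_true = rng.uniform(-1.0, 1.0, size=(2000, 5))
A = rng.standard_normal((5, 5))
X = S_true @ A.T
pca_embed = PCA(n_components=5, whiten=True, random_state=0).fit_transform(X)

# n_components == n_features and whiten=False:
# components_ is orthogonal, so pinv(components_) == components_.T.
ica = FastICA(n_components=5, whiten=False, random_state=0, max_iter=1000)
ica.fit(pca_embed)

W = ica.components_
print(np.allclose(W @ W.T, np.eye(5), atol=1e-6))  # orthogonal
print(np.allclose(ica.mixing_, W.T, atol=1e-6))    # mixing_ == components_.T
```

This matches the observation above that the two variants give nearly identical embeddings, with the residual gap attributable to numerical precision and run-to-run differences such as random seeds.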

RyanLiut commented 5 months ago

Thank you.