pierreablin / picard

Preconditioned ICA for Real Data
https://pierreablin.github.io/picard
BSD 3-Clause "New" or "Revised" License

Example request: compare Picard "settings" with FastICA, Infomax, Extended Infomax #28

Open sappelhoff opened 4 years ago

sappelhoff commented 4 years ago

In the README, it says the following about the Picard settings:

ortho=False, extended=False: same solution as Infomax
ortho=False, extended=True: same solution as extended-Infomax
ortho=True, extended=True: same solution as FastICA
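
For readers who want to see what those settings look like in code, here is a minimal sketch of the corresponding calls, assuming the `picard()` function from the picard package and a generic data matrix standing in for real recordings:

```python
import numpy as np
from picard import picard

# Stand-in data: in practice X would be e.g. EEG, shape (n_channels, n_samples)
rng = np.random.RandomState(0)
X = rng.laplace(size=(4, 1000))

# Targets the same solution as Infomax
K, W, Y = picard(X, ortho=False, extended=False, random_state=0)

# Targets the same solution as extended Infomax
K, W, Y = picard(X, ortho=False, extended=True, random_state=0)

# Targets the same solution as FastICA
K, W, Y = picard(X, ortho=True, extended=True, random_state=0)
```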

It'd be nice to have an example (using real data, e.g., EEG data, since that is probably what most users deal with) that directly compares these settings against the algorithms they are meant to match, where the non-Picard implementations are taken from MNE-Python (or scikit-learn in the case of FastICA).
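
To make the request concrete, here is a rough sketch of where the reference decompositions could come from, assuming scikit-learn's `FastICA` and MNE-Python's `infomax` function (the random data only stands in for real EEG):

```python
import numpy as np
from sklearn.decomposition import FastICA
from mne.preprocessing import infomax

# Stand-in data, shape (n_samples, n_features) as both APIs expect
rng = np.random.RandomState(0)
X = rng.laplace(size=(1000, 4))

# FastICA reference from scikit-learn
sources_fastica = FastICA(random_state=0).fit_transform(X)

# Extended Infomax reference from MNE-Python (returns the unmixing matrix)
unmixing = infomax(X, extended=True, random_state=0)
# assuming sources = X @ W.T under the (n_samples, n_features) convention
sources_infomax = X @ unmixing.T
```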

pierreablin commented 4 years ago

Picard minimizes the same cost function as the aforementioned algorithms, but because the problem is non-convex, it may well find a different solution. This is a general problem with ICA: for instance, running FastICA on the same dataset with different initializations might yield different solutions.

This means that we can only compare algorithms by looking at how small the gradient of the cost function is after each algorithm has run, which is what we do for Infomax and FastICA in the papers. Would that fit your expectations?
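
As a sketch of the kind of check meant here: for the maximum-likelihood ICA cost with a tanh score function (the non-extended, super-Gaussian case), a stationary point satisfies E[tanh(Y) Yᵀ] = I, so the size of this residual after convergence is one way to compare how well different algorithms have minimized the same cost. Names and shapes below are only illustrative:

```python
import numpy as np

def relative_gradient_norm(Y):
    """Infinity norm of E[tanh(Y) Y.T] - I for estimated sources Y.

    Y has shape (n_components, n_samples). Smaller values mean the
    solution is closer to a stationary point of the tanh-based
    maximum-likelihood ICA cost (non-extended case).
    """
    n_samples = Y.shape[1]
    G = np.tanh(Y) @ Y.T / n_samples - np.eye(Y.shape[0])
    return np.max(np.abs(G))

# e.g., after running two algorithms on the same whitened data:
# print(relative_gradient_norm(Y_picard), relative_gradient_norm(Y_infomax))
```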

sappelhoff commented 4 years ago

Yes, an example that does that, plus an introductory description of what you explained at the beginning of your post, would be nice.

This would help all the people who want to:

  1. use extended Infomax because they read (like me) that it's apparently the most stable, but
  2. want the algorithm to run faster (i.e., use Picard),

so that they know that the cost function is the same, but the solution may differ (just as it can for the same algorithm with different initializations).
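
For that use case, here is a sketch of what it could look like in MNE-Python, assuming `mne.preprocessing.ICA` with `method='picard'` forwards `fit_params` to Picard (the synthetic Raw object stands in for real, high-pass filtered EEG):

```python
import numpy as np
import mne
from mne.preprocessing import ICA

# Synthetic stand-in for real EEG: 10 channels, 60 s at 100 Hz
rng = np.random.RandomState(0)
info = mne.create_info(ch_names=10, sfreq=100., ch_types='eeg')
raw = mne.io.RawArray(rng.laplace(size=(10, 6000)), info)
raw.filter(1., None)  # high-pass before ICA, as usually recommended

# Picard set up to target the same solution as extended Infomax, but faster
ica_picard = ICA(n_components=5, method='picard',
                 fit_params=dict(ortho=False, extended=True), random_state=97)
ica_picard.fit(raw)

# Plain extended Infomax for comparison (typically slower)
ica_infomax = ICA(n_components=5, method='infomax',
                  fit_params=dict(extended=True), random_state=97)
ica_infomax.fit(raw)
```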

agramfort commented 4 years ago
  1. use extended Infomax because they read (like me) that it's apparently the most stable, but

To nail this down again: this apparent stability is spurious, and it comes from the initialization not being random. The algorithm is still stochastic, since the minibatches are drawn randomly during optimization; this brings non-deterministic outputs, but that is just due to the stochastic gradient descent employed.
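
A quick way to see this in practice, assuming MNE-Python's `infomax` function (its initialization is deterministic, and `random_state` only drives the random minibatch order used by the stochastic gradient descent):

```python
import numpy as np
from mne.preprocessing import infomax

# Random non-Gaussian data standing in for whitened EEG, shape (n_samples, n_features)
rng = np.random.RandomState(0)
X = rng.laplace(size=(5000, 5))

# Same data, same deterministic init, different minibatch order
W1 = infomax(X, extended=True, random_state=1)
W2 = infomax(X, extended=True, random_state=2)

print(np.max(np.abs(W1 - W2)))  # non-zero: the outputs differ run to run
```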