sappelhoff opened this issue 4 years ago
Picard minimizes the same cost function as the aforementioned algorithms, but due to the non-convexity of the problem, it might very well find another solution (this is a general problem with ICA; for instance, starting FastICA on the same dataset with different initializations might yield different solutions).
This means that we can only compare algorithms by looking at how small the gradient of the cost function is after the algorithm is run, which is what we do for Infomax and FastICA in the papers. Would that fit your expectations?
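To make the non-convexity point concrete, here is a minimal sketch on synthetic data (not from the thread; the data, seeds, and tolerances are just placeholders): two FastICA runs that differ only in their random initialization generally do not return the same unmixing matrix, even though both minimize the same cost function.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.RandomState(0)
S = rng.laplace(size=(10000, 3))   # independent, non-Gaussian sources
A = rng.randn(3, 3)                # mixing matrix
X = S @ A.T                        # observed mixtures, shape (n_samples, n_features)

# Two runs that differ only in the random initialization of the unmixing matrix.
W1 = FastICA(random_state=1, max_iter=1000).fit(X).components_
W2 = FastICA(random_state=2, max_iter=1000).fit(X).components_

# Both runs minimize the same cost function, but the recovered unmixing matrices
# usually differ: rows come out permuted and sign-flipped, and the two runs may
# also converge to genuinely different local solutions.
print(np.allclose(np.abs(W1), np.abs(W2), atol=1e-2))  # typically False
```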
Yes, an example that does that, plus an introductory description of what you said at the beginning of your post, would be nice.
for all the people who want to use extended infomax because they read (like me) that it's apparently the most stable, so that they know that the cost function is the same, but the solution may be different (same as for the same algorithm with different initializations).
To nail this down again: this apparent stability is spurious. It's because the init is not random. The algorithm is still stochastic, as the minibatches are random during optimization. This brings non-deterministic outputs, but that is just due to the stochastic gradient descent employed.
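In other words (a small sketch, assuming MNE-Python's `mne.preprocessing.infomax` and synthetic placeholder data), the non-determinism comes from the random minibatch order during optimization, not from the initialization, and fixing `random_state` makes the output reproducible:

```python
import numpy as np
from mne.preprocessing import infomax

rng = np.random.RandomState(0)
S = rng.laplace(size=(10000, 3))   # independent sources
A = rng.randn(3, 3)                # mixing matrix
X = S @ A.T                        # mixtures, shape (n_samples, n_features)

# The init is deterministic, but the minibatch order is drawn at random on each
# call, so two runs need not return the same unmixing matrix ...
W_a = infomax(X, extended=True)
W_b = infomax(X, extended=True)
print(np.allclose(W_a, W_b))       # typically False

# ... unless the random state is fixed.
W_c = infomax(X, extended=True, random_state=42)
W_d = infomax(X, extended=True, random_state=42)
print(np.allclose(W_c, W_d))       # True
```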
In the README, it says the following about the Picard settings:
It'd be nice to have an example (using real data, e.g., EEG data, because that's probably what most users deal with) that directly compares Picard against the other algorithms, where the non-Picard implementations are taken from MNE-Python (or sklearn in the case of FastICA).
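A rough sketch of what such an example could look like on MNE's sample dataset (file names, `n_components`, and especially the `fit_params` mapping between Picard settings and the other algorithms are my assumptions here and should be double-checked against the README):

```python
import mne
from mne.preprocessing import ICA

data_path = mne.datasets.sample.data_path()  # Path object in MNE >= 1.0
raw = mne.io.read_raw_fif(
    data_path / "MEG" / "sample" / "sample_audvis_filt-0-40_raw.fif", preload=True
)
raw.pick("eeg")
raw.filter(1.0, None)  # high-pass before ICA

# Assumed mapping of Picard settings onto the other algorithms (to be verified).
settings = {
    "extended_infomax": dict(method="infomax", fit_params=dict(extended=True)),
    "picard_as_extended_infomax": dict(
        method="picard", fit_params=dict(ortho=False, extended=True)
    ),
    "fastica": dict(method="fastica"),
    "picard_as_fastica": dict(
        method="picard", fit_params=dict(ortho=True, extended=False)
    ),
}

icas = {}
for name, kwargs in settings.items():
    ica = ICA(n_components=15, random_state=97, **kwargs)
    ica.fit(raw)
    icas[name] = ica
    # Compare e.g. component topographies, number of iterations, or how well
    # the unmixing matrices agree up to permutation and sign.
```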