kimtonyhyun / analysis

2 stars 0 forks source link

Across-Day Normalization and add alternate method for tensor fitting #230

Closed ahwillia closed 7 years ago

ahwillia commented 7 years ago

This code was provided by my friend Casey - it is a faster method based on randomized least-squares for fitting CP decompositions. I haven't had the chance to test it thoroughly yet, but feel free to give it a go @kimtonyhyun.

It should operate the same as cp_als(data, n_components) --> cprand(data, n_components)

kimtonyhyun commented 7 years ago

Since this is a behind-the-scenes code update, just let me know when I should check it for face validity (runs without error; produces CPD output), and we can merge.

kimtonyhyun commented 7 years ago

Btw, does fit_cpd work on tensors with arbitrary number of dimensions (not just 3)? Specifically, will it compute the PCA factorization if I provide a two-dimensional matrix X?

ahwillia commented 7 years ago

It won't give you orthogonal components like PCA. It would also be slower to fit. We could choose to support this, but I don't see the benefit given that matlab already has pca(...) and svd(...) functions. This wrapper function mostly exists as a glorified for loop to run multiple optimizations from different initial parameters (not necessary for PCA).

ahwillia commented 7 years ago

But to be clean and complete we should probably make sure that it works for 4th-order and higher-order tensors.

kimtonyhyun commented 7 years ago

. We could choose to support this, but I don't see the benefit given that matlab already has pca(...) and svd(...) functions.

One benefit might be to run a non-negative factorization of a 2-dimensional X. (I don't think that's possible with Matlab pca or svd.)

But to be clean and complete we should probably make sure that it works for 4th-order and higher-order tensors.

You could just have a few test cases, where you build a n-th order tensor from factor products then sprinkle a bit of noise, and to see if you more-or-less recover the original factors.

ahwillia commented 7 years ago

Good idea. But there happens to be a matlab function already for that as well - https://www.mathworks.com/help/stats/nnmf.html

On Sun, Jan 8, 2017 at 8:03 AM, Tony notifications@github.com wrote:

One benefit might be to run a non-negative factorization of a 2-dimensional X. (I don't think that's possible with Matlab pca or svd.)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/schnitzer-lab/analysis/pull/230#issuecomment-271159963, or mute the thread https://github.com/notifications/unsubscribe-auth/AAm20T9YcLEh2TTdBl2DNGC0Fuwkpajcks5rQQi5gaJpZM4LKYvX .

kimtonyhyun commented 7 years ago

Thanks for the tip on nnmf -- didn't know about that one.

kimtonyhyun commented 7 years ago

By the way, what do you think about altering the visualize_neuron_ktensor visualization so that (for display purposes) the neuron factors are sorted by weight on a per-factor basis?

This would make neuron axes not comparable across factors, but it's hard to do this anyway given the current visualization. By sorting each factor independently, I could at least get a sense of how many neurons make up a significant membership in a particular factor (e.g. 30 neurons? 80 neurons?).

ahwillia commented 7 years ago

Happy to clean this up before merging - just let me know in review!

ahwillia commented 7 years ago

By the way, what do you think about altering the visualize_neuron_ktensor visualization so that (for display purposes) the neuron factors are sorted by weight on a per-factor basis?

Hmm yeah I hadn't thought about that. Have to think a bit more about it, but it seems like a reasonable idea - we should at least give the user the option to do that. I have a couple other changes I'd like to make to visualize_neuron_ktensor - I'll open a separate PR in the next few days?

kimtonyhyun commented 7 years ago

Ok, let's punt the visualize_neuron_ktensor into a separate PR.

ahwillia commented 7 years ago

Thanks for the detailed feedback @kimtonyhyun - what is your opinion of changing cp_fit so that it's output is a matrix of structs of size n_replicates x n_rank. This way you can select, say, all the rank-5 models by:

models = cp_fit(data, ... )
models(:,5)
[models(:,5).error] % list of normalized error

I can add that in to this PR or the next one if you like it.

kimtonyhyun commented 7 years ago

The matrix organization of outputs looks good to me. If I run models = cp_fit(..., 'min_rank', 5, 'max_rank', 15), the first 4 columns of models would be empty then?

ahwillia commented 7 years ago

Sure, that sounds like a good convention

ahwillia commented 7 years ago

Roger that - I did some work on those fixes earlier, but not done yet.

On Jan 12, 2017 7:21 PM, "Tony" notifications@github.com wrote:

Ok, let's punt the visualize_neuron_ktensor into a separate PR.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/schnitzer-lab/analysis/pull/230#issuecomment-272351193, or mute the thread https://github.com/notifications/unsubscribe-auth/AAm20RiKZT0OThwlCOzSYReFMbdQcn-Aks5rRu2cgaJpZM4LKYvX .

kimtonyhyun commented 7 years ago

Done?

ahwillia commented 7 years ago

I think so - could you give it a quick look over and test? Apologies if there are still some rough edges.

On Sun, Jan 15, 2017 at 11:34 AM, Tony notifications@github.com wrote:

Done?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/schnitzer-lab/analysis/pull/230#issuecomment-272717873, or mute the thread https://github.com/notifications/unsubscribe-auth/AAm20Yt8TWBXKphSeEdMPLHFBLkw0L3dks5rSnS6gaJpZM4LKYvX .

kimtonyhyun commented 7 years ago

When I run fit_cpd now, I find a long and unexplained console dump of s and best_models, even though I terminated the fit_cpd command with a semicolon:

>> [models, best_models] = fit_cpd(Xn, 'min_rank', 1, 'max_rank', 15, 'num_starts', 10);

OPTIMIZATION RUN 1
     rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 2
     rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 3
     rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 4
     rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 5
     rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 6
     rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 7
     rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 8
     rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 9
     rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 10
     rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.

s =

     4

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

     3

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

     6

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

     2

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

     1

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

     8

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

     6

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

     2

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

     2

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

    10

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

    10

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

     6

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

     6

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

    10

best_models = 

1x15 struct array with fields:

    error
    decomp

s =

     1

best_models = 

1x15 struct array with fields:

    error
    decomp

I also find that cpd_scree_plot errors out:

>> cpd_scree_plot(models);
Undefined function or variable 'ln'.

Error in cpd_scree_plot (line 22)
uistack(ln, 'bottom')

Visualization also fails:

>> visualize_neuron_ktensor(best_models(15).decomp, trial_meta, 'start');
Improper index matrix reference.

Error in visualize_ktensor>format_axes (line 165)
                set(Ax(1,f).Title,'String',ttl{f})

Error in visualize_ktensor (line 66)
format_axes(Ax, res.title, res.xlabel, res.XTick, res.XTickLabel);

Error in visualize_neuron_ktensor (line 31)
[Ax, G, BigAx] = visualize_ktensor(decomp, 'c', c, 'plots', plt, ...
kimtonyhyun commented 7 years ago

Let me know when to re-review. Btw, I'm mostly following along with the instructions on the tensor doc.

ahwillia commented 7 years ago

Sorry about that - should have known better than to not give it a test. Semicolons are now added and the scree plot is fixed.

I can't reproduce your visualize_neuron_ktensor bug though... Currently scratching my head on that.

kimtonyhyun commented 7 years ago

I can't reproduce your visualize_neuron_ktensor bug though... Currently scratching my head on that.

What version of Matlab are you building on?

kimtonyhyun commented 7 years ago

All right, the basic walkthrough works now:

>> Xn = normalize_tensor(X, trial_meta);
>> [models, best_models] = fit_cpd(Xn, 'min_rank', 1, 'max_rank', 15, 'num_starts', 10, 'method', 'cp_nnals');

OPTIMIZATION RUN 1
     fitting models, rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 2
     fitting models, rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 3
     fitting models, rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 4
     fitting models, rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 5
     fitting models, rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 6
     fitting models, rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 7
     fitting models, rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 8
     fitting models, rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 9
     fitting models, rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.
OPTIMIZATION RUN 10
     fitting models, rank: 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.

Scree plot works:

>> cpd_scree_plot(models);

cpdscreeplot

Visualization works on Matlab 2015b (previously I was testing on 2013b):

>> visualize_neuron_ktensor(best_models(15).decomp, trial_meta, 'correct');

visneuronktensor

kimtonyhyun commented 7 years ago

The PR looks good to me.

kimtonyhyun commented 7 years ago

Here is how the normalized error varies for CPD models of rank up to 50: cpd_scree_plot50