CPD: Comparison to NMF on unfoldings of X

kimtonyhyun commented 7 years ago

Note: It turned out that the CPD fitting described in this issue was rank=29, rather than 15. Hence the CPD and NMF models were not nominally "rank matched". So, some of the conclusions that were made in this issue are false.

@ahwillia Here is my initial report on running NMF on unfoldings of the data tensor X. We have previously discussed this to be the "2D counterpart" to non-negative CPD factorization of the full tensor X.

I am still using a single session, namely c11m1d15. The exported data tensor has been normalized (all entries between 0 and 1) per normalize_tensor of #230. I re-ran non-negative CPD factorization on the normalized data tensor. I ran NNMF on the neuron-unfolding of X per fit_nnmf of #231. The data can be obtained here:

export-align-norm.mat (59 MB): Contains the normalized data tensor X.
export-align-norm_nncpd-r15.mat (60 MB): Contains the reconstructed tensor Xest obtained via rank 15 non-negative CPD factorization.
export-align-norm_nmf-r15.mat (60 MB): Contains the reconstructed tensor Xest obtained via rank 15 NNMF on the neuron unfolding of X.

kimtonyhyun commented 7 years ago

The first statistics that I looked at was the distribution of single-cell R squared values (i.e. between X(i,:,:) and Xest(i,:,:)) under the two dimensionality reduction methods. The two fits have the same nominal rank, but the number of numerical degree of freedom is significantly larger for the NMF approach. We thus expected that the NMF approach was going to dramatically going to outperform the CPD fit when evaluated on the basis of single-cell R squared values. I found that this wasn't the case.

Plotted below is the distribution of the R squared values for the two fitting methods:

Top panel shows the distribution of single-cell R squared values for the CPD fit;
Middle panel shows the distribution of single-cell R squared values for the NMF fit;
Bottom panel shows the distribution of R squared differences (cell-matched) between the two fits. Positive values here means that the CPD approach fared better, according to this metric.

It is clear that, on the whole, the CPD distribution fit the single neuron traces better than the NMF approach. An interesting aside: the cell with the maximal differential (CPD's Rsq=0.8309 vs NMF's Rsq=0.3803) is the notorious Cell 209, which was discussed in #232.

Aside: The above comparison considers the single-cell Rsq values as a figure of merit for the fit. On the other hand, as previously noted, neither CPD nor NMF uses the single-cell Rsq values as the objective function, but rather "normalized error" between X and Xest.

@ahwillia Can you point me to the part of the CPD code where you compute the normalized error (e.g. for final output formatting of fit_cpd)? I could factor that out into a separate function, and then evaluate the difference in normalized error for the CPD and NMF Xests.

kimtonyhyun commented 7 years ago

Another intuition that we had with regards to the NMF fit was that the additional numerical degrees of freedom would help it "capture" isolated Ca2+ transients. To the contrary, I found that the NMF fit misses this type of activity just as frequently (if not more) than the CPD:

Number of cells in CPD fit with Rsq < 0.3: 65 out of 487 (13%), vs.
Number of cells in NMF fit with Rsq < 0.3: 96 out of 487 (20%).

Note that I used Rsq = 0.3 as a threshold for "predicted single cell traces do not model data", based on previous experience. Here are some example traces at such Rsq values, taken from the NMF fit.

NMF fit, Rsq=0.3022: nmf_rsq0-3

NMF fit, Rsq=0.2017: nmf_rsq0-2

NMF fit, Rsq=0.1025: nmf_rsq0-1

kimtonyhyun commented 7 years ago

There does exist, however, examples where the NMF fit manages to capture the "wiggles" in individual traces better than the CPD fit. This is what we had intuitively expected. For example, consider Cell 314.

Here is the NMF fit (Rsq=0.9240), note how the fitted trace captures multiple transients within a single trial: cell314_nmf

Here is the CPD counterpart (Rsq=0.8674), which seems to model only the strongest transient at the middle of the trial: cell314_cpd

kimtonyhyun commented 7 years ago

I then evaluated the "time-trial" factors obtained via NMF. Recall that I used rank=15 NMF factorization for this discussion.

Here is a visualization of one of the NMF factors (k=1):

Left panel: I "re-folded" trials of the factor's temporal trace, and plotted as scatter. Solid red line is the trial average.
Right panel: The factor's temporal trace shown as a raster plot.

In principle, the NMF factorization on the neuron unfolding of X does not have a hard constraint based on the trial structure. So, there's no hard requirement that its factor traces show trial regularity (and therein lies the additional degrees of freedom for the fit). However, the trial structure is clearly evident in the above NMF factor.

For instance, it seems very plausible to further factor this "time-trial" vector from NMF into a product of time and trial sub-factors. Though, admittedly, this "successive factoring" approach for factoring the tensor X is not as principled as the CPD approach.

Here are additional NMF factors: nmf_k02 nmf_k03 nmf_k06 nmf_k08

You can see all 15 NMF factors here. As a rule, I found that all NMF temporal traces show trial-based regularity.

kimtonyhyun commented 7 years ago

@ahwillia My initial conclusions on NMF on neuron unfoldings of X is that it looks very much like the results of CPD factorization.

Specifically, while the NMF approach does not "hard-code" the trial structure of the prefrontal dataset into the data tensor, the factorization "recovers" the trial regularity regardless.

Equivalently, the NMF factorization does not make use of the degrees of freedom it has along the temporal dimension (and hence many of our expectations were incorrect).

Any thoughts and comments so far?

ahwillia commented 7 years ago

Can you point me to the part of the CPD code where you compute the normalized error (e.g. for final output formatting of fit_cpd)?

https://github.com/schnitzer-lab/analysis/blob/master/tensor/fit_cpd.m#L50

I'm very interested to compare NMF to CPD in terms of reconstruction error. If they have the same number of components NMF should beat CPD, since (non-negative) CP decomposition is one solution to NMF. You should be able to convert the CP decomposition factors into an NMF model:

A = % neuron factors (n_neurons x rank)
B = % time factors (n_time x rank)
C = % trial factors (n_trials x rank)
X_unf = % neurons unfolding of tensor
BC = khatrirao(B, C) % the "khatri-rao product" (available in tensor toolbox)
X_est =  A * transpose(BC) % this is a valid solution to NMF
err = norm(X_unf - X_est, 'fro') / norm(X_unf, 'fro')

So the only way NMF should lose to CPD in terms of reconstruction error is if it gets caught in a local min during optimization.

You can read more on the Khatri-Rao product in Tammy's review: http://www.sandia.gov/~tgkolda/pubs/pubfiles/TensorReview.pdf

ahwillia commented 7 years ago

I found that all NMF temporal traces show trial-based regularity.

Yep, I'd expect this.

NMF on neuron unfoldings of X is that it looks very much like the results of CPD factorization.

I think the reconstructions of single cells should look quite similar. It is encouraging that it doesn't seem to be outperforming CPD.

The question is whether it is easy to identify neural populations and dynamics that are associated with error-coding, are sensitive to location, etc. I think you'll find it is more difficult to do so with the unfoldings... You might try doing NMF on the trials unfolding?

kimtonyhyun commented 7 years ago

The question is whether it is easy to identify neurons that are error-coding, sensitive to location, etc. with this technique. I think you'll find it is more difficult to do so... You might try doing NMF on the trials unfolding?

My approach to finding the functional groups would be to use the factors from the neural unfolding. For example, in the examples shown above:

The k=2 factor likely corresponds to reward,
The k=3 factor likely corresponds to one of the start locations.

I'll explore that and report here.

kimtonyhyun commented 7 years ago

I think there is an interesting conceptual distinction between the approaches that you and I laid out for identifying functional groups.

In my approach, I am essentially approximating a CPD decomposition of the data tensor, by taking successive NMFs: i.e. firstly, in the [neuron x time-trial] matrix; then, secondly, in the [time x trial] sub-matrix. The first NMF step doesn't necessarily have to produce time-trial factors that allow for this successive approach but, empirically, this seems possible to perform with the prefrontal dataset.

You, on the other hand, are suggesting an explicit NMF factorization on a trial unfolding, i.e. one of these two unfolded matrices: to see if the trial vectors can be readily interpreted with experimental metadata. I agree that this is an important comparison case between NMF and CPD, as this is "more explicitly NMF" than my "approximate CPD with multiple NMFs" approach.

ahwillia commented 7 years ago

I am essentially approximating a CPD decomposition of the data tensor, by taking successive NMFs: i.e. firstly, in the [neuron x time-trial] matrix; then, secondly, in the [time x trial] sub-matrix.

I believe that there are papers that explore this idea as an optimization technique for CP decomposition. I'll ask Tammy and do a literature search.

kimtonyhyun commented 7 years ago

I think the reconstructions of single cells should look quite similar. It is encouraging that it doesn't seem to be outperforming CPD.

Yes, at the single-neuron trace level, I'd expect both CPD and NMF reconstructions to look similar (although I expected the NMF approach to perform better according to this metric).

What I had meant by saying "it looks very much like the results of CPD factorization", was that the NMF factors in the time-trial dimension looked comparable to their counterparts in CPD.

Specifically, the NMF factors in the time-trial dimension have strong regularity along trials -- even though this wasn't a hard constraint -- and thus would be conducive to further factorization between time and trial dimensions, yielding something that might look very similar to the CPD factors.

kimtonyhyun commented 7 years ago

When you wrote:

err = norm(X_unf - X_est, 'fro') / norm(X_unf - X_est, 'fro')

I assume you meant:

err = norm(X_unf - X_est, 'fro') / norm(X_unf, 'fro')

The reconstruction errors are as follows:

NMF: err = 0.4356
CPD: err = 0.4065

The numerical values are similar, but CPD error is lower than that of NMF! I agree that local minima during optimization is a probable culprit. (There is also a num_replicates option in Matlab's nnmf, we might make use of that...)

Note: for CPD, I used Xest = full(cpd); Xest = Xest.data; per the tensor docs, rather than expanding the individual factors via the KR product per your comment. I don't know how to obtain A, B, and C from the new output format of fit_cpd.

I believe that there are papers that explore this idea as an optimization technique for CP decomposition. I'll ask Tammy and do a literature search.

Going forward, should I post new issue threads in the cpd-paper repository so that all authors have visibility into our discussions on CPD?

kimtonyhyun commented 7 years ago

Here is my effort on identifying functional groups, based on examining the time-trial vectors in the neural unfolding. (As discussed previously, I think this approach is approximating a CPD decomposition of the data tensor by successive NMF factorization.)

To begin, I "wrapped" the time-trial vector into trials, by using the the a priori known periodicity. I showed the resulting matrices ([num_trials x num_samples_in_trial]) previously, which showed strong repeatability in trials.

I now took that time-trial matrix, and sorted the trials based on various trial metadata. In the figure below:

The left panels show the time-trial NMF vector over all trials (same as my previous visualization).
On the right, the top 2 panels split the time-trial matrix into correct and incorrect trials.
On the right, the middle 2 panels split the time-trial matrix into East start and West start trials.
On the right, the bottom 2 panels split the time-trial matrix into North end and South end trials.

cc @forea I have often produced these types of raster plots before, but they were previously always at the single neuron level. The plots that I present here, on the other hand, represent the collective dynamics of sub-populations of neurons as identified by the NMF factorization.

The above example factor (k=2) shows a strong correspondence to correctness of the trial (or reward; the fine distinctions will have to come later).

Similarly, I found that these other NMF factors showed clear correspondence to other task variables:

k=2: Correctness/Reward
k=3: West start
k=4: East start
k=11: Error
k=12: South end

Interestingly, I did not find an NMF factor that corresponded to North end, despite the fact that:

At the single-cell level, there are neurons that correlate with North-end, both predictive (e.g. Cell 203) and contemporaneous (e.g. Cell 381).
c11m1d15 is a Allo-north to Ego-right switch day, so there is a good sampling of both North and South ends in this session. (In fact there is more sampling of North than South on this day, roughly 120 vs. 30 trials.)

Here is another example of the NMF raster plot. I show the "Error" factor (k=11): nmf_k11

You can view the "sliced" raster plots of all 15 NMF factors here.

ahwillia commented 7 years ago

Going forward, should I post new issue threads in the cpd-paper repository so that all authors have visibility into our discussions on CPD?

No I think here is fine.

I assume you meant: err = norm(X_unf - X_est, 'fro') / norm(X_unf, 'fro')

Yes 😓

I don't know how to obtain A, B, and C from the new output format of fit_cpd.

Something along these lines should work

% choose replicate i of the rank-r models
X = % data tensor
all_models = fit_cpd(X, ...);
model = all_models(i, r);

lam = nthroot(model.decomp.lambda, 3);
% Note - lam should be a row vector I think? If not, transpose it.
A = model.decomp.u{1} .* repmat(lam, size(X,1), 1);
B = model.decomp.u{2} .* repmat(lam, size(X,2), 1);
C = model.decomp.u{3} .* repmat(lam, size(X,3), 1);

% Then you can make the NMF as I described before
BC = khatrirao(B, C) % the "khatri-rao product" (available in tensor toolbox)
X_est =  A * transpose(BC) % this is a valid solution to NMF

The lam variable contains the higher-order analogs of singular values you mentioned earlier.

kimtonyhyun commented 7 years ago

I then looked at the neural loadings of some of these NMF factors. Here, I take the "Error" (k=11) factor as an example.

Here is the distribution of values in the neural vector associated with factor k=11:

The left panel shows the histogram of single-neuron weights in the neural vector,
The right panel shows the single-neuron weights sorted by magnitude.

nmf_k11_nvals

(Based on the right panel, it seems to me that there is a "transition point" around sorted rank ~70 or so.)

I then evaluated the single cells that have high amplitude in this NMF factor. For example, here is Cell 472 which has the highest amplitude of 16.3546 in the neural vector: nmf_k11_s01_cell472

Another example, Cell 288 which has amplitude 11.0110 (sorted rank 10): nmf_k11_s10_cell288

You can view the individual cell rasters, sorted by their amplitude in the NMF vector here. I plotted cells with sorted ranks 1, 2, 3 ..., 9, 10, 20, 30, ..., 90, 99. Look at the file name for the sorting rank. I would say that the top 10 cells are very highly correlated with incorrect trials. I would draw a threshold around sorted rank ~40.

For fun, I plotted the top 40 "error cells" as identified by NMF as a cell map (error cells shown in red; all others green): nmf_k11_top40_cellmap

The question is whether it is easy to identify neural populations and dynamics that are associated with error-coding, are sensitive to location, etc.

To conclude, I would say that it was pretty straightforward to use NMF factorization on neuron unfoldings to identify the error population (at least in this example case). It's not clear to me yet how we are doing in terms of "completeness" and other details.

kimtonyhyun commented 7 years ago

Something along these lines should work

Does your method of generating the estimate of the neuron unfolding of Xest generate different results than if I just perform the neuron unfolding of the Xest = full(cpd) tensor?

ahwillia commented 7 years ago

It should do the same thing. I mostly suggest it as a sanity check.

On Jan 21, 2017 11:11 AM, "Tony" notifications@github.com wrote:

Something along these lines should work

Does your method of generating the estimate of the neuron unfolding of Xest generate different results than if I just perform the neuron unfolding of the Xest = full(cpd) tensor?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/schnitzer-lab/analysis/issues/235#issuecomment-274281592, or mute the thread https://github.com/notifications/unsubscribe-auth/AAm20Rk7BwKJBQxvwEQa9qJGmBPLv68eks5rUlhUgaJpZM4LpiwK .

kimtonyhyun commented 7 years ago

I performed a similar analysis of the NMF factor k=3 (West representation).

Here is a distribution of the values in the neural vector (k=3): nmf_k03_nvals

Here is the top rank Cell 137 (val=28.3601): nmf_k03_s01_cell137

Number 2: Cell 253 (val=27.5579): nmf_k03_s02_cell253

Number 3: Cell 401 (val=26.0921): nmf_k03_s03_cell401

Number 10: Cell 385 (val=19.4015), admittedly not as clean: nmf_k03_s10_cell385

Number 20: Cell 456 (val=16.6165): nmf_k03_s20_cell456

Cell 456 above seems to me more of a type that always fires at the beginning of a trial, regardless of West/East start location. As such, I wonder if this cell is being represented in the NMF factorization as the superposition of "East" and "West" factors.

This, in turn, suggests to me to consider a "factor-selectivity index" for a cell for the purposes of functional categorization, rather than considering only the absolute values in the neural vector (as I've done here).

Here is a cell map of the top 20 "West cells" selected by NMF factor k=3: nmf_k03_cellmap

And both West and Error types: nmf_cellmap

kimtonyhyun commented 7 years ago

Closing since the CPD model here had rank=29, rather than the assumed 15.

kimtonyhyun / analysis

CPD: Comparison to NMF on unfoldings of X #235