CPD: Comparison of non-negative and unconstrained fits

@ahwillia Here I will post my thoughts on the comparison of non-negative and unconstrained CPD fits at the single neuron level.

I am using:

Single shifting day session "c11m1d15", exported via timewarp_method='align_to'.
Non-negative CPD fit: rank=15, yielded cpd.Rsq = 0.3639.
Unconstrained CPD fit: rank=15, yielded cpd.Rsq = 0.3959. So, as expected, the unconstrained fit a little better than the non-neg version.

You can access the MAT files here: https://www.dropbox.com/sh/su7gc5j19i0w4tn/AADrIP8J68nfh7QzZcQ3NCI7a?dl=0

Here is a difference in single-cell Rsq values. Positive means that the unconstrained fit achieved a higher Rsq value; negative means that the non-negative fit performed better: c11m1d15_diff Note that the distribution is dominantly positive, meaning that the unconstrained CPD does a better job of fitting the raw data than the non-negative CPD as expected.

I then inspected neurons on a case-by-case basis, sorted by the magnitude of the difference in R squared values (i.e. neurons where the unconstrained fit has a much better score than the non-negative fit). I will describe my observations as separate posts.

The first case -- Cell 209, with maximal Rsq delta of 0.2848 (in favor of the unconstrained fit) -- turned out to be quite interesting.

This cell is apparently relevant to the task, as evidenced by the stereotyped response for each trial; c11m1d15_cell209

The onset of activity is as the mouse approaches the final arm: c11m1d15_cell209_trial50

One (weak) speculation for the "type" would be that Cell 209 is a North-indicating cell, but only before the shift... Recall that c11m1d15 is allo-north --> ego-right.

Here is the non-negative CPD fit (Rsq=0.5398): c11m1d15_cell209_cpd-nonneg

Note that this is a case where the fit "makes up" calcium activity when there is none in the actual data (i.e. see the "worst fit" trials)! Note that the spurious activity -- mostly in the latter half of the session -- resembles the actual activity in the earlier part of the session. So, it seems to me that there is a case of (incorrect) generalization by the non-negative CPD fit.

(It would also be interesting to see the raster plots for the CPD fitted data, rather than the raw data. I don't have the tooling for this yet.)

Here is the unconstrained CPD fit (Rsq=0.8246): c11m1d15_cell209_cpd-uncon where we can see better suppression of the spurious fitted activity in the latter half of the session!

The next few examples of single-cell comparison turned out to be less interesting. It seems that there can be a big boost to the Rsq metric, by better fitting the low-amplitude wiggles in the trace near zero.

Consider Cell 423 -- with the second-highest Rsq delta in favor of the unconstrained fit (0.2651).

Here is the non-negative fit (Rsq=0.2565): c11m1d15_cell423_cpd-nonneg

Here is the unconstrained fit (Rsq=0.5216): c11m1d15_cell423_cpd-uncon

So, both fits miss the sparse Ca2+ events in the actual data. The unconstrained fit appears to have a higher score just by fitting the baseline wiggles better.

Similarly, consider Cell 26 -- with the fifth-highest delta in favor of the unconstrained fit.

Here is the non-negative fit (Rsq=0.0703): c11m1d15_cell26_cpd-nonneg

And the unconstrained fit (Rsq=0.3050): c11m1d15_cell26_cpd-uncon

Again, both CPD fits fail to explain the (weak) Ca2+ events, but the unconstrained fit appears to better ride the baseline wiggles in the raw data.

This case of examples might suggest that we need to better format the traces for non-negative fitting. For each trace, I centered the mode of the distribution around zero (see #221), but this allows negative values into the baseline-adjusted trace -- which clearly cannot be modeled by the non-negative fitting.

Cool! This is exactly the kind of cell that I would expect is hard to fit with these decompositions - are there any clues in the raw video as to why the cell stopped firing?

Another possible difference between non-negative and unconstrained CPD fits may be that the unconstrained fits may better model temporally-localized Ca2+ events in the raw data.

For example, consider Cell 373 (seventh in the Rsq delta in favor of the unconstrained fit).

The non-negative fit (Rsq=0.4666): c11m1d15_cell373_cpd-nonneg

The unconstrained fit (Rsq=0.6627): c11m1d15_cell373_cpd-uncon

Consider also Cell 186.

The non-negative fit (Rsq=0.5321): c11m1d15_cell186_cpd-nonneg

The unconstrained fit (Rsq=0.6789): c11m1d15_cell186_cpd-uncon

So, it seems that the unconstrained fits may have a better ability to fit temporally localized Ca2+ events in the raw traces.

To be fair to the non-negative factorization, here is Cell 470 which has a Rsq difference of -0.2147 -- i.e. the example where the non-negative fit best outperformed the unconstrained fit.

Here is the non-negative fit (Rsq=0.7425): c11m1d15_cell470_cpd-nonneg

Here is the unconstrained fit (Rsq=0.5278): c11m1d15_cell470_cpd-uncon

In both cases, the fitted traces captured the "essence" of the raw trace -- i.e. the trials when it was active and the phase of trial where the activity arose -- but the non-negative fitting appears to have a better (though still not perfect) mimicking of the amplitude and a better identification of event onset.

To summarize, my zeroth-pass inspection of the non-negative and unconstrained CPD fits showed:

An example cell where the non-negative fit incorrectly generalized the activity of a neuron, whereas the unconstrained fit better suppressed spurious activity.
That the non-negative fits are being (unfairly) penalized by having negative baseline wiggles, which the constrained model obviously cannot track.
That the unconstrained fit may have a better ability to fit temporally-localized ("sharp") waveforms. (Though I'm not 100% certain on this generalization yet...)

Cool! This is exactly the kind of cell that I would expect is hard to fit with these decompositions - are there any clues in the raw video as to why the cell stopped firing?

I double checked the DFF video now. This is a very high SNR cell, and I have strong confidence that the trace accurately reflects the activity present in the DFF video. For instance, it's unlikely that the cell "faded out" over the session or the like -- the cell shape in the DFF movie is pretty clear even near the end of the session when the cell is firing more sparsely.

I also looked at its cross-day matches on all other days. It's actually not clear to me at all what the "right" interpretation of this cell is, if any. It may not be a simple North-indicating cell: c11m1_d15-c515

An example cell where the non-negative fit incorrectly generalized the activity of a neuron, whereas the unconstrained fit better suppressed spurious activity.

This is interesting. A couple of potential follow-up analyses: A. Does adding more factors to the nonneg tensor decomp fix the problem with this cell (at some point it should...) B. Do all rank-15 NNCP models fail to fit this cell? I.e. if you fit from diff initial params does it consistently miss this cell? C. Does playing with trace normalization fix this (see below)?

That the non-negative fits are being (unfairly) penalized by having negative baseline wiggles, which the constrained model obviously cannot track.

This indicates to me that we should sync up on normalization. Here is the method that I'm currently using - https://github.com/schnitzer-lab/analysis/pull/230/commits/342e2a8920cc2e2703ec193486889a1b5e496035#diff-3ee4ade631532fb83fc1132533d50949

As you mentioned last time we met - it is simply subtract off the min and divide by max for each neuron (separately for each day). Remind me, are you doing normalization in the export function? I can open a PR if you want. I suspect that this will improve the results of NNCP noticeably.

That the unconstrained fit may have a better ability to fit temporally-localized ("sharp") waveforms. (Though I'm not 100% certain on this generalization yet...)

This may not be a big deal for total reconstruction error - though it matters for R-squared. I wonder if normalization will also help with this problem.

In general, it is going to be hard for tensor decomp (and PCA as well I think) to model these fast changes. The question is whether they are negatively impacting our analysis.

A bit of a non-sequitur, but I think it's useful to explicitly state the high-level motivations behind my efforts in exploring CPD fits. (Feel free to add to them.) They are:

What kind of features in the raw Ca2+ data can be -- and is -- modeled by CPD-based dimensionality reduction (i.e. hidden behind the ~0.3–0.4 variance explained metric)?
How can we improve the quality of CPD fitting (e.g. by different timewarp methods in export, preprocessing of traces, etc.)?
What features are left out in the CPD fit, and are those are important or irrelevant aspects of the data?
How do the answers to the above questions compare to those of other dimensionality reduction methods (i.e. unwrap-NMF-rewrap)?
Perform a sanity check of the "correctness" of CPD output. So far, this means: "If we interpret a particular CPD factor to correspond to a task variable (based on its trial component), is that interpretation supported by the individual neurons that make up that factor?"

This indicates to me that we should sync up on normalization. Here is the method that I'm currently using

I like the workflow of running normalize_tensor on X prior to the CPD fit. I assume that #230 implements your suggested normalization method of #228. What do you think of the name "minmax" for that normalization method? Is #230 ready to merge?

If ready, I think we should merge #230 and apply the normalization prior to experimenting with higher rank fits. I am currently unfairly punishing the non-negative fits by providing values that it cannot model...

In general, it is going to be hard for tensor decomp (and PCA as well I think) to model these fast changes.

Some thoughts. Not saying we should implement the suggestion below, but just to throw the idea out...

I'm attracted to the idea of fitting to the Ca2+ trace (as opposed to, say, event trains) because I believe that there are neurons in our dataset where the specifics of the Ca2+ waveform are significant. One particular example that I have in mind is the "confidence" cell whose Ca2+ transient amplitude possibly signals the animal's expectation of reward. Thus, it is of interest to me whether CPD fits are able to accurately model the Ca2+ transient waveforms of prefrontal neurons.

So far, it seems to me that CPD can capture transients if the firing is highly consistent between trials, but it cannot when the transient timings are less regular. (Not surprising.)

For example, the highly consistent Cell 467: c11m1d15_cell467

has a relatively high Rsq=0.8199 and a decent fit of the Ca2+ waveform: c11m1d15_cell467_fit

Whereas the less consistent Cell 60 (which nonetheless seems to have a task coding for West-start and Error): c11m1d15_cell60

has a low Rsq=0.3451 and a poor fit to the actual waveform (trials sorted by residual, in the plot below): c11m1d15_cell60_fit

If the CPD has a known inability to model Ca2+ waveforms, it might make sense for us to explicitly preprocess the Ca2+ data in the intra-trial time dimension.

For example, we could split up intra-trial time into distinct phases, such as: "pre-gate-open", "run", "post-gate-close". For each of these intra-trial phases, we could average the fluorescence value, and hence turn the intra-trial samples (i.e. the 2nd dimension of X) into a feature vector. Note that by integrating intra-trial time via preprocessing, we are explicitly asking CPD to not model waveforms. However, as this approach still retain fluorescence values, we would have a chance to model the amplitude variations. (On the other hand, we would lose precise transient timings; but CPD has limited capability for modeling that when provided with the intra-trial samples anyway.)

Do you have any speculations or thoughts about the idea of preprocessing the intra-trial time dimension of X prior to CPD fitting?

By the way, Matlab's nnmf will throw an exception if the matrix to be factorized has negative values. So we definitely should sync up on the normalization first.

Wrote this quickly - so hopefully this addresses your notes. Let me know if I missed anything - I mostly focused on the cell 60 example.

I assume that #230 implements your suggested normalization method of #228.

Yep.

Is #230 ready to merge?

Yes - though at the moment it adds a few extraneous functions that didn't really pan out. Namely, the cprand function - I recommend that you keep using cp_als (which fit_cpd uses by default). I can delete this if you want - but it works as intended (it just isn't as reliable as I'd like) and merging it in won't break anything.

If the CPD has a known inability to model Ca2+ waveforms, it might make sense for us to explicitly preprocess the Ca2+ data in the intra-trial time dimension.

There is no inherent inability - we're just asking it to do quite a lot. For example, if we had a 100 x 100 x 100 tensor a rank-10 model only has 300 parameters to work with to fit 10 million datapoints.

The question is how much it bothers you to have errors like we see in Cell 60 - I'm only slightly bothered. The model seems to be activating at the appropriate times, but this cell just has a funky waveform that is quite different from other neurons.

By the way - you could do PCA on the single neuron raster for Cell 60 that you plotted above. I bet that it takes several components to capture that single neuron.

Do you have any speculations or thoughts about the idea of preprocessing the intra-trial time dimension of X prior to CPD fitting?

We could try -

Low-pass filtering or smoothing the fluorescence traces before fitting CPD
Extending the CP model to have different numbers of factors along each mode. The idea being that we could add more basis functions for the within-trial dynamics. This is a future direction I would like to push in, but it would be a novel method and thus a new/separate paper for a more specialized journal.

Not sure how high priority we should make this.

[Can we improve fits by adding...] different timewarp methods in export

This is my other project in the lab with Ben and Niru. Let me know if you want to follow up on this. We have code that actually works now and we could apply it to this data if you want. Again, this is a separate paper because the statistical methodology is novel.

There is no inherent inability - we're just asking it to do quite a lot.

I agree. Your argument based on the number of parameters / data points is a nice illustration. It would be quite magical if a rank ~10 CPD model could fit all the wiggles in the raw Ca2+ traces.

To put my original suggestion in another way: given that it's unreasonable to expect a few-rank CPD model to fit all the wiggles of the raw Ca2+ data, it may help to reduce the potential complexity of the intra-trial time dimension of X via preprocessing.

The question is how much it bothers you to have errors like we see in Cell 60 - I'm only slightly bothered. The model seems to be activating at the appropriate times, but this cell just has a funky waveform that is quite different from other neurons.

I agree with this also. (I'm actually not terribly bothered either.) Even when the CPD fit doesn't match the actual wiggles, it often tends to "activate" at the right time. Perhaps it suggests an alternative metric for assessing the quality of fit, e.g. a rank correlation rather than straight-up R squared.

On the other hand, I'd note that the waveform of Cell 60 isn't "funky". It's actually a pretty good example showing Ca2+ exponential transients.

Not sure how high priority we should make this.

I didn't mean to suggest that we actually work on alternate preprocessing for intra-trial time. Was just a food for thought.

This is my other project in the lab with Ben and Niru. Let me know if you want to follow up on this.

I'll skim the abstract you sent me, but I probably would punt it for now.

kimtonyhyun / analysis

CPD: Comparison of non-negative and unconstrained fits #232