The first statistic that I looked at was the distribution of single-cell R-squared values (i.e. between X(i,:,:) and Xest(i,:,:)) under the two dimensionality reduction methods. The two fits have the same nominal rank, but the number of numerical degrees of freedom is significantly larger for the NMF approach. We thus expected that the NMF approach would dramatically outperform the CPD fit when evaluated on the basis of single-cell R-squared values. I found that this wasn't the case.
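For reference, the per-cell R-squared I'm computing can be sketched as follows (an illustrative Python/numpy version with hypothetical array names; the actual analysis is in MATLAB):

```python
import numpy as np

def single_cell_rsq(X, Xest, i):
    """R^2 between data and fit for neuron i, pooled over time and trials.

    X, Xest: arrays of shape (n_neurons, n_time, n_trials).
    """
    x = X[i].ravel()          # observed trace for cell i
    xhat = Xest[i].ravel()    # fitted trace for cell i
    ss_res = np.sum((x - xhat) ** 2)
    ss_tot = np.sum((x - x.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy check: a perfect fit gives R^2 = 1
X = np.random.rand(5, 10, 4)
print(single_cell_rsq(X, X.copy(), 0))  # -> 1.0
```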
Plotted below is the distribution of the R squared values for the two fitting methods:
It is clear that, on the whole, the CPD fit modeled the single-neuron traces better than the NMF approach did. An interesting aside: the cell with the maximal differential (CPD's Rsq=0.8309 vs NMF's Rsq=0.3803) is the notorious Cell 209, which was discussed in #232.
Aside: The above comparison considers the single-cell Rsq values as a figure of merit for the fit. On the other hand, as previously noted, neither CPD nor NMF uses the single-cell Rsq values as the objective function, but rather the "normalized error" between X and Xest.
@ahwillia Can you point me to the part of the CPD code where you compute the normalized error (e.g. for final output formatting of fit_cpd)? I could factor that out into a separate function, and then evaluate the difference in normalized error for the CPD and NMF Xests.
Another intuition that we had with regard to the NMF fit was that the additional numerical degrees of freedom would help it "capture" isolated Ca2+ transients. To the contrary, I found that the NMF fit misses this type of activity just as frequently as (if not more frequently than) the CPD fit does:
Note that I used Rsq = 0.3 as a threshold for "predicted single cell traces do not model data", based on previous experience. Here are some example traces at such Rsq values, taken from the NMF fit.
NMF fit, Rsq=0.3022:
NMF fit, Rsq=0.2017:
NMF fit, Rsq=0.1025:
There do exist, however, examples where the NMF fit manages to capture the "wiggles" in individual traces better than the CPD fit does, which is what we had intuitively expected. For example, consider Cell 314.
Here is the NMF fit (Rsq=0.9240), note how the fitted trace captures multiple transients within a single trial:
Here is the CPD counterpart (Rsq=0.8674), which seems to model only the strongest transient at the middle of the trial:
I then evaluated the "time-trial" factors obtained via NMF. Recall that I used rank=15 NMF factorization for this discussion.
Here is a visualization of one of the NMF factors (k=1):
In principle, the NMF factorization on the neuron unfolding of X does not have a hard constraint based on the trial structure. So, there's no hard requirement that its factor traces show trial regularity (and therein lie the additional degrees of freedom for the fit). However, the trial structure is clearly evident in the above NMF factor.
For instance, it seems very plausible to further factor this "time-trial" vector from NMF into a product of time and trial sub-factors. Admittedly, though, this "successive factoring" approach for factoring the tensor X is not as principled as the CPD approach.
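The further-factoring idea can be sketched as a rank-1 nonnegative factorization of the wrapped time-trial vector (illustrative Python/numpy; all names are mine, and I'm using simple alternating nonnegative least-squares updates, not any particular library routine):

```python
import numpy as np

def split_time_trial(v, n_time, n_trials, n_iter=200):
    """Approximate a time-trial vector v (length n_time*n_trials) as the
    outer product of a time profile and per-trial amplitudes, via rank-1
    alternating nonnegative least squares."""
    M = v.reshape(n_trials, n_time)    # wrap into trials
    t = np.full(n_time, M.mean())      # time sub-factor (initial guess)
    a = np.ones(n_trials)              # trial sub-factor (initial guess)
    for _ in range(n_iter):
        a = np.clip(M @ t / (t @ t + 1e-12), 0, None)
        t = np.clip(M.T @ a / (a @ a + 1e-12), 0, None)
    return t, a

# If v is exactly trial-regular (same shape every trial, varying amplitude),
# the rank-1 split recovers it:
t_true = np.array([0.0, 1.0, 2.0, 1.0])
a_true = np.array([1.0, 0.5, 2.0])
v = np.outer(a_true, t_true).ravel()
t, a = split_time_trial(v, n_time=4, n_trials=3)
print(np.allclose(np.outer(a, t), np.outer(a_true, t_true)))  # -> True
```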
Here are additional NMF factors:
You can see all 15 NMF factors here. As a rule, I found that all NMF temporal traces show trial-based regularity.
@ahwillia My initial conclusion on NMF on neuron unfoldings of X is that it looks very much like the result of CPD factorization.
Specifically, while the NMF approach does not "hard-code" the trial structure of the prefrontal dataset into the data tensor, the factorization "recovers" the trial regularity regardless.
Equivalently, the NMF factorization does not make use of the degrees of freedom it has along the temporal dimension (and hence many of our expectations were incorrect).
Any thoughts and comments so far?
Can you point me to the part of the CPD code where you compute the normalized error (e.g. for final output formatting of fit_cpd)?
https://github.com/schnitzer-lab/analysis/blob/master/tensor/fit_cpd.m#L50
I'm very interested to compare NMF to CPD in terms of reconstruction error. If they have the same number of components NMF should beat CPD, since (non-negative) CP decomposition is one solution to NMF. You should be able to convert the CP decomposition factors into an NMF model:
% Given:
%   A: neuron factors (n_neurons x rank)
%   B: time factors (n_time x rank)
%   C: trial factors (n_trials x rank)
%   X_unf: neuron unfolding of the data tensor
BC = khatrirao(B, C);  % the "khatri-rao product" (available in tensor toolbox)
X_est = A * transpose(BC);  % this is a valid solution to NMF
err = norm(X_unf - X_est, 'fro') / norm(X_unf, 'fro');
So the only way NMF should lose to CPD in terms of reconstruction error is if it gets caught in a local min during optimization.
You can read more on the Khatri-Rao product in Tammy's review: http://www.sandia.gov/~tgkolda/pubs/pubfiles/TensorReview.pdf
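The identity described here (the factors of a nonnegative CP decomposition are one feasible NMF of the neuron unfolding) can be checked numerically; below is a small Python/numpy sketch with random nonnegative factors (names are illustrative, and the Khatri-Rao product is written out by hand rather than taken from a toolbox):

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: (n_time * n_trials) x rank."""
    r = B.shape[1]
    return np.stack([np.kron(B[:, k], C[:, k]) for k in range(r)], axis=1)

rng = np.random.default_rng(0)
n, t, m, rank = 6, 5, 4, 3
A = rng.random((n, rank))   # neuron factors
B = rng.random((t, rank))   # time factors
C = rng.random((m, rank))   # trial factors

# Build the tensor from the CP factors, then take its neuron unfolding
X = np.einsum('ik,jk,lk->ijl', A, B, C)
X_unf = X.reshape(n, t * m)          # neurons x (time*trials)

# The same unfolding, written as an NMF: A times the Khatri-Rao product
X_est = A @ khatri_rao(B, C).T
err = np.linalg.norm(X_unf - X_est) / np.linalg.norm(X_unf)
print(err < 1e-12)  # -> True: the CP factors are a feasible NMF solution
```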
I found that all NMF temporal traces show trial-based regularity.
Yep, I'd expect this.
NMF on neuron unfoldings of X is that it looks very much like the results of CPD factorization.
I think the reconstructions of single cells should look quite similar. It is encouraging that it doesn't seem to be outperforming CPD.
The question is whether it is easy to identify neural populations and dynamics that are associated with error-coding, are sensitive to location, etc. I think you'll find it is more difficult to do so with the unfoldings... You might try doing NMF on the trials unfolding?
The question is whether it is easy to identify neurons that are error-coding, sensitive to location, etc. with this technique. I think you'll find it is more difficult to do so... You might try doing NMF on the trials unfolding?
My approach to finding the functional groups would be to use the factors from the neural unfolding. For example, in the examples shown above:
I'll explore that and report here.
I think there is an interesting conceptual distinction between the approaches that you and I laid out for identifying functional groups.
In my approach, I am essentially approximating a CPD decomposition of the data tensor by taking successive NMFs: firstly, of the [neuron x time-trial] matrix; then, secondly, of the [time x trial] sub-matrix. The first NMF step doesn't necessarily have to produce time-trial factors that allow for this successive approach but, empirically, this seems possible to perform with the prefrontal dataset.
You, on the other hand, are suggesting an explicit NMF factorization on a trial unfolding (i.e. one of the two trial-unfolded matrices), to see if the trial vectors can be readily interpreted with experimental metadata. I agree that this is an important comparison case between NMF and CPD, as it is "more explicitly NMF" than my "approximate CPD with multiple NMFs" approach.
I am essentially approximating a CPD decomposition of the data tensor, by taking successive NMFs: i.e. firstly, in the [neuron x time-trial] matrix; then, secondly, in the [time x trial] sub-matrix.
I believe that there are papers that explore this idea as an optimization technique for CP decomposition. I'll ask Tammy and do a literature search.
I think the reconstructions of single cells should look quite similar. It is encouraging that it doesn't seem to be outperforming CPD.
Yes, at the single-neuron trace level, I'd expect both CPD and NMF reconstructions to look similar (although I expected the NMF approach to perform better according to this metric).
What I had meant by saying "it looks very much like the results of CPD factorization", was that the NMF factors in the time-trial dimension looked comparable to their counterparts in CPD.
Specifically, the NMF factors in the time-trial dimension have strong regularity along trials -- even though this wasn't a hard constraint -- and thus would be conducive to further factorization between time and trial dimensions, yielding something that might look very similar to the CPD factors.
When you wrote:
err = norm(X_unf - X_est, 'fro') / norm(X_unf - X_est, 'fro')
I assume you meant:
err = norm(X_unf - X_est, 'fro') / norm(X_unf, 'fro')
The reconstruction errors are as follows:
err = 0.4356
err = 0.4065
The numerical values are similar, but the CPD error is lower than that of NMF! I agree that a local minimum during optimization is a probable culprit. (There is also a 'replicates' option in Matlab's nnmf; we might make use of that...)
Note: for CPD, I used Xest = full(cpd); Xest = Xest.data; per the tensor docs, rather than expanding the individual factors via the KR product per your comment. I don't know how to obtain A, B, and C from the new output format of fit_cpd.
I believe that there are papers that explore this idea as an optimization technique for CP decomposition. I'll ask Tammy and do a literature search.
Going forward, should I post new issue threads in the cpd-paper repository so that all authors have visibility into our discussions on CPD?
Here is my effort on identifying functional groups, based on examining the time-trial vectors in the neural unfolding. (As discussed previously, I think this approach is approximating a CPD decomposition of the data tensor by successive NMF factorization.)
To begin, I "wrapped" the time-trial vector into trials, using the a priori known periodicity. I showed the resulting [num_trials x num_samples_in_trial] matrices previously; they showed strong repeatability across trials.
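The wrapping step is just a reshape using the known trial period; sketched in Python/numpy below (illustrative names; note that MATLAB's reshape is column-major, so the MATLAB equivalent would need a transpose):

```python
import numpy as np

def wrap_into_trials(v, num_samples_in_trial):
    """Wrap a concatenated time-trial vector into a
    [num_trials x num_samples_in_trial] matrix, assuming trials were
    concatenated back-to-back with a fixed, known period."""
    num_trials = v.size // num_samples_in_trial
    return v[:num_trials * num_samples_in_trial].reshape(
        num_trials, num_samples_in_trial)

v = np.arange(12.0)            # e.g. 3 trials of 4 samples each
M = wrap_into_trials(v, 4)
print(M.shape)                 # -> (3, 4)
print(M[1])                    # -> [4. 5. 6. 7.]
```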
I now took that time-trial matrix, and sorted the trials based on various trial metadata. In the figure below:
cc @forea I have often produced these types of raster plots before, but they were previously always at the single neuron level. The plots that I present here, on the other hand, represent the collective dynamics of sub-populations of neurons as identified by the NMF factorization.
The above example factor (k=2) shows a strong correspondence to correctness of the trial (or reward; the fine distinctions will have to come later).
Similarly, I found that these other NMF factors showed clear correspondence to other task variables:
k=2: Correctness/Reward
k=3: West start
k=4: East start
k=11: Error
k=12: South end
Interestingly, I did not find an NMF factor that corresponded to North end, despite the fact that:
Here is another example of the NMF raster plot. I show the "Error" factor (k=11):
You can view the "sliced" raster plots of all 15 NMF factors here.
Going forward, should I post new issue threads in the cpd-paper repository so that all authors have visibility into our discussions on CPD?
No I think here is fine.
I assume you meant:
err = norm(X_unf - X_est, 'fro') / norm(X_unf, 'fro')
Yes 😓
I don't know how to obtain A, B, and C from the new output format of fit_cpd.
Something along these lines should work
% choose replicate i of the rank-r models
X = % data tensor
all_models = fit_cpd(X, ...);
model = all_models(i, r);
lam = nthroot(model.decomp.lambda, 3);
% Note - lam should be a row vector I think? If not, transpose it.
A = model.decomp.u{1} .* repmat(lam, size(X,1), 1);
B = model.decomp.u{2} .* repmat(lam, size(X,2), 1);
C = model.decomp.u{3} .* repmat(lam, size(X,3), 1);
% Then you can make the NMF as I described before
BC = khatrirao(B, C) % the "khatri-rao product" (available in tensor toolbox)
X_est = A * transpose(BC) % this is a valid solution to NMF
The lam variable contains the higher-order analogs of the singular values you mentioned earlier.
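As a sanity check that absorbing nthroot(lambda, 3) into each of the three factor matrices preserves the reconstruction, here is an illustrative numpy sketch (names are mine, not from fit_cpd):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, n3, rank = 4, 3, 5, 2
U = [rng.random((s, rank)) for s in (n1, n2, n3)]  # unit-scale factors
lam = np.array([2.0, 5.0])                          # component scales

# Reference reconstruction with explicit lambda weights
X_ref = np.einsum('k,ik,jk,lk->ijl', lam, U[0], U[1], U[2])

# Absorb lambda^(1/3) into each of the three factor matrices;
# per component k, (lam_k^(1/3))^3 = lam_k, so nothing changes.
scale = np.cbrt(lam)                     # row vector of cube roots
A, B, C = (Ui * scale for Ui in U)       # broadcasts over rows
X_abs = np.einsum('ik,jk,lk->ijl', A, B, C)

print(np.allclose(X_ref, X_abs))  # -> True
```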
I then looked at the neural loadings of some of these NMF factors. Here, I take the "Error" (k=11) factor as an example.
Here is the distribution of values in the neural vector associated with factor k=11:
(Based on the right panel, it seems to me that there is a "transition point" around sorted rank ~70 or so.)
I then evaluated the single cells that have high amplitude in this NMF factor. For example, here is Cell 472 which has the highest amplitude of 16.3546 in the neural vector:
Another example, Cell 288 which has amplitude 11.0110 (sorted rank 10):
You can view the individual cell rasters, sorted by their amplitude in the NMF vector here. I plotted cells with sorted ranks 1, 2, 3 ..., 9, 10, 20, 30, ..., 90, 99. Look at the file name for the sorting rank. I would say that the top 10 cells are very highly correlated with incorrect trials. I would draw a threshold around sorted rank ~40.
For fun, I plotted the top 40 "error cells" as identified by NMF as a cell map (error cells shown in red; all others green):
The question is whether it is easy to identify neural populations and dynamics that are associated with error-coding, are sensitive to location, etc.
To conclude, I would say that it was pretty straightforward to use NMF factorization on neuron unfoldings to identify the error population (at least in this example case). It's not clear to me yet how we are doing in terms of "completeness" and other details.
Something along these lines should work
Does your method of generating the estimate of the neuron unfolding of Xest generate different results than if I just perform the neuron unfolding of the Xest = full(cpd) tensor?
It should do the same thing. I mostly suggest it as a sanity check.
I performed a similar analysis of the NMF factor k=3 (West representation).
Here is a distribution of the values in the neural vector (k=3):
Here is the top rank Cell 137 (val=28.3601):
Number 2: Cell 253 (val=27.5579):
Number 3: Cell 401 (val=26.0921):
Number 10: Cell 385 (val=19.4015), admittedly not as clean:
Number 20: Cell 456 (val=16.6165):
Cell 456 above seems to me to be more of a type that always fires at the beginning of a trial, regardless of West/East start location. As such, I wonder if this cell is being represented in the NMF factorization as a superposition of the "East" and "West" factors.
This, in turn, suggests considering a "factor-selectivity index" for each cell for the purposes of functional categorization, rather than considering only the absolute values in the neural vector (as I've done here).
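One hypothetical way such an index could be defined (this is purely a sketch of the idea, not something we've settled on): for each cell, take the fraction of its total loading that lives in its single strongest factor. In numpy terms:

```python
import numpy as np

def factor_selectivity(W, i):
    """Hypothetical selectivity index for cell i, given the neural factor
    matrix W (n_neurons x rank): fraction of the cell's total loading in
    its single strongest factor. Near 1.0 = loads on one factor only;
    near 1/rank = spread evenly across all factors."""
    w = W[i]
    return w.max() / (w.sum() + 1e-12)

W = np.array([[10.0, 0.0, 0.0],    # cell 0: purely one factor
              [5.0, 5.0, 0.0]])    # cell 1: split between two factors
print(factor_selectivity(W, 0))  # -> ~1.0
print(factor_selectivity(W, 1))  # -> ~0.5
```

Under such an index, a cell like Cell 456 (split between "East" and "West" factors) would score low even though its absolute loading is high.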
Here is a cell map of the top 20 "West cells" selected by NMF factor k=3:
And both West and Error types:
Closing since the CPD model here had rank=29, rather than the assumed 15.
Note: It turned out that the CPD fitting described in this issue was rank=29, rather than 15. Hence the CPD and NMF models were not nominally rank-matched, and some of the conclusions drawn in this issue do not hold.
@ahwillia Here is my initial report on running NMF on unfoldings of the data tensor X. We have previously discussed this to be the "2D counterpart" to non-negative CPD factorization of the full tensor X.
I am still using a single session, namely c11m1d15. The exported data tensor has been normalized (all entries between 0 and 1) per normalize_tensor of #230. I re-ran non-negative CPD factorization on the normalized data tensor. I ran NNMF on the neuron unfolding of X per fit_nnmf of #231. The data can be obtained here:
export-align-norm.mat (59 MB): Contains the normalized data tensor X.
export-align-norm_nncpd-r15.mat (60 MB): Contains the reconstructed tensor Xest obtained via rank 15 non-negative CPD factorization.
export-align-norm_nmf-r15.mat (60 MB): Contains the reconstructed tensor Xest obtained via rank 15 NNMF on the neuron unfolding of X.