The `sort_scores` function is a convenience method that sorts the neuron indices in order of fit quality (best to worst). For example, to sort neurons by their R squared score:
```
>> scores(1)

ans =

          name: 'Rsq'
    sort_order: 'descend'
          vals: [487x150 double]
        vals_n: [487x1 double]

>> sorted_scores = sort_scores(scores(1));
```
yields a `[num_neurons x 2]` matrix `sorted_scores` that indicates the neuron index and the corresponding R squared values, ordered from best to worst:
```
sorted_scores =

   69.0000    0.9030
  361.0000    0.9024
  360.0000    0.9017
  486.0000    0.8942
  441.0000    0.8848
  ...
  371.0000    0.0686
   58.0000    0.0678
   71.0000    0.0572
   11.0000    0.0389
   36.0000    0.0318
```
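For reference, a minimal sketch of the sorting logic, assuming only the `vals_n` and `sort_order` fields shown above (the actual `sort_scores` implementation may differ in details):

```matlab
function sorted_scores = sort_scores(score)
    % Order neuron indices from best to worst fit, per the score's sort direction
    [vals, inds] = sort(score.vals_n, score.sort_order);
    sorted_scores = [inds vals]; % [num_neurons x 2]: neuron index, score value
end
```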
I found it convenient to have `sorted_scores` displayed in a Matlab Variables window on the side, while I inspect cells in `eval_fit`.
@ahwillia I created a helper function `group_fits` which groups cells based on their fit score into one of: "good", "okay", "bad". With `X` and `Xest` loaded into the workspace:
```
>> scores = compute_all_scores(X, Xest);
>> [good, okay, bad] = group_fits(scores(1));
```
yields:

- `good`: IDs of cells with R squared value in [0.7, 1];
- `okay`: IDs of cells with R squared value in [0.4, 0.7);
- `bad`: IDs of cells with R squared value in [0, 0.4).

The thresholds of 0.4 and 0.7 are hard-coded in `group_fits`. These values are just a starting point, and certainly aren't set in stone.
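A minimal sketch of the grouping logic, assuming the hard-coded thresholds above and the per-neuron `vals_n` field (the actual `group_fits` may differ in details):

```matlab
function [good, okay, bad] = group_fits(score)
    vals = score.vals_n;                   % per-neuron fit scores
    good = find(vals >= 0.7);              % R squared in [0.7, 1]
    okay = find(vals >= 0.4 & vals < 0.7); % R squared in [0.4, 0.7)
    bad  = find(vals < 0.4);               % R squared in [0, 0.4)
end
```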
Accordingly, the `eval_fit` script can be invoked with the optional varargin `'cells'` to browse a subset of the cells. For example, to evaluate the "good" cells only:
```
>> eval_fit(X, Xest, scores, 'cells', good);
Evaluator (Cell 4, Rsq=0.7082) >>
```
Aside: You may not be aware of the `browse_rasters` function for inspecting single-cell rasters of a given `DaySummary`, but it also has a similar `'cells'` optional input:

```
>> browse_rasters(m1d15, 'cells', good);
Raster browser (page 1 of 10) >>
```
Added a new score, called "Signed diff", which is essentially `sum(X-Xest,2)`. (Recall that the 2nd dimension is the intra-trial time.) Note that unlike the residual, this score is signed. In particular, a positive value indicates that the fit (`Xest`) underestimates the data (`X`), and a negative value indicates that it overestimates the data.

The "Signed diff" score allows me to look for "spurious activity" in the fitting:
```
>> scores = compute_all_scores(X, Xest);
>> eval_fit(X, Xest, scores);
Evaluator (Cell 1, Rsq=0.6333) >> 209
Evaluator (Cell 209, Rsq=0.5398) >> s4
Evaluator (Cell 209, Signed diff=-2783.6319) >>
```
yields the following plot, where the green traces are the trials for which the fit underestimates the data, and the red traces are those where the fit "overestimates" the data.
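As a sketch, this score can be computed directly from the tensors (assuming the `[num_neurons x intra-trial samples x num_trials]` layout; `compute_all_scores` may differ in details):

```matlab
D = squeeze(sum(X - Xest, 2)); % [num_neurons x num_trials] signed per-trial differences
D_n = sum(D, 2);               % [num_neurons x 1] per-neuron totals over all trials
% D(i,j) > 0: the fit underestimates neuron i on trial j (green traces above)
% D(i,j) < 0: the fit overestimates neuron i on trial j (red traces above)
```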
@ahwillia I added a small function `fit_nnmf.m` which:

- unfolds `X` into a 2D matrix (call it `Xu`),
- factorizes `Xu` via `nnmf` into rank-`r` terms,
- reconstructs `Xest` based on the NMF factorization.

I highlight the function here so that you can evaluate its correctness. An initial comparison to the CPD approach is coming as a separate issue.
I personally find the neural unfolding implemented here to be the most intuitive (i.e. just "unwrapping" and "rewrapping" in time). I made this visualization to help me get a feel for the data layout for all six possible unwrappings:
@ahwillia Updated `fit_nnmf.m` so that it supports unfoldings along neurons (`unfold_dim=1`) or trials (`unfold_dim=3`).
Note: I use the terms "fast" and "slow" dimensions to describe the ordering of the data points along the rows of the unfolded matrix `Xu`. In my nomenclature, the "fast" dimension is the one that changes more quickly as you read the entries of `Xu(k,:)` from left to right.

For example, in both of the unfolded matrices below, the fast dimension is the intra-trial time:
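As a sketch, here are the two supported unfoldings with intra-trial time as the fast dimension (assuming `X` is `[num_neurons x intra-trial samples x num_trials]`; `fit_nnmf.m` may handle details differently):

```matlab
[N, T, K] = size(X); % neurons, intra-trial samples, trials
r = 15;              % rank, as used elsewhere in this thread

% Neuron unfolding (unfold_dim=1): rows are neurons. Reading a row from
% left to right, intra-trial time varies fastest, then trials.
Xu1 = reshape(X, N, T*K);

% Trial unfolding (unfold_dim=3): rows are trials. Reading a row from
% left to right, intra-trial time again varies fastest, then neurons.
Xu3 = reshape(permute(X, [3 2 1]), K, T*N);

% Rank-r NMF on the neuron unfolding, then refold the reconstruction:
[W, H] = nnmf(Xu1, r);        % W: [N x r], H: [r x T*K]
Xest = reshape(W*H, N, T, K); % back to the original tensor shape
```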
Another metric that we may consider comparing between NMF and CPD is the similarity of the factorization results across multiple instantiations of the fit. I just re-ran the NMF factorization on the neuron unfolding (to make sure I didn't break the code), and the resulting set of factors seemed somewhat "different" from the first set that I analyzed in #235.
Here is a quick visualization of the trial loadings obtained by NMF on the trial-unfolded `X`. I used rank=15 as usual, and the green/red color code on the far right corresponds to `trial_meta.correct`:

Based on this raster, it seemed to me that the k=7 factor may correspond to trial correctness. I then looked at the trial loading values for this factor, where the markers are color coded by correctness:
Some observations:
@ahwillia cc @forea Please review but do not merge.
As we've discussed in the past, I am interested in inspecting and evaluating the quality of dimensionality reduction results at the single neuron level. This PR introduces a new script -- called `eval_fit` -- that can be used for this purpose. (You will have to add the top-level directory `eval_fit` to the Matlab path.) While the immediate interest is in evaluating the quality of CPD factorization models, I designed the tool to be generic. Using this tool, we could for instance also investigate the fidelity of other dimensionality reduction methods such as PCA.

Here is a walkthrough on its use, using CPD factorization of c11m1d15 as an example. I used rank=15, non-negative CPD fitting. I uploaded some pre-computed MAT files, so that you can follow along with this tutorial without running the trace export from `MultiDay` and CPD fitting from scratch:

- `export-align.mat` (60 MB): This MAT file contains the raw data tensor `X`, i.e. the fluorescence traces organized as `[num_neurons x intra-trial samples x num_trials]`. Also includes `info` and `trial_meta` (i.e. the auxiliary outputs of `MultiDay.export`).
- `export-align_cpd_rank15.mat` (55 MB): This MAT file contains the "fitted" / reconstructed data tensor `Xest` (with the same dimensions as `X`). Also includes the `cpd` struct from the fit.

Please download both files and load them into the Matlab workspace to proceed with the walkthrough.
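For example (a sketch, assuming both files sit on the current Matlab path; the variable names are as described above):

```matlab
load('export-align.mat');            % provides X, info, trial_meta
load('export-align_cpd_rank15.mat'); % provides Xest, cpd
```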
The basic idea of `eval_fit` is to perform comparisons between the tensors `X` and `Xest`. Any dimensionality reduction method (e.g. CPD, PCA) can be expanded into the full neural space (i.e. expanded to `Xest`), so `eval_fit` can generically be used to inspect the fidelity of different dimensionality reduction methods.

To use `eval_fit`, we begin by pre-computing "fit quality scores" between `X` and `Xest`:
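```
>> scores = compute_all_scores(X, Xest);
```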
Note that `scores` is a struct array of length 3. Initially, I used the R squared as the measure of fit between the traces of `X` and `Xest`. However, it was apparent to me that other metrics (such as residuals) were also interesting, and could capture different aspects of the comparison between `X` and `Xest`. Hence, the `scores` array currently implements 3 different measures of fit quality.

For example, `scores(1)` is the R-squared measure between the traces of `X` and `Xest`:
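```
>> scores(1)

ans =

          name: 'Rsq'
    sort_order: 'descend'
          vals: [487x150 double]
        vals_n: [487x1 double]
```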
The fields of `score` are as follows:

- `score.name` is a string that indicates the name of the fit measure.
- `score.vals` is a `[num_neurons x num_trials]` matrix, where `score.vals(i,j)` is the R squared value between the real and fitted traces of neuron i, trial j.
- `score.vals_n` is a `[num_neurons x 1]` matrix, where `score.vals_n(i)` is the R squared value between the real and fitted traces of neuron i over all trials.
- `score.sort_order` is one of `'descend'` or `'ascend'`, indicating the numerical sort direction to order the fit measure from best to worst. (For R squared, it is `'descend'` since `sort(..., 'descend')` would order the R squared values from best to worst.)

Similarly, `scores(2)` is the residual between traces of `X` and `Xest`. Note that the `sort_order` for residuals is `'ascend'`, since smaller values of the residual indicate a better fit.

Finally, `scores(3)` is simply the peak-to-peak amplitude of the raw traces (i.e. in `X`). The "Peak-to-peak" score isn't a measure of fit between `X` and `Xest` at all, but I still found it convenient to have this "score" during the use of `eval_fit`.

After pre-computing the `scores` array, you can invoke the `eval_fit` script as follows:
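```
>> eval_fit(X, Xest, scores);
```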
which shows the following plot:

The interpretation of the plot is as follows:
You can jump to another cell by typing in the cell number:
which updates the plot accordingly:
You can also increment the cell index by 1 by simply pressing Enter at the prompt:
which shows:
Note that the currently active measure of fit is `'Rsq'`, as indicated in the command prompt. Type `'s'` to see the list of all available scores.

You can switch into a different score type (for example, the residual) as follows:
Note that the trials that are best or worst under one measure are not necessarily the best or worst under another. In particular, the residual measure is effective at highlighting trials that have a large absolute discrepancy between the raw and fitted traces.
The Peak-to-peak "score" allows me to look at the quality of the fit for trials where there is a large / small trace amplitude:
While "Peak-to-peak" isn't really a measure of fit between
X
andXest
, it allows me to quickly inspect that the fits are valid for trials where the actual trace is particularly active or silent. (For instance, it could be the case that a fit predicts a constant pattern for all trials, even when the raw data is silent. I would consider this to be a fairly offensive case of the fitting model "making up features" not present in the data.)You can exit the interaction loop by typing
'q'
: