kimtonyhyun / analysis


CPD: Mapping non-negative CPD factors to single neurons #238

Closed. kimtonyhyun closed this issue 5 years ago.

kimtonyhyun commented 7 years ago

@ahwillia I recomputed the non-negative CPD factorization of c11m1d15 after #237. The new (actually rank=15) fit can be accessed here: export-align-norm_nncpd-r15_run01.mat.

The visualize_neuron_ktensor plot for this factorization is as follows (using "correct" coloring): visneuronktensor
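For reference, fitting and plotting a model like this amounts to something along these lines. This is only a minimal sketch: I'm assuming the data tensor is arranged neurons x time x trials, that the non-negative fit comes from the Tensor Toolbox's cp_nmu, and that the variable names and the visualize_neuron_ktensor call match the repo, which they may not exactly:

% Sketch only; variable names and arguments are placeholders
X = tensor(neural_data);           % neurons x time x trials, non-negative
decomp = cp_nmu(X, 15);            % rank-15 non-negative CPD (multiplicative updates)
visualize_neuron_ktensor(decomp);  % repo plotting helper; actual call signature may differ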

kimtonyhyun commented 7 years ago

At this point, I have fairly high confidence that the CPD factors will correctly map to single neurons. Nevertheless, actual exploration is warranted!

To start, I looked at the "integrating error factor" (k=10) in the original post.

(It is interesting to note that factors k=6 and k=12 in this fit also correlate with error. Their trial vectors, on the other hand, do not seem to integrate. It will be interesting to see if the distinct "types" of error factors continue to emerge in repeated runs of the CPD fit – especially if we vary/reduce the model rank.)

Here is the trial loading for the integrating error factor, color-coded by trial correctness: cpd_k10
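(For reproducibility, the coloring above boils down to something like the sketch below. I'm assuming the trial mode is decomp.u{3} and that `correct` is a logical vector over trials; both are assumptions about the variable layout, not the exact code used.)

k = 10;                                   % the integrating error factor
trial_loading = decomp.u{3}(:, k);        % assumes mode 3 holds the trial loadings
correct = logical(trial_meta.correct);    % hypothetical per-trial correctness flags
plot(find(correct), trial_loading(correct), 'k.'); hold on;
plot(find(~correct), trial_loading(~correct), 'r.');
xlabel('Trial'); ylabel(sprintf('Factor %d trial loading', k));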

A closer look at the neural loadings: cpd_k10_neural

Here are the top three individual cells (Cells 472, 479, 193) that comprise the error factor: cpd_k10_top1 cpd_k10_top2 cpd_k10_top3

As well as the 10th-ranked cell (Cell 162): cpd_k10_top10

Here is the "automatically generated" cell map highlighting error cells (top 30 cells that belong to the error factor, k=10): cpd_k10_cellmap

kimtonyhyun commented 7 years ago

Here is another example of mapping the CPD "error factor" onto single cells. This time, I am using the c11m1d13 dataset and its rank-15 CPD model. Trials colored by correctness: visktensor-correct

On this dataset / CPD run, there is only one factor that corresponds to error: k=11. Here is a look at that factor's neural vector: c11m1d13_cpdfit_k11

Here is the top-ranked cell in the error factor (Cell 484; neural val=0.1498). It is a wonderful error-representing cell: c11m1d13_cpdfit_k11_sr01

Here is the second-highest ranked cell (Cell 225; val=0.1445): c11m1d13_cpdfit_k11_sr02

I can see how the above cell could have a high amplitude in the "error factor" (k=11), but that is clearly not how I would interpret this cell based on its raster alone.

Here is the third-highest ranked cell (Cell 100; val=0.1366). Once again, pretty good: c11m1d13_cpdfit_k11_sr03

Here is another illustrative result. This is the 8th-ranked cell (Cell 428; val=0.1220): c11m1d13_cpdfit_k11_sr08

The above cell definitely corresponds to error trials, but note that it could further be characterized as a mixed-selectivity cell.
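(The sanity check here is just pulling a cell's trial-by-time activity and splitting it by correctness; a rough sketch, assuming a neurons x time x trials array `traces` and the same logical `correct` flags as above.)

cell_id = 428;                                 % e.g. the 8th-ranked cell
raster = squeeze(traces(cell_id, :, :))';      % trials x time
figure;
subplot(2,1,1); imagesc(raster(correct, :));  title('Correct trials');
subplot(2,1,2); imagesc(raster(~correct, :)); title('Error trials');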

So, my current opinion of using the CPD to identify functional subpopulations in an unsupervised way is that:

  • I have fairly high confidence that the approach is viable and can identify subpopulations in general.
  • However, I would continue performing sanity checks (i.e. raster plots) of the cells that make up any particular factor.
  • We also ought to develop an index for how exclusively a cell draws from a particular factor, as discussed before.

ahwillia commented 7 years ago

We also ought to develop an index for how exclusively a cell draws from a particular factor, as discussed before.

Something like this maybe:

% Fraction of each cell's total (squared) loading that comes from its single largest factor
neuron_factors = models(s, r).decomp.u{1};
max(neuron_factors.^2, [], 2) ./ sum(neuron_factors.^2, 2)

That would be an interesting figure to make. I'm a bit tied up today but should be able to get some work done tonight.
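(If it helps, the figure I'm imagining is roughly this, building on the snippet above; just a sketch, and the bin count is arbitrary.)

neuron_factors = models(s, r).decomp.u{1};
exclusivity = max(neuron_factors.^2, [], 2) ./ sum(neuron_factors.^2, 2);
figure; histogram(exclusivity, 25);
xlabel('Exclusivity index'); ylabel('Cell count');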
