Metadata summary view

JobLeonard commented 7 years ago

Another request by Amit.

We might want to consider adding a table that lets people add "applies to all cells/genes" metadata. If something holds true for all cells or genes, it's kind of silly to spend a whole row of column/row attributes on it, right? As far as I can see, this is exactly what's happening right now in many of the loom files, and from what I understand it's quite a hassle to create too (messing around with Excel and dragging fields and such).

That's more of an issue on the generation of loom files, but an important one I think!

Until then, I think we can easily extract metadata: basically, for strings we apply the same method as we do for the categories legend. Just iterate over all rows/columns and show the 20 most common values. For numerical values, I suggest building a histogram. I have some ideas for how to approach this.
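A minimal sketch of the string summary, in Python rather than the viewer's own code. The function name `most_common_strings` and the sample `cell_class` values are made up for illustration; the sort keys (count first, then alphabetical) follow the checklist in this comment.

```python
from collections import Counter


def most_common_strings(values, limit=20):
    """Return up to `limit` (value, count) pairs, most frequent first.

    Ties in count are broken alphabetically.
    """
    counts = Counter(values)
    # First sort key: descending count. Second sort key: the string itself.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical cell attribute, invented for this example.
cell_class = [
    "neuron", "astrocyte", "neuron", "oligodendrocyte",
    "astrocyte", "neuron", "microglia",
]
print(most_common_strings(cell_class, limit=3))
```

Sorting the full count table is fine for attributes with a modest number of distinct values; for very large vocabularies a heap-based top-k would avoid sorting everything.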

[x] "Gene metadata" and "Cell metadata" panels
- [x] search fields (like with the dataset list)
- [x] list of all row/column attributes
- [x] use schema for inferring data type
[x] for string metadata: list of twenty most common strings
- [x] essentially counting string occurrences and sorting end result; first sort key number of occurrences, second key alphabetical
[ ] for numerical metadata: histogram
- [ ] find max/min values and "resolution" (that is, count unique values while searching for min/max values, up to some maximum number of unique values, then use that as number of bins when building the histogram)
- [ ] linear/log scale toggle (it's hard to predict which of the two is more appropriate, so we'll let the user decide this)
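The histogram item above could be sketched roughly as follows. This is an illustration under assumptions, not the viewer's implementation: the names `summarize` and `histogram`, the default cap of 100 bins, and the use of `log1p` for the log scale (which assumes non-negative values) are all my own choices.

```python
import math


def summarize(values, max_bins=100):
    """Single pass over non-empty `values`: min, max and a bin count.

    The bin count is the number of distinct values seen, capped at max_bins.
    """
    lo, hi = math.inf, -math.inf
    seen = set()
    for v in values:
        lo, hi = min(lo, v), max(hi, v)
        if len(seen) < max_bins:
            seen.add(v)
    return lo, hi, len(seen)


def histogram(values, max_bins=100, log_scale=False):
    """Return (min, max, counts) with equal-width bins."""
    if log_scale:
        # Assumption: values are non-negative; log1p keeps zeros usable.
        values = [math.log1p(v) for v in values]
    lo, hi, bins = summarize(values, max_bins)
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # width is 0 only when every value is identical (a single bin).
        i = int((v - lo) / width) if width > 0 else 0
        counts[min(i, bins - 1)] += 1  # the maximum falls in the last bin
    return lo, hi, counts


# Invented sample data with one outlier.
sample = [0, 0, 1, 2, 2, 2, 10]
print(histogram(sample))
print(histogram(sample, log_scale=True))
```

With an outlier like the `10` here, the linear version piles almost everything into the first bin while the log version spreads the small values out, which is the reason for leaving the toggle to the user.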

JobLeonard commented 7 years ago

Mental note: genes = rows, cells = columns

slinnarsson commented 7 years ago

But as soon as you start combining loom files, those fields are no longer common to all cells. There is a mechanism for file-wide attributes (that's how the titles and descriptions work) but those are lost when files are combined.

Sten

(Sent by email in reply to Job van der Zwan's comment of 14 Oct 2016, 10:15.)


JobLeonard commented 7 years ago

Ah right. Well, aggregation is the best option then!

JobLeonard commented 7 years ago

Stuff is starting to happen:

[Screenshot: screenshot_20161017_155323]

Just as a placeholder for the histogram I put in the sparklines. I think it might actually be more useful! For example, look at how you immediately see the correlation between zero values for _LogCV, PC1, PC2, _tSNE1 and _tSNE2.

[ ] clicking on an ATTRIBUTE should sort all genes or cells by that attribute (stable sort!)
[x] clicking on a plot should cycle between plot types (bar, categorical, heatmap)
[ ] string tables should have categorical plots when sensible (with current mini-table legend, coloring the cells to match the plot)
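To illustrate why the sort needs to be stable, here is a small Python sketch (the helper `sort_by_attribute` and the sample attributes are hypothetical). With a stable sort, clicking a second attribute keeps the ordering from the first click within each group of equal values.

```python
def sort_by_attribute(order, attribute):
    """Re-sort a list of cell indices by the given attribute's values.

    Python's sorted() is stable, so cells with equal values keep the
    relative order they had in `order`.
    """
    return sorted(order, key=lambda i: attribute[i])


# Invented per-cell attributes.
cluster = [2, 1, 2, 1, 1]
age = ["P7", "P5", "P5", "P7", "P5"]

order = list(range(len(cluster)))
order = sort_by_attribute(order, age)      # first click: sort by age
order = sort_by_attribute(order, cluster)  # second click: sort by cluster
print(order)
```

After the second click the cells are grouped by cluster, and inside each cluster they are still ordered by age.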

JobLeonard commented 7 years ago

WIP update: filtering out by metadata-value almost works. See video:

https://www.youtube.com/watch?v=NejYB5Rr2Bc

The only thing that still goes wrong is that the colouring of the graph is done separately from the table, so the sparkline colours go wrong when a value is filtered. This will require some quite deep re-plumbing in the infrastructure.
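One possible way to think about this bug, sketched in Python with made-up names (`PALETTE`, `build_colour_map`): if colours are assigned by rank and the ranking is recomputed on the filtered data, the remaining values shift to different colours. Building the mapping once from the unfiltered data and sharing it between table and plot avoids that. This is only an illustration of the idea, not how the viewer is actually wired.

```python
from collections import Counter

# Arbitrary example colours.
PALETTE = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3"]


def build_colour_map(values):
    """Assign one colour per distinct value, most common value first."""
    ranked = sorted(Counter(values).items(), key=lambda kv: (-kv[1], kv[0]))
    return {value: PALETTE[i % len(PALETTE)] for i, (value, _) in enumerate(ranked)}


values = ["a", "a", "b", "c", "c", "c"]
shared = build_colour_map(values)          # built once, before filtering

filtered = [v for v in values if v != "c"]
rebuilt = build_colour_map(filtered)       # rebuilt after filtering

# "a" keeps its colour in the shared map but changes in the rebuilt one.
print(shared["a"], rebuilt["a"])
```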

If you can look past that bug, the video shows off the benefits of this: I can, for example, click on "oligodendrocytes" to filter them out of the shown data set. In the last dataset one particular group outnumbers the rest by so much (a factor of ten) that it hides all the other results, so filtering it out reveals information about the loom file.

Filters can also be combined, btw.
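As a rough sketch of what combining filters means here (Python, with hypothetical names and invented data): each filter hides a set of values for one attribute, and a cell stays visible only if none of the active filters hides it.

```python
def apply_filters(attributes, excluded):
    """Return the indices of cells that pass every active filter.

    attributes: dict mapping attribute name -> list of per-cell values
    excluded:   dict mapping attribute name -> set of values to hide
    """
    n_cells = len(next(iter(attributes.values())))
    return [
        i for i in range(n_cells)
        if all(attributes[name][i] not in hidden
               for name, hidden in excluded.items())
    ]


# Invented cell metadata.
attributes = {
    "Class": ["oligodendrocyte", "neuron", "oligodendrocyte", "astrocyte", "neuron"],
    "Tissue": ["cortex", "cortex", "striatum", "striatum", "striatum"],
}

only_class = apply_filters(attributes, {"Class": {"oligodendrocyte"}})
combined = apply_filters(
    attributes,
    {"Class": {"oligodendrocyte"}, "Tissue": {"striatum"}},
)
print(only_class, combined)
```

Keeping the filter state as plain data like this would also make it straightforward to share between pages, since each view only needs the resulting list of visible indices.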

I'll need some time to figure out how to fix that colouring issue (the hard part is doing so without the resulting code being a disgusting, hacky mess that implodes the next time it has to be updated). Once that is fixed, I'll add the search and sorting options that I also made for the dataset overview list (Amit requested this).

My idea is to eventually integrate the overlapping view settings between the various pages. So if you filter out the oligos on the cell metadata page, for example, they would also not be shown on the cell scatterplot or the sparklines view (and obviously these filter settings will be explicitly shown in the side-panel of both, with the option to turn them off again).

slinnarsson commented 7 years ago

Great! That will be very useful!

/Sten

Sten Linnarsson, PhD, Professor of Molecular Systems Biology, Karolinska Institutet

On 25 Oct 2016, at 18:56, Job van der Zwan wrote:

[image]https://cloud.githubusercontent.com/assets/259840/19695686/6fa2db14-9ae4-11e6-9d63-0726dec348a5.png


JobLeonard commented 7 years ago

Implemented in the latest version

linnarsson-lab / loom-viewer

Metadata summary view #65