Closed. JobLeonard closed this issue 7 years ago
Mental note: genes = rows, cells = columns
But as soon as you start combining loom files, those fields are no longer common to all cells. There is a mechanism for file-wide attributes (that's how the titles and descriptions work) but those are lost when files are combined.
Sten
On 14 Oct 2016, at 10:15, Job van der Zwan notifications@github.com wrote:
Another request by Amit.
We might want to consider adding a table that allows people to add "applies to all cells/genes" metadata. If something holds true for all cells or genes, it's kind of silly to spend a whole row of column/row attributes on it, right? This is exactly what's happening right now in many of the loom files, as far as I can see, and from what I understand it's quite a hassle to create too (messing around with Excel and dragging fields and such).
That's more of an issue on the generation of loom files, but an important one I think!
Until then, I think we can easily extract this metadata ourselves. For strings, we apply the same method as we do for the categories legend: iterate over all rows/columns and show the 20 most common values. For numerical values, I suggest building a histogram. I have some ideas on how to approach this.
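To make the proposal above concrete, here is a minimal sketch of such a summary in Python. It is an illustration only, not the viewer's actual code: the function name `summarize_attribute`, the returned dictionary layout, and the default of 10 histogram bins are all assumptions; only the "20 most common" limit comes from the comment above.

```python
from collections import Counter
from numbers import Number


def summarize_attribute(values, top_n=20, bins=10):
    """Summarize one row/column attribute for display (hypothetical helper).

    Strings: the top_n most common values with their counts.
    Numbers: a fixed-width histogram with `bins` buckets.
    """
    if values and all(isinstance(v, Number) for v in values):
        lo, hi = min(values), max(values)
        if lo == hi:
            # Every cell/gene shares one value: a single bucket is enough.
            return {"kind": "histogram", "min": lo, "max": hi,
                    "counts": [len(values)]}
        width = (hi - lo) / bins
        counts = [0] * bins
        for v in values:
            # Clamp so that v == hi falls in the last bucket.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return {"kind": "histogram", "min": lo, "max": hi, "counts": counts}
    # Anything non-numeric is treated as a category label.
    return {"kind": "categories", "top": Counter(values).most_common(top_n)}
```

An attribute that holds the same value for every cell or gene shows up here as a single category (or a single histogram bucket), which is also a cheap way to spot the "applies to all cells/genes" fields.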
Ah right. Well, aggregation is the best option then!
Stuff is starting to happen:
Just as a placeholder for the histogram I put in the sparklines. I think it might actually be more useful! For example, look at how you immediately see the correlation between zero values for _LogCV, PC1, PC2, _tSNE1 and _tSNE2.
ATTRIBUTE should sort all genes or cells by that attribute (stable sort!)
WIP update: filtering out by metadata-value almost works. See video:
https://www.youtube.com/watch?v=NejYB5Rr2Bc
The only thing that still goes wrong is that the colouring of the graph is computed independently of the table, so the sparkline colours go wrong when a value is filtered out. Fixing this will require some quite deep re-plumbing in the infrastructure.
If you can look past that bug, the video above shows off the benefits of this: I can, for example, click on "oligodendrocytes" to filter them out of the shown data set. In the last dataset one particular group outnumbers the rest so much (ten times more) that it hides all the other results, so filtering it out reveals the rest of the information in the loom file.
Filters can also be combined, btw.
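As a rough sketch of how combined filters could work, and of one possible way around the colouring mismatch, the Python below computes the value-to-colour mapping once on the unfiltered data and lets both the table and the plot reuse it. This is not the viewer's implementation; `apply_filters`, `assign_colours`, and the data layout (one list of per-cell values per attribute) are assumptions made for illustration.

```python
from collections import Counter


def apply_filters(attrs, excluded):
    """Return the indices of the cells that survive every filter.

    attrs:    dict mapping attribute name -> list of per-cell values
    excluded: dict mapping attribute name -> set of values to hide
    """
    n_cells = len(next(iter(attrs.values()))) if attrs else 0
    return [
        i for i in range(n_cells)
        if all(attrs[name][i] not in hidden
               for name, hidden in excluded.items())
    ]


def assign_colours(values, palette):
    """Map the most common values to palette colours, most common first.

    Called once on the UNFILTERED values, so filtering a value out later
    does not shift the colours of the remaining values.
    """
    ranked = [v for v, _ in Counter(values).most_common(len(palette))]
    return {v: palette[i] for i, v in enumerate(ranked)}
```

Hiding "oligodendrocytes" and, say, one age group at the same time is then just `apply_filters(attrs, {"Class": {"oligodendrocytes"}, "Age": {"P5"}})`, and the colours of the remaining classes stay whatever `assign_colours` gave them before filtering.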
I'll need some time to figure out how to fix that colouring issue (the hard part is doing so without the resulting code being a disgusting, hacky mess that implodes the next time it has to be updated). Once that is fixed, I'll add the search and sorting options that I also made for the dataset overview list (Amit requested this).
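For the sorting part, a stable sort matters because it lets you sort by one attribute and then by another while keeping the first as a tie-breaker. A minimal Python sketch follows, assuming the view keeps a list of cell/gene indices as its current order; the name `sort_by` is hypothetical.

```python
def sort_by(order, values, descending=False):
    """Re-sort the current index order by one attribute (hypothetical helper).

    order:  list of cell/gene indices in their current display order
    values: per-cell/gene values of the attribute to sort by

    Python's sorted() is stable, also with reverse=True, so indices with
    equal values keep their current relative order.
    """
    return sorted(order, key=lambda i: values[i], reverse=descending)
```

Sorting by age first and then by class, for example, leaves the cells within each class still ordered by age.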
My idea is to eventually integrate the overlapping view settings between the various pages. So if you filter out the oligos on the cell metadata page, for example, they would also be hidden in the cell scatterplot and the sparklines view (and obviously these filter settings will be explicitly shown in the side-panel of both, with the option to turn them off again).
Great! That will be very useful!
/Sten
Implemented in the latest version