colah commented 5 years ago

🔬 This is an experiment in doing radically open research. I plan to post all my work on this openly as I do it, tracking it in this issue. I'd love for people to comment, or better yet collaborate! See more.

Please be respectful of the fact that this is unpublished research and that people involved in this are putting themselves in an unusually vulnerable position. Please treat it as you would unpublished work described in a seminar or by a colleague.

Description

People tend to treat dimensionality reduction as a kind of black-box transformation: take a representation, plug it in, out comes the dimensionality reduction, use that for vis. They may fiddle a little with the choice of algorithm or hyperparameters, but that tends to be it.

It seems to me that this is missing a huge fraction of the potential and richness of dimensionality reduction. It's a lot like when people take data vis to just be line plots.

Examples:

Alignment - Often we want to apply dimensionality reduction to multiple "analogous" datasets and produce dimensionality reductions that are aligned, with similar data occurring in the same place to the greatest extent possible. This is important for making them easy to visually compare. (For example: word embeddings trained on different corpuses, representations of a dataset in different models, etc.)
- Approaches: Alignment objectives, shared parametrization
Shaping - Often, we'd ideally like the output of dimensionality reduction to have a certain two-dimensional shape, for example a rectangle. In other cases, we might not care about the exact shape, but we might want a certain aspect ratio (e.g. to better display on a screen, or to facilitate small multiples). While it is possible to do this by deforming the output of the dimensionality reduction after the fact, doing so damages the information conveyed in the visualization. The right thing to do is to incorporate it into the actual dimensionality reduction process, so that you get the optimal visualization given your constraints.
- Approaches: Objectives pushing towards desired shape, constrained parameterizations limiting points to shape.
Axis Specialization - It can sometimes be powerful to use different dimensionality reduction techniques on different axes. For example, you may wish to display a word embedding projected onto man-woman on one axis, but do t-SNE on the orthogonal axis. Ideally, t-SNE should be "aware" of the previous axis, by actually doing the optimization in 2D with one dimension constrained.
Data Organization - Sometimes, dimensionality reduction can be used subtly in data visualization, such as ordering an unordered set to make things more legible. One example of this is in Four Experiments in Handwriting with a Neural Network, where the LSTM units are arranged by t-SNE.
Better optimization - Dimensionality reduction is generally an optimization problem, and for more complicated techniques like t-SNE it is probably the case -- even with lots of compute -- that we aren't reaching the optimum. Better parameterization of our optimization problem is low-hanging fruit for improving this.
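To make the "alignment objectives" approach above concrete, here is a minimal sketch, not a proposed implementation. It assumes two datasets whose rows correspond one-to-one, uses a plain MDS-style stress objective as a stand-in for t-SNE, and adds a penalty on the squared gap between corresponding points. The names `aligned_embeddings`, `stress_grad`, and `weight` are made up for illustration.

```python
import numpy as np

def pairwise_dists(x):
    """Euclidean distance matrix; the small constant keeps the sqrt differentiable."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(-1) + 1e-12)

def stress_grad(points, target):
    """Gradient of (1/n) * sum_ij (|p_i - p_j| - target_ij)^2 w.r.t. points."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    coef = (dist - target) / dist
    return 4 * (coef[:, :, None] * diff).sum(1) / len(points)

def aligned_embeddings(data_a, data_b, weight=1.0, steps=300, lr=0.05, seed=0):
    """Jointly embed two row-aligned datasets in 2D (assumed setup).

    Objective: stress(y_a) + stress(y_b) + weight * sum_i |y_a[i] - y_b[i]|^2,
    minimized with plain gradient descent.
    """
    rng = np.random.default_rng(seed)
    target_a, target_b = pairwise_dists(data_a), pairwise_dists(data_b)
    y_a = rng.normal(scale=0.1, size=(len(data_a), 2))
    y_b = rng.normal(scale=0.1, size=(len(data_b), 2))
    for _ in range(steps):
        gap = y_a - y_b  # alignment term pulls corresponding points together
        grad_a = stress_grad(y_a, target_a) + 2 * weight * gap
        grad_b = stress_grad(y_b, target_b) - 2 * weight * gap
        y_a -= lr * grad_a
        y_b -= lr * grad_b
    return y_a, y_b
```

With `weight=0` the two embeddings are optimized independently and will generally end up rotated or reflected relative to each other; raising `weight` trades a little stress for layouts that can be compared side by side.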

Reading

Differentiable Image Parameterizations is highly relevant.

Next Steps

Create a simple dimensionality reduction library using the objective / parametrization split from lucid/optvis.
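As a rough sketch of what such a split might look like (this is not lucid's API; every name here is hypothetical), the objective below only sees 2D points, while the parameterization decides how raw optimization variables map to those points. A sigmoid parameterization keeps every point inside a rectangle, which is one way to get the "shaping" behavior described above. An MDS-style stress objective stands in for t-SNE to keep the gradients short.

```python
import numpy as np

def pairwise_dists(x):
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(-1) + 1e-12)

def stress_grad(points, target):
    """Objective gradient: d/d(points) of (1/n) * sum_ij (|p_i - p_j| - target_ij)^2."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    coef = (dist - target) / dist
    return 4 * (coef[:, :, None] * diff).sum(1) / len(points)

class Free:
    """Unconstrained parameterization: the raw variables are the points."""
    def points(self, raw):
        return raw
    def backprop(self, raw, grad_points):
        return grad_points

class InRectangle:
    """Constrained parameterization: points always lie in [0, width] x [0, height]."""
    def __init__(self, width, height):
        self.size = np.array([width, height], dtype=float)
    def points(self, raw):
        return self.size / (1.0 + np.exp(-raw))
    def backprop(self, raw, grad_points):
        s = 1.0 / (1.0 + np.exp(-raw))
        return grad_points * self.size * s * (1.0 - s)  # chain rule through sigmoid

def embed(data, param, objective_grad=stress_grad, steps=300, lr=0.05, seed=0):
    """Gradient descent on the raw variables of `param` under the given objective."""
    rng = np.random.default_rng(seed)
    target = pairwise_dists(data)
    raw = rng.normal(scale=0.1, size=(len(data), 2))
    for _ in range(steps):
        grad_points = objective_grad(param.points(raw), target)
        raw -= lr * param.backprop(raw, grad_points)
    return param.points(raw)
```

Calling `embed(data, InRectangle(4.0, 1.0))` versus `embed(data, Free())` swaps the constraint without touching the objective, which is the point of the split.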

ncammarata commented 5 years ago

This may already be what you meant by "shaping into rectangle", but it might be useful for the dimensionality reduction algorithm to know the shape of each point as well as the shape of the full container. That would help with visualizations like grid-style plots, rather than handling this in post-processing.

colah commented 5 years ago

@ncammarata - Great point. I'd only been thinking about the container. I'll need to ponder that a bit.

ncammarata commented 5 years ago

Over the last few weeks I've run into a few different situations where this would have been helpful, so I'd like to add a small bump in how much I support this.

One example: I wanted to create a 2D plot where I do a 1D dimensionality reduction onto the circumference of a circle, then use another, already-known 1D value as the radius from the center.

The problem with doing this now is that I can't incentivize the 1D dimensionality reduction to treat the two ends as wrapping around, as they would on the perimeter of a circle.
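One possible way to get the wrap-around for free is to parameterize each point by an angle and measure distances between the corresponding points on the unit circle, so that angles near 0 and near 2π are neighbors by construction. The sketch below is only an illustration under assumptions: it uses an MDS-style stress objective on chord distances (not t-SNE), expects target distances rescaled to the range [0, 2], and the names `fit_angles` and `circle_points` are made up.

```python
import numpy as np

def circle_points(theta, radius=1.0):
    """Map angles (and optional per-point radii) to 2D plot coordinates."""
    return np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)

def fit_angles(target, steps=500, lr=0.05, seed=0):
    """Gradient descent on angles so chord distances on the unit circle match `target`.

    `target` is an (n, n) symmetric distance matrix, assumed rescaled to [0, 2]
    since 2 is the largest possible chord on the unit circle.
    """
    rng = np.random.default_rng(seed)
    n = len(target)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    for _ in range(steps):
        y = circle_points(theta)
        diff = y[:, None, :] - y[None, :, :]
        dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
        # gradient of (1/n) * sum_ij (dist_ij - target_ij)^2 w.r.t. the 2D points
        grad_y = 4 * (((dist - target) / dist)[:, :, None] * diff).sum(1) / n
        # chain rule: d(cos t, sin t)/dt = (-sin t, cos t)
        tangent = np.stack([-np.sin(theta), np.cos(theta)], axis=1)
        theta -= lr * (grad_y * tangent).sum(1)
    return theta % (2.0 * np.pi)
```

The already-known value can then be used as the radius when plotting, e.g. `xy = circle_points(theta, radius=known_values)`, where `known_values` is a placeholder for that 1D array.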

colah commented 5 years ago

@ncammarata This is a really interesting use case. Totally feasible and would be nice to have.

tensorflow / lucid

Research: "The Art of Dimensionality Reduction" #111
