linnarsson-lab / loom-viewer

Tool for sharing, browsing and visualizing single-cell data stored in the Loom file format
BSD 2-Clause "Simplified" License

Axis and other scatterplot labels #102

Open JobLeonard opened 7 years ago

JobLeonard commented 7 years ago

img_20170524_122438_dro-01

img_20170524_145004-01

We have a lot of different cases to consider:

JobLeonard commented 7 years ago

Marks for axes need to automatically scale for numerical data, and we need sensible defaults.

Base "formula"

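I think this works as a starting point:

• spread ticks in multiples of one of { 1, 2, 5 } * 10^n, where n is chosen such that the total number of ticks is
  • never less than 3 (exception: fewer than three unique values),
  • never more than 10

If multiple answers fit the above criterion, choose the one with the most ticks.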

Example:

min value: 0, max value: 8, delta: 8. Correct answers include:

• 1 * 10^0 = 1 (8 ticks)
• 2 * 10^0 = 2 (4 ticks)

In this case, marks should be spread by one.
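A minimal sketch of the base formula in Python, assuming ticks are counted as delta / step (which is how the example arrives at 8 and 4). The function name `tick_step` and its parameters are made up for illustration and are not part of loom-viewer.

```python
import math

def tick_step(lo, hi, min_ticks=3, max_ticks=10):
    """Pick a step m * 10^n with m in {1, 2, 5} that gives the most ticks
    while staying within [min_ticks, max_ticks].

    Returns (step, ticks), or None if no candidate fits.
    Ticks are counted as delta / step, as in the example above (assumption).
    """
    delta = hi - lo
    if delta <= 0:
        raise ValueError("hi must be greater than lo")
    n0 = math.floor(math.log10(delta))
    best = None
    # Only a few orders of magnitude around delta can land in the allowed range.
    for n in range(n0 - 2, n0 + 1):
        for m in (1, 2, 5):
            step = m * 10.0 ** n
            # Small epsilon guards against floating point error in the division.
            ticks = math.floor(delta / step + 1e-9)
            if min_ticks <= ticks <= max_ticks:
                if best is None or ticks > best[1]:
                    best = (step, ticks)
    return best

print(tick_step(0, 8))  # step 1.0 with 8 ticks, matching the example
```

For min 0 and max 8, both step 1 (8 ticks) and step 2 (4 ticks) fit, and the "most ticks" tie-break selects step 1.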

Refinements

We might want to make "never less than 3, never more than 10" scale with available pixels (so for a big plot, allow for more ticks).
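One way this could look, as a sketch only: derive the upper bound from the axis length in pixels. The 40 pixel minimum spacing is an assumed value, picked so that a 400 pixel axis reproduces the "never more than 10" default; `tick_bounds` is a hypothetical helper.

```python
def tick_bounds(axis_pixels, min_spacing_px=40, floor_ticks=3):
    """Return (min_ticks, max_ticks) for an axis of the given pixel length.

    min_spacing_px is an assumed minimum distance between neighbouring ticks.
    The lower bound stays fixed at floor_ticks; only the upper bound grows.
    """
    max_ticks = max(floor_ticks, axis_pixels // min_spacing_px)
    return floor_ticks, max_ticks

print(tick_bounds(400))   # (3, 10)
print(tick_bounds(1200))  # (3, 30)
```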

We should think of significant digits: if we only have the values 1, 2, 3 and 4 in our dataset, there's no point in showing ticks every 0.5 points.
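A rough way to capture this, assuming the smallest gap between unique values is a usable stand-in for the resolution of the data: any candidate step smaller than that gap can be skipped. `min_useful_step` is a hypothetical name.

```python
def min_useful_step(values):
    """Smallest gap between distinct values, or None if there are fewer than two.

    Tick steps below this gap would suggest more precision than the data has.
    """
    uniq = sorted(set(values))
    if len(uniq) < 2:
        return None
    return min(b - a for a, b in zip(uniq, uniq[1:]))

print(min_useful_step([1, 2, 3, 4, 2]))  # 1, so a step of 0.5 would be skipped
```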

We allow for log2 scaling. This might require some special logic, but perhaps a simple projection suffices.
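A sketch of the "simple projection" idea, under the assumption that ticks sit at powers of two and are positioned by their exponent. It only handles strictly positive ranges; how zeros are handled (for example with an offset before taking the logarithm) is not specified here and is left out. `log2_ticks` is a hypothetical helper.

```python
import math

def log2_ticks(lo, hi):
    """Powers of two within [lo, hi], paired with their position in log2 space.

    Returns a list of (value, position) tuples. Requires 0 < lo <= hi.
    """
    if lo <= 0 or hi < lo:
        raise ValueError("need 0 < lo <= hi")
    first = math.ceil(math.log2(lo))
    last = math.floor(math.log2(hi))
    return [(2.0 ** e, e) for e in range(first, last + 1)]

print(log2_ticks(1, 64))
```

The positions are evenly spaced, so they could be fed through the same 3-to-10 ticks rule as linear data if there are too many powers of two in range.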

slinnarsson commented 7 years ago

http://vis.stanford.edu/files/2010-TickLabels-InfoVis.pdf

-- Sten Linnarsson, PhD Professor of Molecular Systems Biology Karolinska Institutet Unit of Molecular Neurobiology Department of Medical Biochemistry and Biophysics Scheeles väg 1, 171 77 Stockholm, Sweden +46 8 52 48 75 77 (office) +46 70 399 32 06 (mobile)

On 24 May 2017, at 14:24, Job van der Zwan notifications@github.com wrote:

Marks for axes need to automatically scale for numerical data, and we need sensible defaults.

Base "formula"

I think this works as a starting point:

• spread ticks in multiples of one of { 1, 2, 5 } * 10^n, where n is chosen such that the total number of ticks is
  • never less than 3 (exception: fewer than three unique values),
  • never more than 10

If multiple answers fit the above criterion, choose the one with the most ticks.

Example:

min value: 0, max value: 8, delta: 8. correct answers include:

• 1 * 10^0 = 1 (8 ticks)
• 2 * 10^0 = 2 (4 ticks)

In this case, marks should be spread by one.

Refinements

We might want to make "never less than 3, never more than 10" scale with available pixels (so for a big plot, allow for more ticks).

We should think of significant digits: if we only have the values 1, 2, 3 and 4 in our dataset, there's no point in showing ticks every 0.5 points.

We allow for log2 scaling. This might require some special logic, but perhaps a simple projection suffices.


JobLeonard commented 7 years ago

Aaah, thanks! I hadn't thought of looking for papers on the subject. Looks interesting, although like any good maths paper they introduce formulas without defining what the terms are. Oh well, nothing a little trial and error won't fix.

Moving orders of magnitude to the labels is a good idea, will be stealing that.
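As a sketch of what that could mean in practice: factor the largest power of ten out of the tick values, so the tick labels stay short and the exponent is shown once (for example next to the axis title). This is my reading of the idea, not the paper's algorithm; `factor_out_magnitude` is a hypothetical name.

```python
import math

def factor_out_magnitude(ticks):
    """Return (labels, exponent) so that label * 10^exponent equals the tick value."""
    largest = max(abs(t) for t in ticks)
    if largest == 0:
        return ["0" for _ in ticks], 0
    exponent = math.floor(math.log10(largest))
    scale = 10.0 ** exponent
    # The "g" format drops trailing zeros, so 2.0 is shown as "2".
    labels = [f"{t / scale:g}" for t in ticks]
    return labels, exponent

print(factor_out_magnitude([0, 20000, 40000]))  # (['0', '2', '4'], 4)
```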

The article also mentions that they do not know of any research that has been done into what type of numbers are "nice". However, I know of at least one example.

See, the reason I went with 1, 2 and 5 was because I recall reading an article in the early 2000s about the research behind the creation of the Euro coin. Annoyingly, I can't find it now. Anyway, IIRC the choice to scale bank notes in this order had mathematical grounding:

While the last property is not directly relevant for us (but interesting nonetheless), the first two suggest that this is a good scale to use for ticks.

(Also, I have to say I'm more than a little frustrated with how easily they decide what is "more" or "less" nice (4.1 on page three) after stating that there is no objective research into what makes a number nice. They then refer to numbers in a table without including an appendix of plots with various settings for subjective comparison. That kind of assumption, that an "objective" formula is more "true" than doing proper user testing, drives me up the wall.)

JobLeonard commented 7 years ago

My crappy sketches that helped me think of all the things that need to be included

img_20170524_122438_dro-01

img_20170524_145004-01

Things that need to be calculated

JobLeonard commented 7 years ago

I realised today that I'm doing this all backwards: the axes are the hardest part, so do the labels and heatmap scale first, which are much easier and will still be useful without the axes:

image

Uploaded to the server too.

JobLeonard commented 7 years ago

New addition: labels on the clusters (only sensible in categorical mode).

Would require

Not too complicated actually.
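A minimal sketch of one possible approach, assuming each label is placed at a representative position for its category. The median of the x and y coordinates is used here because it is less sensitive to outliers than the mean; that choice, and the name `cluster_label_positions`, are assumptions rather than what loom-viewer does.

```python
from collections import defaultdict
from statistics import median

def cluster_label_positions(xs, ys, categories):
    """Map each category to the (median x, median y) of its points."""
    groups = defaultdict(lambda: ([], []))
    for x, y, c in zip(xs, ys, categories):
        groups[c][0].append(x)
        groups[c][1].append(y)
    return {c: (median(gx), median(gy)) for c, (gx, gy) in groups.items()}

positions = cluster_label_positions(
    [0, 2, 4, 10, 12],
    [0, 2, 4, 10, 12],
    ["a", "a", "a", "b", "b"],
)
print(positions)
```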

JobLeonard commented 6 years ago

Well, I got labels to work, but I still have to make sure they don't trigger for heatmap data, or when there are more than (say) 1000 unique values:
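The guard described here could be as small as the sketch below. The mode strings "categorical" and "heatmap" and the function name are placeholders, and the 1000 limit is the "(say) 1000" from the comment.

```python
def should_draw_labels(values, color_mode, max_unique=1000):
    """Only label clusters in categorical mode with a manageable number of categories."""
    if color_mode != "categorical":
        return False
    return len(set(values)) <= max_unique

print(should_draw_labels(["a", "b", "a"], "categorical"))  # True
print(should_draw_labels(["a", "b", "a"], "heatmap"))      # False
print(should_draw_labels(range(2000), "categorical"))      # False
```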

image