JobLeonard opened 7 years ago
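Marks for axes need to scale automatically for numerical data, and we need sensible defaults.
Base "formula"
I think this works as a starting point:
• spread ticks in multiples of one of { 1, 2, 5 } * 10^n, where n is chosen such that the total number of ticks is
  • never less than 3 (exception: fewer than three unique values),
  • never more than 10.
If multiple answers fit the above criterion, choose the one with the most ticks.
Example:
min value: 0, max value: 8, delta: 8. Correct answers include:
• 1 * 10^0 = 1 (ticks at 0, 1, 2, ..., 8)
• 2 * 10^0 = 2 (ticks at 0, 2, 4, 6, 8)
In this case, marks should be spread by one, since that gives the most ticks.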
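A minimal Python sketch of the base formula, assuming ticks sit on whole multiples of the step and that "total ticks" counts the tick marks inside [min, max]. The names `tick_step` and `tick_values` are hypothetical, searching n only from floor(log10(delta)) - 2 to floor(log10(delta)) + 1 is an assumption that keeps the candidate list short, and the "fewer than three unique values" exception is left to the caller.

```python
import math
from typing import List, Optional

EPS = 1e-9  # tolerance for float noise when dividing by the step


def tick_values(vmin: float, vmax: float, step: float) -> List[float]:
    """All whole multiples of `step` that lie inside [vmin, vmax]."""
    first = math.ceil(vmin / step - EPS)
    last = math.floor(vmax / step + EPS)
    # round() hides float noise such as 3 * 0.1 == 0.30000000000000004
    return [round(k * step, 12) for k in range(first, last + 1)]


def tick_step(vmin: float, vmax: float,
              min_ticks: int = 3, max_ticks: int = 10) -> Optional[float]:
    """Step from {1, 2, 5} * 10^n whose tick count lies in [min_ticks, max_ticks].

    Among all fitting steps, the one giving the most ticks wins.
    Returns None if the range is empty or no candidate fits.
    """
    delta = vmax - vmin
    if delta <= 0:
        return None
    base = math.floor(math.log10(delta))
    best_count, best_step = 0, None
    # Only look a couple of orders of magnitude around the range itself.
    for n in range(base - 2, base + 2):
        for m in (1, 2, 5):
            step = m * 10.0 ** n
            count = len(tick_values(vmin, vmax, step))
            if min_ticks <= count <= max_ticks and count > best_count:
                best_count, best_step = count, step
    return best_step
```

For the example above, `tick_step(0, 8)` returns 1.0: step 1 yields 9 ticks and step 2 yields 5, while 0.5 (17 ticks) and 5 (2 ticks) fall outside the limits.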
Refinements
We might want to make "never less than 3, never more than 10" scale with the available pixels (so a big plot can have more ticks).
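One way to scale the bounds, sketched in Python under assumptions that are not from the thread: keep the lower bound at 3 and derive the upper bound from a hypothetical minimum spacing of `px_per_tick` pixels per tick. With the arbitrary default of 60 px, a 600 px axis gets the original limit of 10; `tick_limits` is a hypothetical name.

```python
from typing import Tuple


def tick_limits(axis_length_px: int, px_per_tick: int = 60,
                min_ticks: int = 3) -> Tuple[int, int]:
    """Return (min_ticks, max_ticks) for an axis of the given length in pixels.

    px_per_tick is a hypothetical minimum spacing between ticks; the upper
    bound grows with the axis but never drops below min_ticks.
    """
    max_ticks = max(min_ticks, axis_length_px // px_per_tick)
    return min_ticks, max_ticks
```

A 1200 px axis would then allow up to 20 ticks, while a 100 px axis is pinned to exactly 3.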
We should think about significant digits: if the dataset only contains the values 1, 2, 3 and 4, there's no point in showing ticks every 0.5 points.
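A minimal Python sketch of one possible reading of this point, swapped in as an assumption rather than taken from the thread: treat the finest decimal place used by the data as a lower bound on the tick step, so integer-only data never gets steps below 1. `data_resolution` and `allowed_steps` are hypothetical names, and the resolution is capped at 1 because coarser steps are already governed by the tick-count limits.

```python
import math
from typing import Iterable, List


def data_resolution(values: Iterable[float], max_decimals: int = 12) -> float:
    """Coarsest power of ten (capped at 1) that every value is a multiple of.

    1 for integer data, 0.1 for values such as 0.5 and 1.2, 0.01 for 0.25.
    """
    vals = list(values)
    for d in range(max_decimals + 1):
        scale = 10 ** d
        # abs_tol absorbs float noise from multiplying decimals by 10^d
        if all(math.isclose(v * scale, round(v * scale), abs_tol=1e-6) for v in vals):
            return 10.0 ** -d
    return 10.0 ** -max_decimals


def allowed_steps(candidates: Iterable[float], values: Iterable[float]) -> List[float]:
    """Drop candidate tick steps that are finer than the data itself."""
    resolution = data_resolution(values)
    return [s for s in candidates if s >= resolution * (1 - 1e-9)]
```

With the values 1, 2, 3 and 4, a step of 0.5 is filtered out while 1 and 2 remain.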
We should also allow for log2 scaling. This might require some special logic, but perhaps a simple projection suffices.
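A sketch of the "simple projection" idea in Python, assuming a log2 axis should put its ticks on powers of two: project the range with log2, take the integer exponents inside it, and keep only every k-th exponent when there are more than `max_ticks`. `log2_ticks` is a hypothetical name; a range that contains no power of two (say 3 to 3.5) returns no ticks and would need some fallback, for example linear ticks.

```python
import math
from typing import List


def log2_ticks(vmin: float, vmax: float, max_ticks: int = 10) -> List[float]:
    """Ticks for a log2 axis via a plain projection into log2 space.

    Ticks land on powers of two; if there are more than max_ticks of them,
    only every k-th exponent is kept.
    """
    if vmin <= 0 or vmax <= vmin:
        raise ValueError("a log2 axis needs 0 < vmin < vmax")
    # Integer exponents whose powers of two fall inside [vmin, vmax].
    lo = math.ceil(math.log2(vmin))
    hi = math.floor(math.log2(vmax))
    exponents = list(range(lo, hi + 1))
    # Smallest stride that keeps the number of ticks at or below max_ticks.
    stride = max(1, math.ceil(len(exponents) / max_ticks))
    return [2.0 ** e for e in exponents[::stride]]
```

For 1 to 1024 there are 11 candidate exponents, so every second one is kept: 1, 4, 16, 64, 256, 1024.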
http://vis.stanford.edu/files/2010-TickLabels-InfoVis.pdf
-- Sten Linnarsson, PhD, Professor of Molecular Systems Biology, Karolinska Institutet, Unit of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics
(In reply to Job van der Zwan's opening post of 24 May 2017, 14:24.)
Aaah, thanks! I hadn't thought of looking for papers on the subject. It looks interesting, although, like any good maths paper, it introduces formulas without defining all the terms. Oh well, nothing a little trial and error won't fix.
Moving the orders of magnitude into the labels is a good idea; I'll be stealing that.
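A rough Python sketch of factoring a shared power of ten out of the tick labels, so that an axis label can carry something like "x 10^3" instead. The helper name `factor_out_magnitude` and the `threshold` cut-off (only factor out exponents of at least 3 in absolute value) are assumptions for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def factor_out_magnitude(ticks: Sequence[float],
                         threshold: int = 3) -> Tuple[List[str], int]:
    """Split a shared power of ten off the tick labels.

    Returns (tick label strings, exponent). An exponent of 0 means nothing was
    factored out; otherwise the axis label should mention x 10^exponent.
    """
    largest = max((abs(t) for t in ticks), default=0.0)
    # Order of magnitude of the largest tick, e.g. 3 for 8000, -3 for 0.004.
    exponent = math.floor(math.log10(largest)) if largest > 0 else 0
    if abs(exponent) < threshold:
        exponent = 0  # small magnitudes read fine as-is
    scale = 10.0 ** exponent
    # ":g" drops trailing zeros and rounds away float noise from the division.
    return [f"{t / scale:g}" for t in ticks], exponent
```

Ticks at 0, 2000, ..., 8000 become the labels 0, 2, ..., 8 with exponent 3, while ticks at 0, 2 and 4 are left untouched.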
The article also mentions that the authors don't know of any research into what kinds of numbers are "nice". However, I know of at least one example.
See, the reason I went with 1, 2 and 5 is that I recall reading an article in the early 2000s about the research behind the design of the euro coins. Annoyingly, I can't find it now. Anyway, IIRC the choice to scale bank notes in this order had mathematical grounding:
While the last property is not directly relevant for us (but interesting nonetheless), the first two suggest that this is a good scale to use for ticks.
(Also, I have to say I'm more than a little frustrated by how readily they decide what is "more" or "less" nice (section 4.1, page 3) right after stating that there is no objective research into what makes a number nice. They then refer to numbers in a table without including, as an appendix, plots with various settings for subjective comparison. That kind of assumption, that an "objective" formula is more "true" than proper user testing, drives me up the wall.)
(Image: my crappy sketches that helped me think of all the things that need to be included.)
(Image: things that need to be calculated.)
I realised today that I'm doing this all backwards: the axes are the hardest part, so I should do the labels and the heatmap scale first, since those are much easier and will still be useful without the axes.
Uploaded to the server too.
New addition: labels on the clusters (only sensible in categorical mode).
Would require
Not too complicated actually.
Well, I got labels to work, but I still have to make sure they don't trigger for heatmap data or when there are more than (say) 1000 unique values.
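A minimal Python sketch of the two conditions for cluster labels, with everything beyond them assumed: `PlotMode` and `cluster_label_positions` are hypothetical names, the 1000-value cut-off is the "(say)" figure from the comment, and anchoring each label at the mean position of its category's points is a placement rule chosen here for illustration only.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, Hashable, List, Sequence, Tuple


class PlotMode(Enum):
    """Hypothetical colour mode of a plot."""
    CATEGORICAL = "categorical"
    HEATMAP = "heatmap"


def cluster_label_positions(
    mode: PlotMode,
    xs: Sequence[float],
    ys: Sequence[float],
    categories: Sequence[Hashable],
    max_unique: int = 1000,
) -> Dict[Hashable, Tuple[float, float]]:
    """Return one label anchor per category, or {} when labels should be skipped."""
    # Labels only make sense for categorical data, never for heatmap data.
    if mode is not PlotMode.CATEGORICAL:
        return {}
    # Too many unique values would produce an unreadable wall of labels.
    if len(set(categories)) > max_unique:
        return {}
    # Accumulate [sum_x, sum_y, count] per category.
    sums: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0, 0.0, 0])
    for x, y, cat in zip(xs, ys, categories):
        acc = sums[cat]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    # Mean position of each category's points (an assumed placement rule).
    return {cat: (sx / n, sy / n) for cat, (sx, sy, n) in sums.items()}
```

Returning an empty mapping rather than raising keeps the caller simple: it can always draw whatever labels come back.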
We have a lot of different cases to consider: