onlinf / google-refine

Automatically exported from code.google.com/p/google-refine

Heatmaps - Determining correlations between values (aka pivot, aka scatterfacet for non-numeric values) #65

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I'd like to be able to see what percentage of the records that contain 
valueA in columnX also contain valueB in columnY.
This would then allow me to spot high correlations between non-numeric 
data, and narrow down on outliers for any necessary data cleaning.

This can be done manually by faceting on columnX and making a note of the 
count of valueA.
Then filter by valueA and add an additional facet on columnY, making a note 
of the count of valueB.  Unfortunately, this only allows me to look at one 
value combination at a time.
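
The manual procedure above amounts to two counts and a division. A minimal 
Python sketch, assuming the rows are available as a list of dicts (the 
function name `cooccurrence_pct` and the sample data are hypothetical, not 
part of Refine):

```python
def cooccurrence_pct(rows, col_x, value_a, col_y, value_b):
    """Percentage of rows with value_a in col_x that also have value_b in col_y."""
    # Step 1: facet on col_x and keep the rows matching value_a.
    with_a = [r for r in rows if r.get(col_x) == value_a]
    if not with_a:
        return None  # value_a never occurs, so the percentage is undefined
    # Step 2: within that selection, count the rows matching value_b in col_y.
    with_a_and_b = sum(1 for r in with_a if r.get(col_y) == value_b)
    return 100.0 * with_a_and_b / len(with_a)


# Hypothetical sample data: three rows have valueA, two of them also have valueB.
rows = [
    {"columnX": "valueA", "columnY": "valueB"},
    {"columnX": "valueA", "columnY": "valueB"},
    {"columnX": "valueA", "columnY": "other"},
    {"columnX": "other", "columnY": "valueB"},
]
pct = cooccurrence_pct(rows, "columnX", "valueA", "columnY", "valueB")
```

Each call covers only a single (valueA, valueB) pair, which is the same 
limitation as the manual approach.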

I'd like a representation, similar to the scatterfacet, to display this for 
all combinations of values in columnX and columnY (plus 'empty').  Along 
the x-axis are the values of columnX and along the y-axis the values of 
columnY.  As the data is non-numeric, the graph is split into cells.  The 
value of each cell is the percentage count(valueA and valueB) / 
count(valueA), i.e. the share of records with valueA in columnX that also 
have valueB in columnY.  This could be done as a heatmap varying the cell 
brightness between 0 and 255 in scale with the percentage value.  Clicking 
a cell would get me the corresponding rows.  (bonus points for also being 
able to click through to get the inverse - records of valueA which don't 
contain valueB)
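
As a rough illustration of the request, the cell values, their brightness, 
and the click-through row selection could be computed as follows. This is 
only a sketch in Python under assumptions: rows are a list of dicts, and the 
names `heatmap_cells`, `rows_for_cell`, and the `(empty)` label are 
hypothetical, not existing Refine features.

```python
from collections import Counter

EMPTY = "(empty)"  # hypothetical label for blank or missing values


def _key(row, col):
    """Return the cell value, or EMPTY when it is blank or missing."""
    value = row.get(col)
    return EMPTY if value in (None, "") else value


def heatmap_cells(rows, col_x, col_y):
    """Map (x_value, y_value) -> (percentage, brightness in 0..255)."""
    x_counts = Counter(_key(r, col_x) for r in rows)
    pair_counts = Counter((_key(r, col_x), _key(r, col_y)) for r in rows)
    y_values = {y for _, y in pair_counts}

    cells = {}
    for x, x_total in x_counts.items():
        for y in y_values:
            # Share of the rows with x in col_x that also have y in col_y.
            fraction = pair_counts[(x, y)] / x_total
            cells[(x, y)] = (100.0 * fraction, round(255 * fraction))
    return cells


def rows_for_cell(rows, col_x, x, col_y, y, inverse=False):
    """Rows behind a clicked cell; inverse=True gives rows with x but not y."""
    return [
        r for r in rows
        if _key(r, col_x) == x and (_key(r, col_y) == y) != inverse
    ]


# Hypothetical sample data.
rows = [
    {"columnX": "a", "columnY": "p"},
    {"columnX": "a", "columnY": "p"},
    {"columnX": "a", "columnY": "q"},
    {"columnX": "b", "columnY": ""},
]
cells = heatmap_cells(rows, "columnX", "columnY")
```

Note that each cell is normalised by the count of its columnX value, so 
swapping the two columns generally gives a different heatmap.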

An overview of all possible heatmaps could also be generated, similar to 
the overview of all possible scatterfacets.

Original issue reported on code.google.com by iainsproat on 28 May 2010 at 11:46

GoogleCodeExporter commented 9 years ago
David pointed out some examples of a JavaScript pivot:

[http://people.csail.mit.edu/dfhuynh/projects/www-conferences/www-history-pivot-category-year.html]
[http://people.csail.mit.edu/dfhuynh/projects/www-conferences/www2007-papers-pivot.html]

Original comment by iainsproat on 12 Oct 2010 at 5:04