syntagmatic / parallel-coordinates

A d3-based parallel coordinates plot in canvas. This library is no longer actively developed.
http://syntagmatic.github.com/parallel-coordinates/

Automatic axis type detection enhancement #14

Closed LeJav closed 11 years ago

LeJav commented 11 years ago

Many thanks for your excellent work. I have tried d3.js for parallel coordinates with the help of jasondavies' examples, and I have extended it with automatic data type detection: numeric, string, or date. My code looks something like this:

  var col_string = {};             // you can force string type for some data
  var col_date = {};               // you can force date type for some data
  var format_date = "%d/%m/%Y";    // Cf d3.time.format
  // and for automatic detection
  var exp_date = /^(\d){1,2}\/(\d){1,2}\/(\d){4}$/;

...

  d3.csv("...csv", function(commande) {
    // Date parser built from format_date (Cf d3.time.format)
    var d3_format_date = d3.time.format(format_date);
    // Extract the list of dimensions
    // For each dimension, guess if numeric value or not and create vert scales
    all_dims = d3.keys(commande[0]);
    // Filter hidden dimensions
    dimensions = d3.keys(commande[0]).filter(function(key) {
      var values = commande.map(function(p) { return p[key]; });
      // guess if string values: one non-numeric value makes the column a string column
      if (!col_string[key]) col_string[key] = values.some(function(p) { return isNaN(p); });
      // guess if date values: every value must match exp_date
      if (!col_date[key]) {
        if (col_string[key] && values.every(function(p) { return p.match(exp_date); })) {
          col_date[key] = true;
          col_string[key] = false;
          // convert to Date objects
          commande.forEach(function(p) { p[key] = d3_format_date.parse(p[key]); });
        }
      }
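
The detection step above can be sketched in isolation, without d3. This is a minimal illustration, not the code from the release: `detectColumnType` and `DATE_PATTERN` are hypothetical names, the pattern only covers the dd/mm/yyyy layout, and unlike the snippet above it treats empty strings as non-numeric.

```javascript
// Hypothetical helper, not part of d3 or d3.parcoords.js.
// Classifies a column as "date", "number" or "string" from its raw CSV strings.
var DATE_PATTERN = /^\d{1,2}\/\d{1,2}\/\d{4}$/; // dd/mm/yyyy only

function detectColumnType(rows, key) {
  var values = rows.map(function (row) { return row[key]; });
  if (values.length === 0) return "string"; // nothing to inspect
  // Dates first: every value must match the pattern.
  if (values.every(function (v) { return DATE_PATTERN.test(v); })) return "date";
  // Numbers next: every value must be non-empty and coerce to a number.
  var allNumeric = values.every(function (v) {
    return String(v).trim() !== "" && !isNaN(v);
  });
  return allNumeric ? "number" : "string";
}

// Made-up rows, as d3.csv would deliver them (all values are strings).
var rows = [
  { name: "alpha", cores: "128", installed: "01/06/2012" },
  { name: "beta", cores: "64", installed: "15/11/2011" }
];

var detectedTypes = {};
Object.keys(rows[0]).forEach(function (key) {
  detectedTypes[key] = detectColumnType(rows, key);
});
console.log(detectedTypes);
```

Checking dates before numbers keeps the two tests independent, so a column never has to be reclassified after the fact.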

Then I create the appropriate scale for each axis:

      // if string data: ordinal scale
      if (col_string[key]) {
        return (vert_scales[key] = d3.scale.ordinal()
          .domain(commande.map(function(p) { return p[key]; }).sort())
          .rangePoints([h, 0], 1));
      }
      // else: linear scale, or time scale for dates
      else {
        var interval;
        if (intervals[key]) interval = intervals[key];
        else {
          interval = d3.extent(commande, function(p) { return +p[key]; });
          // add padding (5%)
          var padding = (interval[1] - interval[0]) * 0.05;
          interval[0] = interval[0] - padding;
          interval[1] = interval[1] + padding;
        }
        if (col_date[key]) return (vert_scales[key] = d3.time.scale().domain(interval).range([h, 0]));
        else return (vert_scales[key] = d3.scale.linear().domain(interval).range([h, 0]));
      }
    });

I also specify the date format for the date axis (could be configurable):

    var axis = g.append("svg:g")
      .attr("class", function(d) { return col_string[d] ? "axis label" : "axis"; })
      .each(function(d) {
        // tickFormat persists on the shared axis, so reset it (null) for non-date axes
        if (col_date[d]) d3.select(this).call(axis.scale(vert_scales[d]).tickFormat(d3.time.format("%m/%y")));
        else d3.select(this).call(axis.scale(vert_scales[d]).tickFormat(null));
      });

...

This allows me to add special features for ordinal axis: when I mouse over a label, I highlight all lines which have this value.

Finally, for brush extents:

    var extents = actives.map(function(p) { return vert_scales[p].brush.extent(); });
    foreground.classed("active", function(d) {
      return actives.every(function(p, i) {
        var val;
        // string type: compare the position on the ordinal axis
        if (col_string[p]) val = vert_scales[p](d[p]);
        // numeric or date axis: compare the raw value
        else val = d[p];
        return extents[i][0] <= val && val <= extents[i][1];
      });
    });

With that, brushes work correctly on numeric, string, and date axes.
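
The brush test can be tried outside the browser with a stand-in for the ordinal scale. This is only a sketch: `pointScale`, `brushValue` and `isRowActive` are hypothetical names, the data is made up, and date extents are written as timestamps here rather than taken from a real d3 brush.

```javascript
// Stand-in for an ordinal scale: maps each category to an evenly spaced pixel position.
function pointScale(categories, height) {
  var step = categories.length > 1 ? height / (categories.length - 1) : 0;
  return function (value) { return categories.indexOf(value) * step; };
}

// Value to compare against a brush extent for one dimension.
// Ordinal axes compare pixel positions; numeric and date axes compare raw values
// (the unary plus turns a Date into its timestamp).
function brushValue(type, scale, raw) {
  return type === "string" ? scale(raw) : +raw;
}

// True when the row lies inside every active brush extent.
function isRowActive(row, actives, extents, types, scales) {
  return actives.every(function (dim, i) {
    var val = brushValue(types[dim], scales[dim], row[dim]);
    return extents[i][0] <= val && val <= extents[i][1];
  });
}

var types = { vendor: "string", cores: "number", installed: "date" };
var scales = { vendor: pointScale(["Cray", "IBM", "SGI"], 200) };
var row = { vendor: "IBM", cores: 64, installed: new Date(2012, 5, 1) };

var actives = ["vendor", "installed"];
var extents = [
  [50, 150],                                        // pixel range on the ordinal axis
  [+new Date(2012, 0, 1), +new Date(2012, 11, 31)]  // date range as timestamps
];
console.log(isRowActive(row, actives, extents, types, scales)); // true
```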

Do you think that such features could be integrated in d3.parcoords.js?

syntagmatic commented 11 years ago

This is nice work, and related to recent work by Chris Rich: http://bl.ocks.org/4173826

It's definitely great thinking. I had ordinal/string axes on my mind, but dates would be nice to support as well.

I'll try to integrate this in the next couple weeks. One thing I want to do is avoid too much if/else logic in the code. Each dimension should have a "type" and each type needs functions to format/compare to support axes labels and brushing. These functions will be exposed as values of an object, for example for the base scales:

var scale = {
  numeric: function(key, data) {
    return d3.scale.linear().domain(d3.extent(data, function(p) { return +p[key]; })).range([h, 0]);
  },
  ordinal: function(key, data) {
    return d3.scale.ordinal().domain(data.map(function(p) { return p[key]; }).sort()).rangePoints([h, 0], 1);
  },
  time: function(key, data) {
    return d3.time.scale().domain(d3.extent(data, function(p) { return +p[key]; })).range([h, 0]);
  }
};

The same goes for comparison, formatting, and other logic. We'll need to document exactly what has to be added to support a new type of scale.

The user could then define new scales or override existing scales, and we would avoid hardcoding particular scales deep in the library.
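
To make the idea concrete, here is a rough sketch of such a registry without d3. None of these names come from the library: `typeHandlers`, `registerType` and `sortedColumn` are assumptions for illustration, and a real version would also carry the scale constructors shown above.

```javascript
// Hypothetical registry: one entry per dimension type, bundling the functions
// the plot would need. Illustrative only, not the library's API.
var typeHandlers = {
  number: {
    parse: function (raw) { return +raw; },
    format: function (v) { return String(v); },
    compare: function (a, b) { return a - b; }
  },
  string: {
    parse: function (raw) { return String(raw); },
    format: function (v) { return v; },
    compare: function (a, b) { return a < b ? -1 : a > b ? 1 : 0; }
  }
};

// Register a new type, or override an existing one.
function registerType(name, handler) {
  typeHandlers[name] = handler;
}

// A user-defined type added from outside the library.
registerType("date", {
  parse: function (raw) { return new Date(raw); },
  format: function (v) { return v.toISOString().slice(0, 10); },
  compare: function (a, b) { return a - b; }
});

// Generic code looks the handler up instead of branching on the type.
function sortedColumn(values, type) {
  var handler = typeHandlers[type];
  return values
    .map(function (v) { return handler.parse(v); })
    .sort(handler.compare)
    .map(function (v) { return handler.format(v); });
}

console.log(sortedColumn(["2013-03-01", "2012-11-15"], "date"));
console.log(sortedColumn(["10", "9", "100"], "number"));
```

With this shape, supporting a new type means supplying one object, and the if/else chains collapse into a lookup.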

Something to look at is Miso Dataset, which has put a lot of thought into type detection and column data types:

http://misoproject.com/dataset/tutorials/data_types

Thanks LeJav, very useful stuff. I'll be referring to your code for the next iteration of the library.

LeJav commented 11 years ago

Sorry for the mistake (unwanted close!). I have implemented the features in my own release. I am not completely satisfied: there is still too much if-then-else logic, but it could be a good basis.

You can test it at: http://www.nitisco.fr/top500/
Unpacked sources are at: http://www.nitisco.fr/top500/sources.tar.gz

Note that window resize is handled properly. I hope it helps!

syntagmatic commented 11 years ago

https://github.com/syntagmatic/parallel-coordinates/commit/7b6477beefef257ca4867e5fafdfdc5caab94531

Okay, I've decided to add an object "types" to the internal __ object. This keeps track of the detected type of each dimension, just like yscale keeps track of the detected scale for each dimension. The type detection occurs in detectDimensions, so the types object will be populated whenever dimension detection is used.

This way, internally, you can look up the type of the dimension like so:

// get type of dimension
__.types[dim]

The types object can also be accessed with the getter/setter

// get type of dimension
pc.types()[dim]

For objects like this, it might be nice to provide some way to override a single value without resetting the entire object.
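
One possible shape for such an override, as a sketch only: `config` stands in for the internal __ object and `pc` for the chart, both hypothetical, and the two-argument and merging forms are assumptions rather than anything the library currently does.

```javascript
// Hypothetical stand-in for the library's internal config object.
var config = { types: { cores: "number", vendor: "string" } };

// Getter/setter in the d3 style:
//   pc.types()           returns the whole object
//   pc.types(obj)        merges the given keys into the existing object
//   pc.types(dim, type)  sets a single dimension
var pc = {
  types: function (a, b) {
    if (arguments.length === 0) return config.types;
    if (arguments.length === 2) {
      config.types[a] = b;
    } else {
      Object.keys(a).forEach(function (dim) { config.types[dim] = a[dim]; });
    }
    return pc; // allow chaining
  }
};

pc.types("installed", "date").types({ rank: "number" });
console.log(pc.types());
```

Merging instead of replacing means a caller can correct one misdetected dimension without restating the rest.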

This seemed to me to be the simplest solution to adding the type information to each dimension. I actually tried reworking the dimensions array to be an array of objects instead of strings to add new information to each dimension. But that requires an additional lookup every time you need the dimension name, and it makes it harder to reorder dimensions by hand (passing in a list of strings).

I'll add in default scales for date/strings soon.

paulklemm commented 11 years ago

@LeJav Thank you very much for sharing your code! With a few tweaks here and there I was able to include support for ordinal data in my visualization with very reasonable effort on my end. Maybe you should think about making a fork and removing some of those static variables (like the fixed canvas height, which took me a while to find ;) ). Awesome :beers: !

@syntagmatic Official support for ordinal data through parallel-coordinates would be really great. After spending my day reading d3-related forums about parallel coordinates with ordinal data, I assume many people would have good use for it.

syntagmatic commented 11 years ago

@Powernap This will definitely make it in. I have procrastinated sitting down to refactor the library and include it. There are several good solutions out there, and it's one of the most asked for features. I'll make some time next week to look at it again.

syntagmatic commented 11 years ago

Also a nice implementation of ordinal axes here:

http://dexvis.wordpress.com/2013/01/28/d3js-parallel-lines-and-football/

syntagmatic commented 11 years ago

I've added support for ordinal dimensions. A dimension is now detected as ordinal when its column contains strings. If more than 60 unique values are detected though, that dimension is omitted (to avoid name/id dimensions).
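
The cutoff can be illustrated roughly as follows. This is a sketch under assumptions, not the library's code: `uniqueCount` and `keepDimension` are hypothetical names, and the string check via `typeof` is a guess at how a column is recognized.

```javascript
var MAX_CATEGORIES = 60; // threshold mentioned above

// Number of distinct values in one column.
function uniqueCount(rows, key) {
  var seen = {};
  rows.forEach(function (row) { seen[row[key]] = true; });
  return Object.keys(seen).length;
}

// Keeps non-string dimensions, and string dimensions with at most MAX_CATEGORIES values.
function keepDimension(rows, key) {
  var isString = rows.some(function (row) { return typeof row[key] === "string"; });
  return !isString || uniqueCount(rows, key) <= MAX_CATEGORIES;
}

// Made-up data: 100 rows with a unique id, a 3-valued category and a number.
var rows = [];
for (var i = 0; i < 100; i++) {
  rows.push({ id: "id" + i, group: "g" + (i % 3), value: i });
}

var kept = Object.keys(rows[0]).filter(function (key) {
  return keepDimension(rows, key);
});
console.log(kept); // the id column is dropped
```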

I'll leave this open since I'd like to include these suggestions on dates as well.

foldager commented 11 years ago

Nice work. I needed to be able to define the order and the content of the ordinal scale myself. In my case I needed the same categories across different dimensions, in the same order. Even if a category was not observed in one dimension. I suggest that you make it possible to specify the domain of each ordinal scale.

I implemented it this way. A data structure containing the domains of the relevant dimensions:

var domain = {
    dim1: ["An", "Array", "Of", "Categories"],
    dim2: ["An", "Array", "Of", "Categories"],
    dim3: ["Cat", "Dog", "Human", "Tiger", "Unknown"]
}

I included domain in the config when calling d3.parcoords().

I changed the beginning of d3.parcoords.js to accept the domain variable:

var __ = {
    data: [],
    dimensions: [],
    types: {},
    domain: {},      // ADDED
    brushed: false,
    mode: "default",
    rate: 20,
    width: 600,
    height: 300,
    margin: { top: 24, right: 0, bottom: 12, left: 0 },
    color: "#069",
    composite: "source-over",
    alpha: 0.7
};

And then I changed the "string" part of defaultScales from:

var defaultScales = {
  "number": function(k) {
    return d3.scale.linear()
      .domain(d3.extent(__.data, function(d) { return +d[k]; }))
      .range([h()+1, 1])
  },
  "string": function(k) {
    return d3.scale.ordinal()
      .domain(__.data.map(function(p) { return p[k]; }))
      .rangePoints([h()+1, 1])
  }
};

To:

var defaultScales = {
  "number": function(k) {
    return d3.scale.linear()
      .domain(d3.extent(__.data, function(d) { return +d[k]; }))
      .range([h()+1, 1])
  },
  "string": function(k) {
    // use the configured domain when one was given for this dimension
    if (typeof __.domain[k] != "undefined") {
      return d3.scale.ordinal()
        .domain(__.domain[k])
        .rangePoints([h()+1, 1])
    }
    // otherwise fall back to the values observed in the data
    else {
      return d3.scale.ordinal()
        .domain(__.data.map(function(p) { return p[k]; }))
        .rangePoints([h()+1, 1])
    }
  }
};

syntagmatic commented 11 years ago

It's possible to access the d3 scales directly. In most of the examples you can enter this in the browser console to see them:

parcoords.yscale

The domain needs to be set before axes/brushes are created though. I'll be working more on axis/brush transitions in the future. d3 v3.3 has new animated brush transitions that I'd like to integrate.

The Date datatype has been added and will be detected if the field is populated by Date objects.
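
Since detection relies on Date objects, string dates have to be converted before the data is passed in. A minimal sketch, assuming dd/mm/yyyy strings as in the first comment; `parseDayMonthYear` is a hypothetical helper and the rows are made up (with d3 available, `d3.time.format("%d/%m/%Y").parse` would do the same job).

```javascript
// Hypothetical helper: parses "dd/mm/yyyy" into a Date, or returns null if malformed.
function parseDayMonthYear(text) {
  var m = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(text);
  if (!m) return null;
  var day = +m[1], month = +m[2], year = +m[3];
  var date = new Date(year, month - 1, day);
  // Reject overflowed dates such as 31/02/2013, which Date would roll into March.
  return date.getMonth() === month - 1 && date.getDate() === day ? date : null;
}

var rows = [
  { name: "alpha", installed: "01/06/2012" },
  { name: "beta", installed: "15/11/2011" }
];

// Convert in place so the column holds Date objects before it reaches the plot.
rows.forEach(function (row) {
  row.installed = parseDayMonthYear(row.installed);
});
console.log(rows[0].installed instanceof Date); // true
```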

Thanks for all the help and suggestions.