vega / vega-lite-api

A JavaScript API for Vega-Lite.
https://observablehq.com/@vega/vega-lite-api
BSD 3-Clause "New" or "Revised" License

Rendering with maxbins when the data has fewer points than the number of bins #373

Status: Open. Mahesha999 opened this issue 2 years ago.

Mahesha999 commented 2 years ago

I have the following data:

cdf_data = [
  { d_percentages: 0, student_percentages: 35 },
  { d_percentages: 10, student_percentages: 42 },
  { d_percentages: 20, student_percentages: 55 },
  { d_percentages: 30, student_percentages: 75 },
  { d_percentages: 40, student_percentages: 85 },
  { d_percentages: 50, student_percentages: 91 },
  { d_percentages: 60, student_percentages: 96 },
  { d_percentages: 70, student_percentages: 98 },
  { d_percentages: 80, student_percentages: 98 },
  { d_percentages: 90, student_percentages: 100 },
  { d_percentages: 100, student_percentages: 100 }
]

I created the following visualization:

cdf_in_js_with_minbins = {
  const plot = vl.markBar()
    .data(cdf_data)
    .encode(
      vl.y()
        .fieldQ('student_percentages'),
      vl.x()
        .fieldQ('d_percentages')//.bin(true)
        .scale({ "domain": [0, 100] })
        .bin({ minbins: 10 })
    ).width(500).height(250);

  return plot.render();
}

This outputs:

[Figure: bar chart rendered with minbins: 10]

Initially, before switching to minbins: 10, I had tried maxbins: 30, which rendered the following:

[Figure: bar chart rendered with maxbins: 30]

This confused me a lot, especially the two bars in the 90-100 range. Also, nothing in cdf_data says that the 0-5 range has 35% of students and the 5-10 range has 0% of students. Since maxbins is a "max" limit, I expected it to show just 10 bins, as in the first figure. Instead, it created 20 bins. Am I misunderstanding something here, or is this a bug?
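To see where the 20 bins and the two bars between 90 and 100 could come from, here is a minimal sketch in plain JavaScript. It approximates, as I understand it, the "nice" step selection in Vega's bin transform (base 10, trying the step divisors 5 and 2) and the convention that bins are half-open except the last one, which also includes its upper bound. The helper names binStep and binStart are hypothetical and are not part of vega-lite-api or Vega. The maxbins = 10 case is included on the assumption that the first chart fell back to Vega-Lite's default maxbins of 10 (I believe minbins is not a recognized bin parameter, so it would be ignored).

```javascript
// Minimal sketch, not Vega's actual source: approximates how a "nice" bin
// step is chosen from a field's extent and a maxbins limit (base 10,
// trying the divisors 5 and 2), and how a value is assigned to a bin.

// Choose a bin step for `extent` that yields at most `maxbins` bins.
function binStep(extent, maxbins) {
  const span = extent[1] - extent[0] || 1;
  // Start from a power of ten well below the span...
  const level = Math.ceil(Math.log10(maxbins));
  let step = Math.pow(10, Math.round(Math.log10(span)) - level);
  // ...grow it tenfold while it would still produce too many bins...
  while (Math.ceil(span / step) > maxbins) step *= 10;
  // ...then try finer steps (step / 5, then step / 2) that still fit.
  for (const divisor of [5, 2]) {
    const candidate = step / divisor;
    if (span / candidate <= maxbins) step = candidate;
  }
  return step;
}

// Return the start of the bin containing `value`. Bins are [start, start + step),
// except the last one, which also includes `stop` itself.
function binStart(value, start, stop, step) {
  const clamped = Math.min(value, stop - step); // pulls `stop` into the last bin
  return start + step * Math.floor((clamped - start) / step);
}

// d_percentages from cdf_data; assumes the extent [0, 100] is already
// aligned to the step, so no extra "nice" rounding is needed.
const values = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
const [lo, hi] = [0, 100];

for (const maxbins of [10, 30]) {
  const step = binStep([lo, hi], maxbins);
  const starts = values.map((v) => binStart(v, lo, hi, step));
  console.log(`maxbins=${maxbins}: step=${step}, ${(hi - lo) / step} bins`);
  console.log(`  bin starts: ${starts.join(", ")}`);
}
```

Under these assumptions, maxbins: 30 gives a step of 5, so the extent [0, 100] is split into 20 bins no matter how many data points there are. The value 90 lands in [90, 95) while 100 is pulled into the last bin [95, 100], which produces two bars in the 90-100 range. Since y is not aggregated, the row at 0 draws a bar of height 35 over [0, 5), and no row falls in [5, 10), so that bin is empty rather than a 0% bar. With maxbins: 10 the step is 10, and both 90 and 100 share the last bin [90, 100].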

Here is the observablehq notebook rendering both plots.

domoritz commented 2 years ago

I think this is as expected. If you have pre-binned data, use the binned property. The last bin includes its upper bound rather than excluding it. The actual number of bins depends only on the range (extent) of the values, not on how many data points there are.
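Following the suggestion to use the binned property for pre-binned data, here is a minimal sketch that builds a plain Vega-Lite JSON spec rather than going through vega-lite-api. It assumes, hypothetically, that each d_percentages value is the start of a bin 10 units wide; the helper name prebinnedBarSpec and the bin_start/bin_end field names are illustrative only. With "bin": "binned", Vega-Lite uses the given edges as-is instead of binning again, and x2 supplies the end of each bin.

```javascript
// Hypothetical helper: treat each row's `field` value as the start of a bin
// of fixed `width`, add explicit bin edges, and build a Vega-Lite spec that
// marks the x field as already binned.
function prebinnedBarSpec(rows, field, yField, width) {
  const values = rows.map((row) => ({
    ...row,
    bin_start: row[field],
    bin_end: row[field] + width,
  }));
  return {
    data: { values },
    mark: "bar",
    encoding: {
      // "binned" tells Vega-Lite not to bin again; x2 supplies the bin end.
      x: { field: "bin_start", type: "quantitative", bin: "binned" },
      x2: { field: "bin_end" },
      y: { field: yField, type: "quantitative" },
    },
  };
}

// Same values as cdf_data in the issue, built compactly.
const cdf_data = [35, 42, 55, 75, 85, 91, 96, 98, 98, 100, 100].map((s, i) => ({
  d_percentages: 10 * i,
  student_percentages: s,
}));

const spec = prebinnedBarSpec(cdf_data, "d_percentages", "student_percentages", 10);
console.log(JSON.stringify(spec.encoding, null, 2));
```

Under this assumption the last row spans 100 to 110, so the x scale domain would need to extend past 100, or the data would need a different interpretation. Whether these CDF points really represent bins is a judgment about the data that the spec cannot make for you.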