plotly / Plotly.NET

interactive graphing library for .NET programming languages :chart_with_upwards_trend:
https://plotly.net
MIT License

Histogram giving incorrect representation of data #391

Closed heron1 closed 1 year ago

heron1 commented 1 year ago

Description

Chart.Histogram seems to not accurately reflect the underlying data, unless I'm misunderstanding how things work.

Repro steps

Consider the code:
open Plotly.NET

let rnd = System.Random()
// 1,000,001 uniform samples in [0, 1)
let x =
    [ for _ in 0 .. 1000000 do
          yield rnd.NextDouble() ]
let histo1 = Chart.Histogram(X = x, NBinsX = 5)

histo1 |> Chart.show

This shows a histogram with an uneven distribution of what should be evenly distributed numbers (5 bins, each 0.2 wide, from 0 to 1). Bins 0-0.2 and 0.8-1 show only about 100k elements, whilst bins 0.2-0.4, 0.4-0.6, and 0.6-0.8 show about 200k elements, so the outer bins are off by 50%.

Expected behavior

There should be an even distribution. This can be confirmed by running the following code after the above code:

let counts =
    x
    |> List.countBy (fun t ->
        match t with
        | v when v < 0.2 -> 1
        | v when v < 0.4 -> 2
        | v when v < 0.6 -> 3
        | v when v < 0.8 -> 4
        | _ -> 5)
    |> List.sortBy fst // countBy returns keys in first-seen order, so sort by bin
printfn $"counts: %A{counts}"

(the counts are evenly distributed)


omaus commented 1 year ago

This problem is probably due to Plotly.js since it uses default bins here (because you didn't specify them, apart from the number):

(image)

This is part of the JavaScript generated in the browser window; as you can see, only nbinsx is specified.

I don't know why Plotly.js uses -0.1 as the lower and 1.1 as the upper limit when the values range from 0 to 1, but this is the source of the problem.
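To see why bins running from -0.1 to 1.1 produce the reported shape, here is a minimal sketch that counts the same data under both bin layouts without involving Plotly at all. The `binCounts` helper is hypothetical (not part of Plotly.NET), and the sample is a deterministic stand-in for uniform random data, so the counts are exact rather than approximate. As far as the Plotly.js documentation describes it, `nbinsx` is a maximum number of desired bins rather than an exact count, which would be consistent with the library picking its own rounded bin edges here.

```fsharp
// Hypothetical helper, not part of Plotly.NET: counts values in equal-width
// bins [start + k*size, start + (k+1)*size) for k = 0 .. nBins - 1.
let binCounts (start: float) (size: float) (nBins: int) (values: float list) =
    let counts = Array.zeroCreate<int> nBins
    for v in values do
        let k = int (floor ((v - start) / size))
        if k >= 0 && k < nBins then
            counts.[k] <- counts.[k] + 1
    counts

// Deterministic stand-in for uniform data: midpoints of 1,000 equal slices of [0, 1).
let sample = [ for i in 0 .. 999 -> (float i + 0.5) / 1000.0 ]

// Layout reported for the default case: edges at -0.1, 0.1, ..., 1.1 (6 bins).
let shifted = binCounts (-0.1) 0.2 6 sample

// Layout aligned with the data range: edges at 0.0, 0.2, ..., 1.0 (5 bins).
let aligned = binCounts 0.0 0.2 5 sample

printfn "shifted: %A" shifted // [|100; 200; 200; 200; 200; 100|]
printfn "aligned: %A" aligned // [|200; 200; 200; 200; 200|]
```

The two outer bins of the shifted layout each cover only half of their width inside the data range, so they collect half as many values as the inner bins.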

So, try this:

let xBins = TraceObjects.Bins.init(Start = 0., End = 1., Size = 0.2)

let histo2 = Chart.Histogram(X = x, NBinsX = 5, XBins = xBins) 
histo2 |> Chart.show

This solves the problem:

image

And it can also be seen when inspecting the page source's JS code:

image

heron1 commented 1 year ago

Thanks! That seems to do the trick. It also seems the NBinsX parameter is no longer necessary.
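If the goal is still "n equal bins over a known range", the explicit bin size can be derived from the desired count instead of being hard-coded. The sketch below assumes the data range is known up front; `evenBins` is a hypothetical helper and not part of Plotly.NET, and the commented lines only reuse the calls already shown in this thread.

```fsharp
// Hypothetical helper (not part of Plotly.NET): derive start, end and size
// for n equal-width bins spanning [lo, hi].
let evenBins (lo: float) (hi: float) (n: int) =
    if n <= 0 then invalidArg "n" "n must be positive"
    if hi <= lo then invalidArg "hi" "hi must exceed lo"
    lo, hi, (hi - lo) / float n

let binStart, binEnd, binSize = evenBins 0.0 1.0 5
printfn "start=%g end=%g size=%g" binStart binEnd binSize

// With Plotly.NET referenced, these values would feed the calls from above:
// let xBins = TraceObjects.Bins.init(Start = binStart, End = binEnd, Size = binSize)
// let histo3 = Chart.Histogram(X = x, XBins = xBins)
// histo3 |> Chart.show
```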

kMutagene commented 1 year ago

@heron1 can we close this or do you still have issues with histogram bin sizes?

heron1 commented 1 year ago

Hi, yes, it can be closed. Thanks!