libAtoms / abcd

logarithmic histogram bins #84

Open stenczelt opened 4 years ago

stenczelt commented 4 years ago

Add an option to set up logarithmic bins in the histogram shown with summary.

For example: if I have a dataset with GAP errors, I am mainly interested in their order of magnitude, so I would plot them logarithmically and not linearly in a histogram. The following is shown at the moment, which is not very meaningful.


```
abs_error  count: 426 min:  9.3100e-05 med:  5.7787e-02 max:  9.4018e-01 std:  7.5239e-02 var: 5.6609e-03
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 351 [ 9.3100e-05,      0.0941)
▉▉▉▉▉▉▉                                   63 [ 9.4102e-02,      0.1881)
                                           7 [ 1.8811e-01,      0.2821)
                                           2 [ 2.8212e-01,      0.3761)
                                           0 [ 3.7613e-01,      0.4701)
                                           1 [ 4.7014e-01,      0.5641)
                                           1 [ 5.6415e-01,      0.6582)
                                           0 [ 6.5815e-01,      0.7522)
                                           0 [ 7.5216e-01,      0.8462)
                                           1 [ 8.4617e-01,      0.9402)
```

I understand that we can add a new property with the current functionality, but I cannot do that in the visualiser, for example (it could be coded there as well, though), and it is not that straightforward for a first-time user.
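
A minimal sketch of what logarithmic binning could look like, assuming NumPy is available and all values are strictly positive (as the absolute errors above are). The function name `log_histogram` and its parameters are hypothetical and are not part of abcd.

```python
import numpy as np


def log_histogram(values, n_bins=10):
    """Histogram strictly positive values in logarithmically spaced bins.

    Hypothetical helper for illustration only; not an abcd function.
    """
    values = np.asarray(values, dtype=float)
    if values.size == 0 or np.any(values <= 0):
        raise ValueError("logarithmic bins need at least one value, all > 0")
    # n_bins + 1 edges, evenly spaced in log(value) between min and max
    edges = np.geomspace(values.min(), values.max(), n_bins + 1)
    counts, edges = np.histogram(values, bins=edges)
    return counts, edges


if __name__ == "__main__":
    errors = [1e-4, 5e-4, 5e-3, 5e-2, 1.0]
    counts, edges = log_histogram(errors, n_bins=4)
    for count, lo, hi in zip(counts, edges[:-1], edges[1:]):
        print(f"{count:4d} [{lo:.4e}, {hi:.4e})")
```

With bins like these, each bin covers the same ratio between its upper and lower edge, so errors spread over several orders of magnitude no longer collapse into the first bin.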

gabor1 commented 4 years ago

A follow-on idea: it should “detect” when the histogram counts are so diverse in magnitude that they should be shown logarithmically too. I think we want to maximise the entropy (i.e. information content) of the GRAPH, so if p_i is the (normalised) height of bar i, then the entropy of the graph is -sum_i p_i log(p_i). This can be calculated for p_i = histogram count and for p_i = log(histogram count). I think we can even use this heuristic for deciding whether the bins should be linear or logarithmic (the original topic of this issue).
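
One way the entropy heuristic could be sketched, assuming NumPy and treating the normalised bin counts as p_i. The names `histogram_entropy` and `choose_bin_scale` are hypothetical, and falling back to linear bins for non-positive data is an assumption of this sketch, not something decided in the thread.

```python
import numpy as np


def histogram_entropy(counts):
    """Shannon entropy -sum(p * log(p)) of the normalised bar heights."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total  # empty bins contribute nothing
    return float(-(p * np.log(p)).sum())


def choose_bin_scale(values, n_bins=10):
    """Return "log" if log-spaced bins give the higher-entropy histogram."""
    values = np.asarray(values, dtype=float)
    if values.size == 0 or np.any(values <= 0):
        return "linear"  # assumption: log bins are undefined here
    lin_counts, _ = np.histogram(values, bins=n_bins)
    log_edges = np.geomspace(values.min(), values.max(), n_bins + 1)
    log_counts, _ = np.histogram(values, bins=log_edges)
    if histogram_entropy(log_counts) > histogram_entropy(lin_counts):
        return "log"
    return "linear"
```

The same `histogram_entropy` function could be applied to log-scaled bar heights to decide how the counts themselves should be displayed.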

-- Gábor

Gábor Csányi Professor of Molecular Modelling Engineering Laboratory, University of Cambridge Pembroke College Cambridge

Pembroke College supports CARA. A Lifeline to Academics at Risk. http://www.cara.ngo/

stenczelt commented 4 years ago

That is a good idea, can surely do it!

I think the sensible design is to first have options for --linear and --log, with the logic behind them, and then add --auto, which does this automatically, perhaps making the latter the default.
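
A rough sketch of how such options could be wired up with Python's standard `argparse`. The flag names follow the proposal above; the parser name, the `scale` attribute and the help texts are assumptions for illustration and do not describe the actual abcd command line.

```python
import argparse


def build_parser():
    """Hypothetical parser with mutually exclusive histogram scale flags."""
    parser = argparse.ArgumentParser(prog="summary-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--linear", dest="scale", action="store_const",
                       const="linear", help="use linearly spaced bins")
    group.add_argument("--log", dest="scale", action="store_const",
                       const="log", help="use logarithmically spaced bins")
    group.add_argument("--auto", dest="scale", action="store_const",
                       const="auto", help="choose the scale automatically")
    # --auto is the default when no flag is given, as suggested above
    parser.set_defaults(scale="auto")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"histogram scale: {args.scale}")
```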

gabor1 commented 4 years ago

yes, that was exactly how I thought about it too

gabor1 commented 4 years ago

but you want to separate --linear-bin and --linear-count
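
To illustrate why the count scale is independent of the bin scale, here is a small sketch that only changes how bar lengths are derived from the counts. The function `bar_widths`, its `count_scale` parameter, the use of log(1 + count) and the truncation to whole characters are all assumptions of this sketch rather than abcd behaviour.

```python
import math


def bar_widths(counts, count_scale="linear", max_width=40):
    """Bar length in characters for each bin (hypothetical helper).

    count_scale="linear" uses the raw counts; "log" uses log(1 + count),
    so that empty bins still map to a zero-length bar.
    """
    if count_scale == "linear":
        heights = [float(c) for c in counts]
    elif count_scale == "log":
        heights = [math.log1p(c) for c in counts]
    else:
        raise ValueError(f"unknown count scale: {count_scale!r}")
    top = max(heights, default=0.0)
    if top == 0:
        return [0 for _ in heights]
    # truncate to whole characters; the tallest bar gets max_width
    return [int(max_width * (h / top)) for h in heights]


if __name__ == "__main__":
    counts = [351, 63, 7, 2, 0, 1, 1, 0, 0, 1]  # counts from the issue
    for scale in ("linear", "log"):
        print(scale)
        for count, width in zip(counts, bar_widths(counts, scale)):
            print(f"{'▉' * width:<40} {count:4d}")
```

With the counts from the original post, the logarithmic count scale makes the bins holding one or two entries visible, while the bin edges stay exactly as they were.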
