hlin117 / mdlp-discretization

An implementation of the minimum description length principle expert binning algorithm by Usama Fayyad
BSD 3-Clause "New" or "Revised" License
101 stars 54 forks

zero cut points #33

Open zackxconti opened 5 years ago

zackxconti commented 5 years ago

Hello,

Firstly, I would like to thank you for writing this library.

I am working with a dataset consisting of 6 input features and 1 target.

However, when discretizing the features with MDLP, 4 out of the 5 features are returned with zero cut points. Why does this occur?

I have used min_depth to avoid zero cut points, but I am not sure how to choose its value in general.

The following is a sample from my dataset, where the last column is the target.

Many thanks! Zack

Span Height Amplitude Beam_tip_depth Beam_start_depth bc_X_position deflection
-0.730957031 -2.516601563 0.314941406 0.773730469 1.420410156 4.536132813 70.916017
-1.339355469 -2.516601563 0.314941406 0.773730469 1.420410156 4.536132813 79.156323
-0.730957031 1.497070313 0.314941406 0.773730469 1.420410156 4.536132813 81.301878
-0.730957031 -2.516601563 0.253417969 0.773730469 1.420410156 4.536132813 70.796428
-0.730957031 -2.516601563 0.314941406 0.348535156 1.420410156 4.536132813 71.149362
-0.730957031 -2.516601563 0.314941406 0.773730469 1.127441406 4.536132813 93.957363
-0.730957031 -2.516601563 0.314941406 0.773730469 1.420410156 1.274414063 62.477609
-0.730957031 1.497070313 0.253417969 0.348535156 1.127441406 1.274414063 102.220443
-1.339355469 -2.516601563 0.253417969 0.348535156 1.127441406 1.274414063 102.854849
-1.339355469 1.497070313 0.314941406 0.348535156 1.127441406 1.274414063 113.167323
-1.339355469 1.497070313 0.253417969 0.773730469 1.127441406 1.274414063 105.472502
-1.339355469 1.497070313 0.253417969 0.348535156 1.420410156 1.274414063 79.711979
-1.339355469 1.497070313 0.253417969 0.348535156 1.127441406 4.536132813 122.562968
-1.339355469 1.497070313 0.253417969 0.348535156 1.127441406 1.274414063 113.029294
1.019042969 -0.016601563 -1.435058594 0.423730469 2.170410156 2.036132813 33.78182
0.410644531 -0.016601563 -1.435058594 0.423730469 2.170410156 2.036132813 37.589132
1.019042969 -1.002929688 -1.435058594 0.423730469 2.170410156 2.036132813 32.273358
1.019042969 -0.016601563 -1.496582031 0.423730469 2.170410156 2.036132813 33.871977
1.019042969 -0.016601563 -1.435058594 0.698535156 2.170410156 2.036132813 34.774896

hlin117 commented 5 years ago

Thanks for the information, @zackxconti. Have you tried calculating the cut points manually? This might be a dataset issue: the continuous values may not carry enough signal for the MDLP algorithm to accept a cut point.
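
For anyone who wants to try the manual check, here is a minimal sketch of the Fayyad-Irani acceptance test for a single feature. It is not this library's code: the function names `entropy` and `best_mdlp_cut` and the toy data are made up for illustration, and it assumes the target consists of discrete class labels (a continuous target such as `deflection` would have to be binned into classes first, which is an assumption on my part). A cut is kept only when its information gain exceeds the MDL threshold `(log2(N - 1) + delta) / N`; if the best candidate fails that test, the feature ends up with zero cut points.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_mdlp_cut(values, labels):
    """Return (cut, gain, threshold) for the highest-gain cut, or None.

    Hypothetical helper for illustration only. The cut would be accepted
    by the MDLP criterion when gain > threshold.
    """
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [label for _, label in pairs]
    n = len(ys)
    ent_s, k = entropy(ys), len(set(ys))
    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no cut is possible between identical values
        left, right = ys[:i], ys[i:]
        e1, e2 = entropy(left), entropy(right)
        gain = ent_s - (i / n) * e1 - ((n - i) / n) * e2
        if best is None or gain > best[1]:
            k1, k2 = len(set(left)), len(set(right))
            delta = log2(3 ** k - 2) - (k * ent_s - k1 * e1 - k2 * e2)
            threshold = (log2(n - 1) + delta) / n
            best = ((xs[i - 1] + xs[i]) / 2, gain, threshold)
    return best


if __name__ == "__main__":
    feature = [1, 2, 3, 4, 5, 6, 7, 8]
    # Classes separate cleanly around 4.5, so the cut passes the test.
    print(best_mdlp_cut(feature, [0, 0, 0, 0, 1, 1, 1, 1]))
    # Classes alternate, so the best gain stays below the threshold
    # and no cut point would be produced.
    print(best_mdlp_cut(feature, [0, 1, 0, 1, 0, 1, 0, 1]))
```

Running this on each column (with class labels in place of the raw target) should show whether the best candidate cut clears the threshold, which is the "enough signal" question above. With only a handful of rows the threshold is relatively high, so rejections are expected on small samples.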