hlin117 / mdlp-discretization

An implementation of the minimum description length principle expert binning algorithm by Usama Fayyad
BSD 3-Clause "New" or "Revised" License

Getting MDLP output as Empty array #6

Closed. manishbansal-fk closed this issue 7 years ago

manishbansal-fk commented 7 years ago

@hlin117 I am using the MDLP transformer to discretize a continuous variable, but the output is an empty array. The call and the data attributes are below.

E.g.

    mdlp = MDLP()
    mdlp.fit_transform(X.A.values.reshape(-1,1), yy)  # Here X is a pandas DataFrame

Details:

X =

    count    1383730.000000
    mean           5.899136
    std           12.970693
    min            1.000000
    25%            1.000000
    50%            3.000000
    75%            6.000000
    max          728.000000
    Name: A, dtype: float64

yy = array([0, 0, 0, ..., 0, 0, 0])

xs.shape : (4708872, 1)
yy.shape : (4708872,)

Is an empty output a valid result? Please advise.

Note: It works for some of the features.

hlin117 commented 7 years ago

It might be a dataset issue similar to #2. Have you tried changing the min_depth parameter?

PS: I think the function should work if xs is of shape (4708872,).

manishbansal-fk commented 7 years ago

Thanks for pointing out the min_depth parameter.

But how do we determine the optimal value of min_depth for a feature?

hlin117 commented 7 years ago

Technically, hard-wiring the min_depth parameter already goes against the spirit of the original paper, which argued for a heuristic that decides when to stop cutting.

But if you want to move forward with this, min_depth is another hyperparameter you can tune through cross-validation.
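
For readers wondering what the stopping heuristic mentioned above looks like, here is a minimal sketch of the MDLP acceptance test as described in Fayyad and Irani's paper. It is not this library's actual code; the function names `entropy` and `mdlp_accepts_cut` are made up for illustration. A candidate cut is kept only when its information gain exceeds a threshold that depends on the sample size and class counts. If no candidate cut on a feature passes this test, the algorithm produces no cut points for it, which is probably what the empty output reflects.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mdlp_accepts_cut(left, right):
    """Hypothetical helper: apply the Fayyad-Irani MDLP test to one cut.

    `left` and `right` are the class labels on each side of the cut.
    Returns True when the information gain exceeds the MDLP threshold.
    """
    whole = list(left) + list(right)
    n = len(whole)
    # Number of distinct classes in the whole set and in each side.
    k, k1, k2 = len(set(whole)), len(set(left)), len(set(right))
    ent, ent1, ent2 = entropy(whole), entropy(left), entropy(right)

    gain = ent - (len(left) / n) * ent1 - (len(right) / n) * ent2
    delta = math.log2(3 ** k - 2) - (k * ent - k1 * ent1 - k2 * ent2)
    threshold = (math.log2(n - 1) + delta) / n
    return gain > threshold


# A cut that separates the classes perfectly is accepted.
clean = mdlp_accepts_cut([0] * 50, [1] * 50)

# A cut that leaves the class mix unchanged on both sides is rejected,
# so a feature with only such cuts would yield no cut points at all.
uninformative = mdlp_accepts_cut([0] * 49 + [1], [0] * 49 + [1])
```

With heavily imbalanced labels (such as a target that is almost entirely 0), the gain from any single cut tends to be small, so rejection of every cut is plausible. Forcing cuts with min_depth bypasses this test for the first levels of recursion.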

manishbansal-fk commented 7 years ago

Thanks a lot.

hlin117 commented 7 years ago

Yes, good luck!
