sramirez / spark-infotheoretic-feature-selection

This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.
http://sci2s.ugr.es/BigData
Apache License 2.0

Info-Theoretic Framework requires positive values in range [0, 255] #10

Closed: michaelws92 closed this issue 6 years ago

michaelws92 commented 6 years ago

Does your algorithm not support double or float values? Do you have any suggestions if my data contains very large values, in the millions? Your algorithm does not support it.

sramirez commented 6 years ago

Hi Michael,

You can discretize your data with my package spark-MDLP. I have updated the README file to include the information you need:

> LabeledPoint data must be discretized as integer values in double representation, ranging from 0 to 255. By doing so, double values can be converted to bytes directly, which makes the overall selection process much more efficient (communication overhead is greatly reduced).

Please refer to the MDLP package if you need to discretize your dataset:

https://spark-packages.org/package/sramirez/spark-MDLP-discretization
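If spark-MDLP is not an option, a minimal sketch of manual equal-width binning into [0, 255] is shown below. It assumes an `RDD[LabeledPoint]` plus per-feature minima and maxima computed beforehand (for example with `Statistics.colStats`); the helper name `binTo255` and the variables `mins`/`maxs` are hypothetical and not part of this package.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper: equal-width binning of every feature into 256 bins,
// producing integer values in [0, 255] stored as doubles, as the README requires.
// `mins` and `maxs` are the per-feature minima/maxima of the training data.
def binTo255(data: RDD[LabeledPoint],
             mins: Array[Double],
             maxs: Array[Double]): RDD[LabeledPoint] = {
  data.map { lp =>
    val binned = lp.features.toArray.zipWithIndex.map { case (v, i) =>
      val range = maxs(i) - mins(i)
      if (range == 0.0) 0.0                        // constant feature -> single bin
      else math.min(255.0, math.floor((v - mins(i)) / range * 256.0))
    }
    LabeledPoint(lp.label, Vectors.dense(binned))
  }
}
```

Note that MDLP usually produces better cut points than plain equal-width binning, so the spark-MDLP package above remains the recommended route; this sketch only illustrates the [0, 255] integer-valued-doubles requirement.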