Error when label is date

barrybecker4 commented 7 years ago

I noticed that we see the following exception when trying to run MDLP with a date label specified. I will add a unit test and try to debug.

java.util.NoSuchElementException: null
    scala.collection.LinearSeqOptimized$class.last(LinearSeqOptimized.scala:148)
    scala.collection.immutable.List.last(List.scala:84)
    org.apache.spark.mllib.feature.InitialThresholdsFinder.findInitialThresholds(InitialThresholdsFinder.scala:58)
    org.apache.spark.mllib.feature.MDLPDiscretizer.initialThresholds(MDLPDiscretizer.scala:58)
    org.apache.spark.mllib.feature.MDLPDiscretizer.runAll(MDLPDiscretizer.scala:120)
    org.apache.spark.mllib.feature.MDLPDiscretizer$.train(MDLPDiscretizer.scala:312)
    org.apache.spark.ml.feature.MDLPDiscretizer.fit(MDLPDiscretizer.scala:131)
    com.mineset.spark.ml.evidence.EvidenceInducer.createDiscretizerModel(EvidenceInducer.scala:256)
    com.mineset.spark.ml.evidence.EvidenceInducer.createBucketizers(EvidenceInducer.scala:227)
    com.mineset.spark.ml.evidence.EvidenceInducer.createPipeline(EvidenceInducer.scala:138)
    com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:92)
    com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:75)

barrybecker4 commented 7 years ago

Never mind. I think the problem was that I was not first binning the date column before calling MDLP.

barrybecker4 commented 7 years ago

Actually, the problem doesn't have anything to do with dates. It happens when there are no continuous columns to bin: if you call MDLP with an empty vector input column, you get this error. Reopening to fix this edge case.
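
The top frames of the trace (List.last) are consistent with calling last on an empty Scala List, which throws NoSuchElementException. A minimal standalone sketch of that failure mode follows, in plain Scala with no Spark dependency. The name thresholds is a hypothetical stand-in for illustration, not a variable taken from the library.

```scala
import scala.util.Try

object EmptyLastDemo {
  // Hypothetical stand-in for a list that ends up empty when there are
  // no continuous columns to bin.
  val thresholds: List[Double] = List.empty

  // Calling last on an empty List throws NoSuchElementException;
  // Try captures it so the demo does not crash.
  val unsafe: Try[Double] = Try(thresholds.last)

  // lastOption returns None for an empty List instead of throwing.
  val safe: Option[Double] = thresholds.lastOption
}
```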

barrybecker4 commented 7 years ago

I fixed this on my fork by adding an assertion with a message stating that there must be columns to discretize.
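
A fail-fast guard of this kind might look roughly like the sketch below. This is not the actual commit: the object name, method name, parameter, and message text are all hypothetical, and it uses Scala's require (which throws IllegalArgumentException) rather than an assert.

```scala
object DiscretizerGuard {
  // Hypothetical helper: reject an empty set of continuous feature indexes
  // up front, so the caller gets a clear message instead of a
  // NoSuchElementException from deeper in the algorithm.
  def requireColumns(continuousFeatureIndexes: Seq[Int]): Unit =
    require(
      continuousFeatureIndexes.nonEmpty,
      "There must be at least one continuous column to discretize."
    )
}
```

Calling DiscretizerGuard.requireColumns(Seq.empty) fails immediately with the message above, while a non-empty sequence passes through unchanged.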

sramirez commented 7 years ago

Added to my repo. I've copied the commit directly.

sramirez / spark-MDLP-discretization

Error when label is date #28