Closed barrybecker4 closed 7 years ago
Never mind. I think the problem was that I was not first binning the date column before calling MDLP.
Actually the problem doesn't have anything to do with dates. It has to do with there being no continuous columns to bin. If you call MDLP with a vector input column that is empty, then you get this error. Reopening to fix this edge case.
I fixed this on my fork by adding an assertion with a message stating that there must be columns to discretize.
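For anyone hitting the same trace, here is a minimal sketch of the failure mode and the kind of guard described above. It is plain Scala with no Spark dependency. The object and method names are hypothetical and are not the actual MDLPDiscretizer code; the only thing it assumes is that the exception comes from calling `last` on an empty `List`, as the stack trace suggests.

```scala
// Hypothetical sketch; names are illustrative, not the library's API.
object EmptyInputGuard {

  // Mimics the failure: List.last on an empty list throws
  // java.util.NoSuchElementException.
  def lastThresholdUnsafe(thresholds: List[Double]): Double =
    thresholds.last

  // Fail fast with a readable message before any thresholds are computed.
  // numFeatures stands in for the size of the input feature vector.
  def checkColumnsToDiscretize(numFeatures: Int): Unit =
    assert(numFeatures > 0, "There must be at least one column to discretize.")
}
```

With a check like this at the start of training, an empty input vector is reported with a descriptive message instead of an opaque `NoSuchElementException` deep in the threshold search.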
Added to my repo. I've copied the commit directly.
I noticed that we see the following exception when trying to run MDLP with a date label specified. I will add a unit test and try to debug.
java.util.NoSuchElementException: null
    scala.collection.LinearSeqOptimized$class.last(LinearSeqOptimized.scala:148)
    scala.collection.immutable.List.last(List.scala:84)
    org.apache.spark.mllib.feature.InitialThresholdsFinder.findInitialThresholds(InitialThresholdsFinder.scala:58)
    org.apache.spark.mllib.feature.MDLPDiscretizer.initialThresholds(MDLPDiscretizer.scala:58)
    org.apache.spark.mllib.feature.MDLPDiscretizer.runAll(MDLPDiscretizer.scala:120)
    org.apache.spark.mllib.feature.MDLPDiscretizer$.train(MDLPDiscretizer.scala:312)
    org.apache.spark.ml.feature.MDLPDiscretizer.fit(MDLPDiscretizer.scala:131)
    com.mineset.spark.ml.evidence.EvidenceInducer.createDiscretizerModel(EvidenceInducer.scala:256)
    com.mineset.spark.ml.evidence.EvidenceInducer.createBucketizers(EvidenceInducer.scala:227)
    com.mineset.spark.ml.evidence.EvidenceInducer.createPipeline(EvidenceInducer.scala:138)
    com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:92)
    com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:75)