titicaca / spark-iforest

Isolation Forest on Spark
Apache License 2.0

java.lang.ArrayIndexOutOfBoundsException:0 #21

Closed qiyuehan closed 5 years ago

qiyuehan commented 5 years ago

I am using this algorithm in my IDE to process streaming data. When the data sender is not sending anything and only Spark Streaming is running, the call loadedPipelineModel.transform(df) fails with java.lang.ArrayIndexOutOfBoundsException: 0. I traced it to line 104 of iForest.data:

    103  threshold = scoreDataset.stat.approxQuantile($(anomalyScoreCol),
    104    Array(1 - $(contamination)), $(approxQuantileRelativeError))(0)

However, running the same code in the shell does not raise the error.

titicaca commented 5 years ago

Is the df you are passing in empty?

blackvvine commented 4 years ago

I had a similar error when using the library in streaming, and it was indeed due to an empty DataFrame. Just add a safeguard check before the call:

    # Skip scoring when the micro-batch contains no rows.
    if not df.rdd.isEmpty():
        df = iforest.transform(df)
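
For anyone landing here later: the exception most likely arises because Spark's approxQuantile returns an empty array for an empty dataset, so the trailing (0) index on line 104 has nothing to read. Below is a minimal sketch of the same guard wrapped in a reusable helper. The name transform_if_nonempty is hypothetical and not part of spark-iforest; the helper only assumes that df exposes df.rdd.isEmpty() (as PySpark DataFrames do) and that model exposes transform(df).

```python
def transform_if_nonempty(model, df):
    """Score df with model, or return None when df has no rows.

    Hypothetical helper, not part of spark-iforest. It assumes
    `df.rdd.isEmpty()` and `model.transform(df)` are available,
    as they are on a PySpark DataFrame and a fitted pipeline model.
    """
    # An empty micro-batch would make the threshold computation fail,
    # so skip the transform entirely and let the caller decide what to do.
    if df.rdd.isEmpty():
        return None
    return model.transform(df)
```

In a streaming job this would be called once per micro-batch, for example scored = transform_if_nonempty(loadedPipelineModel, df), with the caller skipping any downstream work when the result is None. Note that df.rdd.isEmpty() triggers a small Spark job of its own, which is usually acceptable for a per-batch check.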