sryza / spark-timeseries

A library for time series analysis on Apache Spark
Apache License 2.0

UnivariateTimeSeries.fillNearest doesn't fill NaNs #197

Open Tautvis opened 7 years ago

Tautvis commented 7 years ago

UnivariateTimeSeries.fillNearest doesn't fill NaN values at the start of the Vector.

For example:

import com.cloudera.sparkts.UnivariateTimeSeries
import org.apache.spark.mllib.linalg.Vectors

val ts = Vectors.dense(Array(Double.NaN, 2.0, Double.NaN))
println(UnivariateTimeSeries.fillNearest(ts))

prints:

scala>   val ts = Vectors.dense(Array(Double.NaN, 2.0, Double.NaN))
ts: org.apache.spark.mllib.linalg.Vector = [NaN,2.0,NaN]

scala>   UnivariateTimeSeries.fillNearest(ts)
res1: org.apache.spark.mllib.linalg.DenseVector = [NaN,2.0,2.0]
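
For comparison, here is a minimal sketch of the behavior the report appears to expect, where a leading NaN is filled from the nearest value to its right. It works on a plain Array[Double] so it runs without Spark. The object and method name `FillNearestSketch.fillNearestSketch` are hypothetical and not part of spark-timeseries, and the tie-breaking rule (prefer the earlier index) is an assumption, not something the library documents here.

```scala
// Hypothetical helper, NOT the spark-timeseries implementation.
// Replaces each NaN with the value at the closest non-NaN index,
// searching in both directions so leading NaNs are filled too.
object FillNearestSketch {
  def fillNearestSketch(values: Array[Double]): Array[Double] = {
    // Indices that hold an observed (non-NaN) value, in ascending order.
    val known = values.indices.filter(i => !values(i).isNaN)

    if (known.isEmpty) {
      // Nothing to fill from: return an unchanged copy.
      values.clone()
    } else {
      values.indices.map { i =>
        if (!values(i).isNaN) values(i)
        // minBy returns the first minimum, so on a tie the earlier
        // (left-hand) neighbor wins. This tie rule is an assumption.
        else values(known.minBy(j => math.abs(j - i)))
      }.toArray
    }
  }
}
```

With the input from this report, `FillNearestSketch.fillNearestSketch(Array(Double.NaN, 2.0, Double.NaN))` returns an array holding 2.0 in all three positions. The scan over `known` for every NaN is quadratic in the worst case, which keeps the sketch short but would not suit long series.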