twosigma / flint

A Time Series Library for Apache Spark
Apache License 2.0

How to get the last row of a TimeSeriesRDD? #46

Closed: soloman817 closed this issue 5 years ago

soloman817 commented 5 years ago

I have a TimeSeriesRDD object, and I know it is sorted internally by timestamp. What is an efficient way to get the start time and the end time? TimeSeriesRDD.first returns the first row, so I can get the start time from it. But how can I get the last row efficiently?

icexelloss commented 5 years ago

There is no efficient way to get the end time. A TimeSeriesRDD does not store the end time internally, because it does not scan through to the end of the data when it is constructed.

soloman817 commented 5 years ago

Thanks for the reply. For now I have to use ts.toDF.orderBy(new Column(timeColumnName).desc).take(1).head.getAs[Long](timeColumnName) to get it. It is not efficient, but caching the data might help.
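Because the rows are already sorted by time, one possible alternative to sorting the whole DataFrame is to read only the trailing partition(s). The sketch below models that idea in plain Scala. It assumes, without confirmation from this thread, that partition i holds earlier rows than partition i + 1. `LastRowSketch` and `lastRow` are hypothetical names, and the partitions are represented as an `IndexedSeq` of `Seq`s rather than a real RDD.

```scala
// Hypothetical sketch: models a time-sorted RDD as a sequence of partitions,
// ASSUMING partition i holds earlier rows than partition i + 1.
object LastRowSketch {

  // Walk the partitions from last to first and return the last element of the
  // first non-empty one. The iterator is lazy, so earlier partitions are only
  // inspected if every later partition turns out to be empty.
  def lastRow[T](partitions: IndexedSeq[Seq[T]]): Option[T] =
    partitions.reverseIterator
      .map(_.lastOption)
      .collectFirst { case Some(row) => row }
}
```

For example, `LastRowSketch.lastRow(IndexedSeq(Seq(1L, 2L), Seq(3L, 4L), Seq.empty[Long]))` returns `Some(4L)`: the empty final partition is skipped and the last element of the previous one is returned. In Spark, each step could be run as a job over a single partition, for example with `SparkContext.runJob(rdd, func, Seq(i))`. Under the ordering assumption above, only the trailing partitions are scanned instead of the whole dataset. This is an untested workaround, not a Flint API.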