sryza / spark-timeseries

A library for time series analysis on Apache Spark
Apache License 2.0

Serialization of TimeSeries(RDD) to/from Parquet files #185

Closed: souellette-faimdata closed this issue 7 years ago

souellette-faimdata commented 7 years ago

I wrote some functions to save TimeSeries(RDD) objects to Parquet files (with the DateTimeIndex stored in a separate file) and to load them back from those files. I hope you find this useful. Let me know if you'd like me to submit a pull request for this.

sryza commented 7 years ago

This would be extremely useful. Are you imagining that there would be a Parquet record for each time series or for each instant?

souellette-faimdata commented 7 years ago

My current implementation saves each time series as a record. The DateTimeIndex's string representation goes into a separate text file that has the same name as the one provided, with a .idx extension appended.
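
The layout described above (one record per series, with the shared index in a sidecar `.idx` file) can be sketched roughly as follows. This is only an illustration of the file layout, not the code proposed in this issue: it uses the Python standard library and writes JSON Lines in place of Parquet so that it runs without Spark. The names `save_series` and `load_series`, the record fields `key` and `values`, and the one-timestamp-per-line index format are all hypothetical.

```python
import json
from pathlib import Path


def save_series(path, index, series):
    """Write one record per series to `path` and the shared index to `path + '.idx'`.

    `index` is a list of timestamp strings; `series` maps a key to a list of
    values with one value per index entry. Both shapes are assumptions made
    for this sketch.
    """
    for key, values in series.items():
        if len(values) != len(index):
            raise ValueError(
                f"series {key!r} has {len(values)} values, index has {len(index)}"
            )
    data_path = Path(path)
    # One line (record) per time series: its key plus the full vector of values.
    # JSON Lines stands in for Parquet here.
    with data_path.open("w", encoding="utf-8") as f:
        for key, values in series.items():
            f.write(json.dumps({"key": key, "values": values}) + "\n")
    # The index is shared by every series, so it is stored once in a sidecar file.
    Path(str(data_path) + ".idx").write_text("\n".join(index), encoding="utf-8")


def load_series(path):
    """Read back the (index, series) pair written by `save_series`."""
    data_path = Path(path)
    index = Path(str(data_path) + ".idx").read_text(encoding="utf-8").splitlines()
    series = {}
    with data_path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            series[record["key"]] = record["values"]
    return index, series
```

Calling `save_series("ts.jsonl", index, series)` produces `ts.jsonl` and `ts.jsonl.idx`, and `load_series("ts.jsonl")` returns the same index and series. Keeping the index in one place avoids repeating every timestamp in each record, which is the main appeal of storing a record per series instead of a record per instant.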

sryza commented 7 years ago

Cool. That seems preferable to me.