twosigma / flint

A Time Series Library for Apache Spark

Test failure in ClockSpec tests #30

Closed kenahoo closed 6 years ago

kenahoo commented 6 years ago

When running sbt assembly, I get a test failure that looks like it's picking up an ambient time zone somewhere in my environment:

[info] UniformClock
[info] - should generate clock ticks correctly (4 milliseconds)
[info] - should generate clock ticks in RDD correctly (90 milliseconds)
[info] - should generate clock ticks in TimeSeriesRDD correctly (164 milliseconds)
[info] - should generate clock ticks with offset in TimeSeriesRDD correctly (74 milliseconds)
[info] - should generate clock ticks with offset & time zone in TimeSeriesRDD correctly (75 milliseconds)
[info] - should generate clock ticks with default in TimeSeriesRDD correctly (172 milliseconds)
[info] - should generate timestamp correctly *** FAILED *** (160 milliseconds)
[info]   1989-12-31 18:00:00.0 did not equal 1990-01-01 00:00:00.0 (ClockSpec.scala:85)

Is this a known issue, or perhaps a problem with my setup?

icexelloss commented 6 years ago

This is a local time zone issue. I think this commit should have fixed it: https://github.com/twosigma/flint/commit/6ece04232ec1af67fe21307c80ad258af99b9c41

Please let me know otherwise.
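
For anyone on an older checkout, a minimal workaround sketch (not the actual commit): the failing comparison depends on the JVM default time zone, so pinning it before running the tests makes the expected strings deterministic. UTC here is an assumption; any zone matching the test expectations works.

// Pin the JVM default time zone so java.sql.Timestamp.toString is stable.
import java.util.TimeZone
TimeZone.setDefault(TimeZone.getTimeZone("UTC"))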

kenahoo commented 6 years ago

Thanks, the test does indeed pass now.

However, it seems like Clocks.uniform(..., timeZone = "UTC") should always produce timestamps in UTC, regardless of the ambient setting of spark.sql.session.timeZone. For instance, observe the following:

import com.twosigma.flint.timeseries.Clocks

val clock1 = Clocks.uniform(sc, "1day", beginDateTime = "19900101", timeZone = "UTC")
println("Created: " + clock1.toDF.take(1)(0))
// conf is the Spark configuration (e.g. sc.getConf)
println("have timeZone: " + conf.contains("spark.sql.session.timeZone"))

# output:
Created: [1989-12-31 18:00:00.0]
have timeZone: false

Or am I understanding the interface incorrectly?

icexelloss commented 6 years ago

timeZone = "UTC" means "19900101" is in UTC. The actual timestamp in the dataframe follows just the normal Spark timestamp semantics.

kenahoo commented 6 years ago

I see - so timeZone only specifies the time zone the beginDateTime input should be interpreted in. Thanks.
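
To confirm my understanding with a sketch (America/Chicago is just an illustrative choice):

// The same beginDateTime string yields a different instant per zone.
val utcClock = Clocks.uniform(sc, "1day", beginDateTime = "19900101", timeZone = "UTC")
val chiClock = Clocks.uniform(sc, "1day", beginDateTime = "19900101", timeZone = "America/Chicago")
// First ticks: 1990-01-01T00:00:00Z vs 1990-01-01T06:00:00Z,
// six hours apart because Chicago is UTC-6 in winter.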