telefonicaid / fiware-livedemoapp

GNU Affero General Public License v3.0

Hive does not like datetimes in ISO format #16

Closed frbattid closed 10 years ago

frbattid commented 10 years ago

The datetime string persisted in HDFS uses the 'T' character as the separator between the date and time parts:

>>> from datetime import datetime
>>> date = datetime.now()
>>> print(date.isoformat())
2014-02-20T10:31:38.297875

This is not what Hive expects. Hive wants a ' ' (space) separator instead, because it handles datetimes as strings of the form "%Y-%m-%d %H:%M:%S.%f".
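A minimal sketch of the difference, assuming Python 3.7+ (needed for `datetime.fromisoformat`). The helper names `to_hive_string` and `iso_to_hive_string` are hypothetical and not part of fiware-livedemoapp or Cygnus. The standard `datetime.isoformat()` method already accepts the date/time separator as its first argument, and `strftime` can produce the same Hive-style layout explicitly:

```python
from datetime import datetime

# Hive-style layout in Python strftime notation: a space (not 'T')
# separates the date part from the time part.
HIVE_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def to_hive_string(dt):
    """Render a datetime the way Hive expects it (always with microseconds)."""
    return dt.strftime(HIVE_FORMAT)


def iso_to_hive_string(iso_string):
    """Convert an ISO 8601 string such as '2014-02-20T10:31:38.297875'."""
    # fromisoformat (Python 3.7+) accepts the output of datetime.isoformat(),
    # with or without the fractional-seconds part.
    return to_hive_string(datetime.fromisoformat(iso_string))


sample = datetime(2014, 2, 20, 10, 31, 38, 297875)
print(sample.isoformat())                          # 2014-02-20T10:31:38.297875
print(sample.isoformat(" "))                       # 2014-02-20 10:31:38.297875
print(to_hive_string(sample))                      # 2014-02-20 10:31:38.297875
print(iso_to_hive_string("2014-02-20T10:31:38"))   # 2014-02-20 10:31:38.000000
```

One difference worth knowing: `isoformat()` omits the fractional part entirely when the microseconds are zero, while `%f` always emits six digits.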

fgalan commented 10 years ago

My only concern with this is whether we would be tying ourselves too closely to Hive's behaviour/limitations. Hive is just one of the possible stacks that can run on top of Hadoop/HDFS (although an important one for us in FI-WARE, of course). Since the date field may have multiple potential consumers, adhering to standards (in this case, the ISO 8601 standard) is probably the best option.

Maybe we could consider a CLI argument (yet another one :) to enable/disable Hive compatibility.

Opinions?
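The proposed switch might look roughly like the following sketch. The `--hive-compatible` flag and the `build_parser`/`format_datetime` helpers are hypothetical names used only for illustration, not the project's actual CLI. The flag is off by default, so ISO 8601 stays the standard output:

```python
import argparse
from datetime import datetime


def build_parser():
    parser = argparse.ArgumentParser(description="Datetime serialization sketch")
    # Hypothetical flag: off by default, so ISO 8601 remains the standard output.
    parser.add_argument("--hive-compatible", action="store_true",
                        help="separate date and time with ' ' instead of 'T'")
    return parser


def format_datetime(dt, hive_compatible=False):
    # datetime.isoformat() takes the date/time separator as its first argument.
    return dt.isoformat(" " if hive_compatible else "T")


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(format_datetime(datetime.now(), args.hive_compatible))
```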

frbattid commented 10 years ago

Hive has the following built-in date User Defined Functions (UDFs): https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions

There are also custom UDFs that can be added to Hive as .jar files. This one converts ISO 8601 datetimes into Hive ones: https://github.com/simplymeasured/hive-udf

You are probably right: we should keep the standard format and use formatter plugins.
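One possible reading of "formatter plugins", sketched here as an assumption rather than anything the thread specifies: keep ISO 8601 as the default output and register alternative formatters (such as a Hive-style one) by name. The `FORMATTERS` registry, the `register_formatter` decorator, and the formatter names are all hypothetical:

```python
from datetime import datetime

# Hypothetical registry mapping a formatter name to a function datetime -> str.
FORMATTERS = {}


def register_formatter(name):
    """Decorator that registers a datetime formatter under the given name."""
    def decorator(func):
        FORMATTERS[name] = func
        return func
    return decorator


@register_formatter("iso8601")
def iso8601(dt):
    # Standard output: 'T' between the date and time parts.
    return dt.isoformat()


@register_formatter("hive")
def hive(dt):
    # Hive-friendly output: ' ' between date and time, always with microseconds.
    return dt.strftime("%Y-%m-%d %H:%M:%S.%f")


def format_datetime(dt, formatter="iso8601"):
    """Format dt with a registered formatter; ISO 8601 remains the default."""
    if formatter not in FORMATTERS:
        raise ValueError("unknown formatter: %r" % formatter)
    return FORMATTERS[formatter](dt)
```

With this layout, supporting another consumer means registering one more function, while the persisted default stays standard.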

fgalan commented 10 years ago

In the end, we decided to keep the standard format in Cygnus (https://github.com/telefonicaid/fiware-connectors/tree/develop/cosmos/cygnus), so the issue is closed.