frbattid closed this issue 10 years ago.
My only concern with this is whether we would be "tying" ourselves too much to Hive behaviour and limitations. Hive is just one of the possible stacks that can run on top of Hadoop/HDFS (although, of course, an important one for us in FI-WARE). Since the date field may have multiple potential consumers, adhering to standards (in this case, the ISO 8601 standard) is probably the best option.
Maybe we could consider a CLI argument (yet another one :)) to enable or disable Hive compatibility.
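A minimal sketch of what such a switch could look like, written in Python purely for illustration. The `--hive-compatible` flag and the `format_timestamp` helper are hypothetical and are not part of Cygnus; the only difference between the two modes is the separator between the date and time parts (ISO 8601 `T` versus the space Hive expects). It assumes naive datetimes that are already in UTC.

```python
import argparse
from datetime import datetime


def format_timestamp(ts: datetime, hive_compatible: bool) -> str:
    """Render ts as ISO 8601 ('T' separator) or Hive-style (' ' separator).

    Hypothetical helper; assumes ts is a naive datetime already in UTC.
    """
    sep = " " if hive_compatible else "T"
    return ts.isoformat(sep=sep, timespec="milliseconds")


def main(argv=None) -> str:
    # Hypothetical flag: off by default, so the ISO 8601 standard format is kept.
    parser = argparse.ArgumentParser(description="Hive-compatibility switch (sketch)")
    parser.add_argument(
        "--hive-compatible",
        action="store_true",
        help="separate date and time with a space instead of 'T'",
    )
    args = parser.parse_args(argv)
    # Fixed sample timestamp, only to demonstrate both output formats.
    sample = datetime(2014, 5, 20, 10, 30, 15, 123000)
    out = format_timestamp(sample, args.hive_compatible)
    print(out)
    return out


if __name__ == "__main__":
    main()
```

Keeping the flag off by default matches the preference expressed above for the ISO 8601 standard, with Hive compatibility as an opt-in.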
Opinions?
Hive has the following date User Defined Functions (UDFs): https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
There are also custom UDFs that can be added to Hive as .jar files. This one converts ISO 8601 datetimes into Hive ones: https://github.com/simplymeasured/hive-udf
You are probably right: we should keep the standard format and use formatter plugins.
In the end we decided to keep the standard format in Cygnus (https://github.com/telefonicaid/fiware-connectors/tree/develop/cosmos/cygnus), so the issue is closed.
The datetime string persisted in HDFS contains the 'T' character as the separator between the date and time parts:
This is not what Hive expects: Hive expects a ' ' (space) separator, since it manages datetimes as strings in the form "%Y-%m-%d %H:%i:%s.%f".
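As an illustration of the kind of conversion a formatter plugin or a UDF like the one linked above would perform, here is a minimal Python sketch. The function name `iso8601_to_hive` is hypothetical and this is not Cygnus code. It assumes that any UTC offset in the input should be normalized to UTC and then dropped, because Hive TIMESTAMP values carry no time zone, and it emits millisecond precision.

```python
from datetime import datetime, timezone


def iso8601_to_hive(value: str) -> str:
    """Convert an ISO 8601 datetime string to Hive's 'yyyy-MM-dd HH:mm:ss.fff' form.

    Hypothetical helper. Assumption: offsets are normalized to UTC and then
    dropped, since Hive timestamps store no time zone information.
    """
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so rewrite it as an explicit +00:00 offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is not None:
        # Shift to UTC, then remove the offset.
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    # Use a space instead of the ISO 8601 'T' between date and time.
    return dt.isoformat(sep=" ", timespec="milliseconds")


if __name__ == "__main__":
    print(iso8601_to_hive("2014-05-20T10:30:15.123Z"))
    print(iso8601_to_hive("2014-05-20T12:30:15+02:00"))
```

Doing this conversion at query time (or in a pluggable formatter) rather than in the persisted data is what allows the HDFS files to stay in the ISO 8601 standard format.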