mithrandie / csvq

SQL-like query language for csv
https://mithrandie.github.io/csvq
MIT License
1.5k stars 65 forks source link

Add a datetime function to parse a date represented in a specified string format. #14

Closed derekmahar closed 4 years ago

derekmahar commented 4 years ago

Please add a datetime function that can parse a date represented in a specified string format. This function would be the inverse function to DATETIME_FORMAT and might accept a date string, a format string that describes the format of the date string, and return a datetime string or an integer that represents seconds or nanoseconds since January 1, 1970 UTC.

derekmahar commented 4 years ago

Nevermind. Cast function DATETIME can parse a string that resembles a datetime that doesn't necessarily strictly conform to the "standard" format.

derekmahar commented 4 years ago

Why can't function DATETIME parse string "2016-04-14 09:03"? I think it should be able to interpret such date and time strings that approximate a datetime value.

$ csvq "SELECT DATETIME('2016-04-14 09:03')"
+------------------------------+
| DATETIME('2016-04-14 09:03') |
+------------------------------+
|             NULL             |
+------------------------------+
derekmahar commented 4 years ago

Alternatively, I could transform the date and time string into the datetime format that function DATETIME expects. For example:

$ csvq "SELECT DATETIME('2016-04-14 09:03' || ':00') AS DateTime"
+----------------------------+
|          DateTime          |
+----------------------------+
| 2016-04-14T09:03:00-04:00  |
+----------------------------+
mithrandie commented 4 years ago

String formats that can be cast to datetime values by default are listed on the following page. https://mithrandie.github.io/csvq/reference/statement.html#parsing

Datetime formats differ with countries, and I want to make automatic type conversion as restrained as possible, so only the formats including seconds are provided by default. If you want to use other formats, these are three ways to interpret strings as datetime values.

  1. Specify with the command option.
    $ csvq --datetime-format "%Y-%m-%d %H:%i" "SELECT DATETIME('2016-04-14 09:03');"
  2. Set the Flag value by using ADD FLAG statement.
    $ csvq "ADD '%Y-%m-%d %H:%i' TO @@DATETIME_FORMAT; SELECT DATETIME('2016-04-14 09:03');"
  3. Write the format in your csvq configuration file. https://mithrandie.github.io/csvq/reference/command.html#configurations
    $ cat ~/.csvq_env.json
    {
    "datetime_format": ["%Y-%m-%d %H:%i"]
    }

Note: Fixed mistake in example at 2019-11-12 11:17:48 UTC.

derekmahar commented 4 years ago

Thank you for the explanation!

derekmahar commented 4 years ago
  1. Set the Flag value by using ADD FLAG statement.
    $ csvq "ADD '%Y-%m-%d %H:%i' TO @@DATETIME_FORMAT; SELECT DATETIME('2016-04-14 09:03');"

What @@DATETIME_FORMAT might I specify in order to assume time zone UTC for a date and time string that doesn't specify a time zone?

For example, instead of appending "Z" to the date and time string as in the following, how might I change @@DATETIME_FORMAT to assume time zone "Z" in the date and time string? In other words, how might I assume that all datetimes are in time zone UTC?

$ csvq "ADD '%Y-%m-%d %H:%i%Z' TO @@DATETIME_FORMAT; SELECT DATETIME('2016-04-14 09:03Z');"
+-------------------------------+
| DATETIME('2016-04-14 09:03Z') |
+-------------------------------+
| 2016-04-14T09:03:00Z          |
+-------------------------------+
mithrandie commented 4 years ago

The command option –timezone determines the default timezone in the execution.

$ csvq --timezone UTC "SELECT DATETIME('2016-04-14 09:03:00');"
+---------------------------------+
| DATETIME('2016-04-14 09:03:00') |
+---------------------------------+
| 2016-04-14T09:03:00Z            |
+---------------------------------+

The Flag @@TIMEZONE also determines the default timezone, but it may not work as expected with a single-line execution because strings representing datetime are converted when parsing syntax if possible.

$ csvq "SET @@TIMEZONE TO UTC; SELECT DATETIME('2016-04-14 09:03:00');"
+---------------------------------+
| DATETIME('2016-04-14 09:03:00') |
+---------------------------------+
| 2016-04-14T09:03:00-04:00       |
+---------------------------------+

# NOW() returns a datetime value as UTC because the time is calculated when the statement is executed, not during parsing.
$csvq "SET @@TIMEZONE TO UTC; SELECT NOW();"
+-----------------------------+
|            NOW()            |
+-----------------------------+
| 2019-11-16T04:00:23.299062Z |
+-----------------------------+

# In the interactive shell, each line is interpreted and executed sequentially, so you can use the default timezone.
$ csvq
/home/mithrandie > SET @@TIMEZONE TO UTC;
/home/mithrandie > SELECT DATETIME('2016-04-14 09:03:00');
+---------------------------------+
| DATETIME('2016-04-14 09:03:00') |
+---------------------------------+
| 2016-04-14T09:03:00Z            |
+---------------------------------+
derekmahar commented 4 years ago

Great answer! Thank you!