vectordotdev / vrl

Vector Remap Language
Mozilla Public License 2.0

parse_timestamp Invalid timestamp error #790

Open DimDroll opened 4 months ago

DimDroll commented 4 months ago

Problem

Hello,

We need to parse a nanosecond Unix timestamp such as 1712312062676735437 with the parse_timestamp function: .2_timestamp = parse_timestamp!("1712312062676735437", "%s%f"). This fails with the following error:

error[E000]: function call error for "parse_timestamp" at (152:199): Invalid timestamp "1712312062676735437": premature end of input
  ┌─ :5:16
  │
5 │ .2_timestamp = parse_timestamp!("1712312062676735437", "%s%f")
  │                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Invalid timestamp "1712312062676735437": premature end of input
  │
  = see language documentation at https://vrl.dev
  = try your code in the VRL REPL, learn more at https://vrl.dev/examples

The workaround is to add a space between the seconds part and the nanoseconds part of the timestamp: .4_timestamp = parse_timestamp!("1712312062 676735437", "%s %f")

The following VRL code can be run in the playground:

Vector Version: 5ffb1a55
VRL Version:    0.13.0

I assume the same happens in a real-time run.

# does not parse nanoseconds timestamp
# error: premature end of input
#.1_timestamp = parse_timestamp!("1712312062676735437", "%s%9f")
#.2_timestamp = parse_timestamp!("1712312062676735437", "%s%f")

# does parse
# by adding space in-between
# unix timestamp and nanoseconds
.3_timestamp = parse_timestamp!("1712312062 676735437", "%s %9f")
.4_timestamp = parse_timestamp!("1712312062 676735437", "%s %f")

# workaround
full_timestamp = "1712312062676735437"
timestamp_slice = slice!(full_timestamp, 0, 10)
nanoseconds_slice = slice!(full_timestamp, 10)
.5_timestamp = parse_timestamp!(timestamp_slice + " " + nanoseconds_slice, "%s %f")

# another observation
# the following concatenation results in a strange error
# error: trailing input
#.6_timestamp = parse_timestamp!("1712312062" + " ", "%s ")
# but the following works fine:
#.7_timestamp = parse_timestamp!("1712312062" + " A", "%s A")

# dirty workaround
#full_timestamp = "1712312062676735437"
#timestamp_slice = slice!(full_timestamp, 0, 10)
#nanoseconds_slice = slice!(full_timestamp, 10)
# parse timestamp and add nanoseconds post-factum
#.8_timestamp = parse_timestamp!(timestamp_slice, "%s")
#.8_timestamp = replace(to_string(.8_timestamp), "Z", "." + nanoseconds_slice + "Z")

Version

VRL Version: 0.13.0


jszwedko commented 4 months ago

Thanks @DimDroll. We use a library to do the parsing here, so any fix would likely need to be made upstream in https://github.com/chronotope/chrono.

DimDroll commented 4 months ago

Thank you @jszwedko ,

I dug a little and found a similar case reported earlier, which you had looked at as well: https://github.com/vectordotdev/vrl/issues/117

It seems you raised this with chrono in the past, and it has been worked on recently. I summarized the related links and cautiously asked in the following thread whether any progress is planned: https://github.com/chronotope/chrono/issues/1399

In the meantime, I will stick with the aforementioned workaround.

DimDroll commented 4 months ago

Just as a note, another workaround is:

.@timestamp = from_unix_timestamp!(9007199254740993, unit: "nanoseconds")

However, I will keep this issue open, as parse_timestamp is supposed to support this as well. I would assume that once it is fixed, from_unix_timestamp and to_unix_timestamp could be deprecated, since parse_timestamp and format_timestamp should suffice. It would help if parse_timestamp had a "unit" precision modifier added, for full replaceability.

@jszwedko, please share whether this makes sense for the long term.

jszwedko commented 4 months ago

That works too. from_unix_timestamp and to_unix_timestamp are more efficient, given they deal with integers and not parsing / encoding strings, and so I think they are still useful on their own even if parse_timestamp and format_timestamp could handle them correctly.
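
For the 19-digit string input from the original report, the integer route might look like the sketch below. It assumes that to_int! accepts a decimal string and that from_unix_timestamp! is fallible for this argument, as the call earlier in this thread suggests; the field name .parsed_timestamp and the variable name nanoseconds are hypothetical.

```vrl
# The input arrives as a string holding nanoseconds since the Unix epoch.
full_timestamp = "1712312062676735437"

# Convert the string to an integer (assumed: to_int! aborts if the string is not numeric).
nanoseconds = to_int!(full_timestamp)

# Build the timestamp from the integer; no strftime-style format string is involved,
# so the "%s%f" parsing problem does not come into play.
.parsed_timestamp = from_unix_timestamp!(nanoseconds, unit: "nanoseconds")
```

This avoids slicing the string and inserting a space, at the cost of assuming the value always fits in a 64-bit signed integer.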