tokio-rs / prost

PROST! a Protocol Buffers implementation for the Rust Language
Apache License 2.0
3.86k stars 500 forks source link

fix(prost-types): Converting DateTime to Timestamp is fallible #1095

Closed caspermeijn closed 2 months ago

caspermeijn commented 3 months ago

The converstion from the private type DateTime to public type Timestamp is fallible when the DateTime is invalid. All code paths that used DateTime::into::<Timestamp>() first check whether the DateTime is valid and then do the conversion, therefore this problem was not visible. https://github.com/tokio-rs/prost/issues/893 points out that some conversions panic, however all these DateTime are invalid.

Solution: Replace impl From<DateTime> for Timestamp with TryFrom and remove the manual is_valid checks before the conversion.

I added the test cases from the issue and roundtrip test between DateTime and Timestamp.

caspermeijn commented 3 months ago

@mumbleskates bilrost solved this problem differently. Do you agree with this solution as well?

mumbleskates commented 3 months ago

i took a little time today to hook up a fuzzer to the timestamp <--> string machinery and i'm finding all kinds of problems.

so far: parse_two_digit_numeric can panic in split_at if the input is short. after fixing that, it appears that not only does the string "19+0-01-11" parse as the timestamp "1900-01-10T00:00:00Z", but that string parses as the timestamp for "1900-01-09T00:00:00Z" (with the formatting output matching the timestmap value, but the parsing side regressing by 1 day). This error occurs somewhere inside date_time_to_seconds :(

mumbleskates commented 3 months ago

year_to_seconds incorrectly reports that the year 1900 specificially is a leap year :pensive:

caspermeijn commented 2 months ago

i took a little time today to hook up a fuzzer to the timestamp <--> string machinery and i'm finding all kinds of problems.

so far: parse_two_digit_numeric can panic in split_at if the input is short. after fixing that, it appears that not only does the string "19+0-01-11" parse as the timestamp "1900-01-10T00:00:00Z", but that string parses as the timestamp for "1900-01-09T00:00:00Z" (with the formatting output matching the timestmap value, but the parsing side regressing by 1 day). This error occurs somewhere inside date_time_to_seconds :(

I have been playing with the idea of improving compatibility with chrono and deprecating the string to/from timestamp conversion. I feel that this type of string operation is a bad fit for a library like prost. We should focus on our strengths and use the strengths of others. What do you think of replacing this conversion by chrono crate?

mumbleskates commented 2 months ago

possibly. one of the original reasons for having it, though, was that prost_types supports the full range of i64 seconds. chrono only supports the range of i64 milliseconds, for some reason; others may be even less. chrono's string parsing is even more constrained, only supporting positive 4-digit years.

indeed, perhaps it is more sensible to support from-string conversion only via pivoting through chrono, though this may effectively make Timestamp's to-string conversion fallible. i understand the rationale behind dropping this functionality, as it is messy and was never really validated before, but it feels really bad losing the infallibility to me.

i can contribute that, with the 3 fixes i include or mention in https://github.com/tokio-rs/prost/pull/1096, fuzz testing in bilrost with cross-validation against chrono soaked up everything i have been able to throw at it (many cpu-days) with no problems. it's also (now, after fixing the year 1900 bug) exhaustively correct for each individual minute from 0000-01-01 through 9999-12-31 (the full range of chrono's rfc3339 support), which gives me a reasonable amount of confidence that it is fully correct.

caspermeijn commented 2 months ago

possibly. one of the original reasons for having it, though, was that prost_types supports the full range of i64 seconds. chrono only supports the range of i64 milliseconds, for some reason; others may be even less. chrono's string parsing is even more constrained, only supporting positive 4-digit years.

indeed, perhaps it is more sensible to support from-string conversion only via pivoting through chrono, though this may effectively make Timestamp's to-string conversion fallible. i understand the rationale behind dropping this functionality, as it is messy and was never really validated before, but it feels really bad losing the infallibility to me.

i can contribute that, with the 3 fixes i include or mention in #1096, fuzz testing in bilrost with cross-validation against chrono soaked up everything i have been able to throw at it (many cpu-days) with no problems. it's also (now, after fixing the year 1900 bug) exhaustively correct for each individual minute from 0000-01-01 through 9999-12-31 (the full range of chrono's rfc3339 support), which gives me a reasonable amount of confidence that it is fully correct.

Ok, let's work towards a fuzz tested implementation. Could you review and approve this PR? Then I will look at yours.

What I find interesting is that the protobuf comments on the Timestamp type specifically mention year 0001 through 9999. So you could argue that the chrono implementation is good enough. But I agree we should not settle for good enough and make it infallible.

mumbleskates commented 2 months ago

fuzz testing for some pretty reasonable bounds is available in bilrost as of here. checked for round tripping, checked that accepted inputs match a reasonable regex, and checked that any date in the supported range of chrono matches its interpretation exactly. i've run it for >3 billion iterations and it's stopped making progress so this should at least be a good starting point :+1: