houseabsolute / DateTime.pm

A date and time object for Perl
https://metacpan.org/release/DateTime/
Other
47 stars 49 forks source link

DateTime::Duration::in_units('days') should not return completely wrong answers for durations over 1 month #133

Open deven opened 2 years ago

deven commented 2 years ago

Currently, DateTime::Duration::in_units('days') will only return the days portion of the DateTime::Duration object, completely ignoring the number of months that may also be stored. Although this behavior is documented, the caveat is very easy to miss. This method should not return completely wrong answers like this.

Here is an example script which demonstrates the problem, and also shows that DateTime::delta_days() will return the correct number of days (albeit losing the negative sign), unlike DateTime::subtract_datetime():

#!/usr/bin/env perl

use strict;
use warnings;

use Data::Dump qw[dump];
use DateTime;

print "\$DateTime::VERSION = $DateTime::VERSION\n\n";

my $date1 = DateTime->new(year => 2022, month => 8, day => 9);
print "\$date1 = \"$date1\"\n\n";

my $date2 = DateTime->new(year => 2022, month => 6, day => 17);
print "\$date2 = \"$date2\"\n\n";

my $duration = $date2 - $date1;
print "\$duration = \$date2 - \$date1 = \$date2->subtract_datetime(\$date1) = " . dump($duration) . "\n\n";

print "\$duration->in_units(\"days\") = " . $duration->in_units('days') . "\n\n";

print "\$date2->delta_days(\$date1)->in_units(\"days\") = " . $date2->delta_days($date1)->in_units("days") . "\n";

Here is the output generated by the above script with DateTime version 1.58:

$DateTime::VERSION = 1.58

$date1 = "2022-08-09T00:00:00"

$date2 = "2022-06-17T00:00:00"

$duration = $date2 - $date1 = $date2->subtract_datetime($date1) = bless({
  days => -22,
  end_of_month => "preserve",
  minutes => 0,
  months => -1,
  nanoseconds => 0,
  seconds => 0,
}, "DateTime::Duration")

$duration->in_units("days") = -22

$date2->delta_days($date1)->in_units("days") = 53

Returning -22 for $duration->in_units("days") when the correct duration is -53 days is clearly wrong. Several better options come to mind:

deven commented 2 years ago

Note: Even if the decision is to die (or generate a warning) in this scenario, the method should still silently return the correct answer if the number of months stored in the DateTime::Duration object is zero, either because it was created by delta_days() or because the duration is less than a month. It certainly shouldn't die or warn if -22 is actually the correct answer!

n1vux commented 2 years ago

i sympathize with the objection,

But i'm unclear if any change is safe (even a Carp instead of a Croak), as this arguably buggy behavior may by now be relied upon ?

To really fix this API would probably be majorly backwards-incompatible , so should probably be a major version bump and big warnings in CHANGES and README.md etc ...

I'm fingerpainting that it could be phased

(This could still seriously break an app that migrated from $n to $n+3 after many years of non-maintenance! Here's hoping they have regression tests )

deven commented 2 years ago

But i'm unclear if any change is safe (even a Carp instead of a Croak), as this arguably buggy behavior may by now be relied upon ?

I can certainly imagine that existing code might be relying on this to return some numeric value, and croaking or returning undef might potentially cause breakage. However, the existing documentation explicitly says that "Smaller units are computed from what remains after taking away the larger units given", which means $dur->in_units('days') should be returning the total number of days, not a remainder of days. Yes, there is a documented caveat: "The last example demonstrates that there will not be any conversion between units which don't have a fixed conversion rate." While this caveat does explain why the current implementation happens to return a remainder of days, that doesn't justify relying on this misbehavior.

To really fix this API would probably be majorly backwards-incompatible , so should probably be a major version bump and big warnings in CHANGES and README.md etc ...

I'm fingerpainting that it could be phased

  • n+1: Carp if acting as a remainder with "deprecated, will be fatal in next release"
  • n+2: Croak when loss of data, as forewarned;
  • n+3: returns 41 not 10 or 11 or ... unless ('months', 'days') requested

(This could still seriously break an app that migrated from $n to $n+3 after many years of non-maintenance! Here's hoping they have regression tests )

This approach is predicated on the presumption that relying on $dur->in_units('days') to return a remainder is a reasonable thing for users to do, just because that's what it happens to return in the current implementation. My assertion is that it is not reasonable for a user to rely on this behavior, because it's contrary to the definition of this API call in the first place. Moreover, it's far more likely that the user actually expected the defined behavior of the API call (not a remainder), so maintaining bug-for-bug compatibility really isn't warranted here. What is warranted is to continue returning a number.

To minimize backwards-compatibility issues, my suggestion would be to alter the definition of this API call to return a "best effort" approximation in circumstances where a fixed conversion rate does not exist, and replace the current caveat with clear documentation of the exact approximation algorithm. For example, the conversion from years to days could be defined as 365.2425 days/year, taking leap days into account, and the conversion from days to seconds could defined as 86,400 seconds/day, explicitly ignoring leap seconds. All the other conversions are obvious: 12 months/year, 7 days/week, etc.

Fractional values of any unit (e.g. partial days) could be converted down into smaller units all the way down to nanoseconds and added to any duration values stored in those smaller units, and then the in_units() API call could either truncate all units smaller than the smallest one requested (which I'm guessing is probably what it does now), or perhaps to convert all smaller units back into a fractional value for the smallest unit actually requested and then explicitly round to the nearest integer.

Ideally, I would also suggest extending the internals of DateTime::Duration to allow the exact correct values to be returned instead of approximations, for durations created by subtracting DateTime objects. Approximations would still be necessary for durations explicitly constructed more generically such as DateTime::Duration->new(months => 1) where the particular dates in question are unknown, but the exact number of total days can always be calculated accurately when subtracting two specific known dates.

deven commented 2 years ago

On a separate note, the fact that DateTime::Duration->new(months => 1, days => 11)->days returns 4 instead of 11 seems counter-intuitive to me. Personally, I would've found it more intuitive if ->days returned 11 and something else like ->wdays returned 4. Obviously, the API is already established this way so it's too late to change this, but perhaps adding a new ->mdays method for abs(($dur->in_units('days', 'months'))[0]) (returning 11 in this case) might be useful?

autarch commented 2 years ago

Thanks for this bug report and the good discussion.

I don't think doing any sort of guesstimation of the return values (like a month is 30.5 days) is on the table. This seems like the most likely thing to break existing code, since it wouldn't even cause a warning, much less an exception.

If we want to make it possible to turn months into days, the right way to do this is by using a DateTime as a base for these calculations. The DateTime::Format::Duration module does this. It might be possible to add some new methods to DateTime::Duration around this. I think we'd want to construct a new type of Duration class, maybe called DateTime::Duration::WithBase:

my $dur = $dt1->delta_md($dt2)->with_base($dt1);

We could even consider adding a set of new methods to DateTime itself:

my $dur = $dt->delta_md_with_base($dt)

This Duration::WithBase could then return simple but correct answers for questions like "how many days does this duration represent?"

As far as other changes like making in_units carp or croak, I'm not too keen on this. There's a lot of code out there using DateTime. I suspect a lot of it is in semi-maintenance mode, where people might upgrade libraries but not pay a ton of attention to warnings coming from it.

Adding new methods and classes is much safer. It fixes the problem for people who care without any risk of breaking existing code.

autarch commented 2 years ago

Adding new methods and classes is much safer. It fixes the problem for people who care without any risk of breaking existing code.

I was thinking about this a bit more and there is one big question with adding a new Duration class. How do the DateTime methods that accept a DateTime::Duration work with this new Duration class?

I think this should just work as long as the new Duration class implements the deltas method and returns something reasonable for it. That said, it's a little weird to pass a Duration with a base DateTime to a method called on a different DateTime. But maybe that's ok?

n1vux commented 2 years ago

little weird to pass a Duration with a base DateTime to a method called on a different DateTime. But maybe that's ok?

That's what one would need to do if one needs to offset other DateTime's by the same amount, so not unreasonable to support.

The new alternative Duration class could either be duck-type compatible (supporting deltas) or you could create an abstract base class for both of them that has a must-implement commitment for deltas , to have an ISA relationship (instead of having to do can deltas for typechecking).

autarch commented 2 years ago

I think for max compatibility we'd want DateTime::Duration::WithBase to subclass DateTime::Duration. That way anything that checks ->isa('DateTime::Duration') will keep working.

deven commented 2 years ago

I don't think doing any sort of guesstimation of the return values (like a month is 30.5 days) is on the table. This seems like the most likely thing to break existing code, since it wouldn't even cause a warning, much less an exception.

You're talking about the risk of breaking existing already-broken code. How would this break the code any worse than it already is? Any code which would be affected by this change is already broken and this would be likely to make it less broken than it already is.

I understand that adding a warning after the fact is potentially detrimental. But why on earth would you want to continue returning 20 as a completely wrong answer just because you can't be certain whether 50 or 51 is the correct answer? Returning either 50 or 51, on any arbitrary basis, is clearly better than returning 20. Such an approximation may not be returning a guaranteed correct answer, but the existing implementation is already returning wrong answers! If the result is subject to potential error anyhow, why not at least minimize that potential error to a day or two, rather than an unlimited error range that could easily be off by months or years?

This Duration::WithBase could then return simple but correct answers for questions like "how many days does this duration represent?"

This is approximately what I had in mind, some sort of duration with knowledge of the actual date(s) involved so that it can return correct answers. Is there a reason it needs to be a different class though? Why not just have different behavior based on whether the reference date(s) have been set on the object?

As far as other changes like making in_units carp or croak, I'm not too keen on this. There's a lot of code out there using DateTime. I suspect a lot of it is in semi-maintenance mode, where people might upgrade libraries but not pay a ton of attention to warnings coming from it.

This is part of the reason I suggested the "best effort" approximation, because any existing code that would be affected is already broken but likely not being properly maintained, and the maintainers are probably not even aware of the issue. Replacing a completely wrong answer with a close approximation is likely to make such code work better, not worse. It's not a perfect solution, but it's a reasonable compromise.

Adding new methods and classes is much safer. It fixes the problem for people who care without any risk of breaking existing code.

What is the value of maintaining bug-for-bug compatibility for code which is already undeniably invalid code according to the existing documentation?

n1vux commented 2 years ago

sigh.

While i get your point that broken code is broken, it's possible someone is using the current broken API with understanding of what it actually does.

Changing the behavior of broken code in production that may have been worked-around is still frowned upon in IT shops.

This isn't like CPR where worrying about breaking ribs isn't a concern since broken ribs is better than dead. (And if you don't break ribs, you might not be doing it right.) Some script in a production job may well be still breathing despite this borked API.

So at a minimum it would need very loud INCOMPATIBLE CHANGE verbiage.

(Since DateTime::Duration is not a dual-life Core module, at least there isn't the dreaded possibility that an OS upgrade slips a new version of the module in with an updated Perl5, for those shops that foolishly use OS /usr/bin/perl for production jobs Noooooo.)

jhannah-mm commented 2 years ago

FWIW, at $dayjob the biggest Perl project I support uses DateTime 714 times, codebase written from 2009-present. Whatever it's been doing since 2009 is Correct(tm) from our perspective since we "worked around" any non-intuitive behavior years ago. Please don't stop us from continually updating perl and DateTime by introducing 714 "bugs" we would need to "fix." :)

autarch commented 2 years ago

I'm pretty sure I already wrote this in some other issue somewhere, but I'll say it again here ...

Sadly, I think that most of the Perl code that will ever be written already exists. So because of that, I think backwards compatibility has a much higher priority than fixing poorly designed APIs (note that fixing bugs is something else entirely). The situation was different 10-20 years ago, when the benefit of a backwards incompatible change could be enjoyed by more people than were upset at the breakage.

deven commented 2 years ago

While i get your point that broken code is broken, it's possible someone is using the current broken API with understanding of what it actually does.

Changing the behavior of broken code in production that may have been worked-around is still frowned upon in IT shops.

I understand where you're coming from, and this viewpoint makes sense in general.

In this particular instance, it seems incredibly unrealistic to me. Do you truly believe there's anyone out there who (1) is aware of the subtle nuance that in_units('days') is actually invalid code, (2) understands that this API call incorrectly returns a month remainder in days instead of a total number of days as documented, (3) wants this behavior for some reason, (4) doesn't want the number of months along with the remainder in days, and (5) intentionally puts code into production using this in_units('days') to get only the month remainder in days, despite understanding that in_units('months', 'days') is the correct way to get that remainder from the API if that's truly what they want? I'd love to meet this hypothetical person!

On the other hand, I opened this ticket because I encountered in_units('days') in real-world production code, written by someone unaware of that subtle nuance, where the intent and expectation was that it was returning the total number of days as documented, not a month remainder in days. Your example is hypothetical; this is an actual real-world example.

deven commented 2 years ago

FWIW, at $dayjob the biggest Perl project I support uses DateTime 714 times, codebase written from 2009-present. Whatever it's been doing since 2009 is Correct(tm) from our perspective since we "worked around" any non-intuitive behavior years ago. Please don't stop us from continually updating perl and DateTime by introducing 714 "bugs" we would need to "fix." :)

714 times? That number is very precise! I'm guessing you actually scanned the codebase to get that count, am I right?

Just for the sake of curiosity, if you wouldn't mind scanning the codebase again, it would be very interesting to hear (1) how many times your codebase actually calls the in_units() method, (2) how many of those calls have "days" as the only argument, and (3) if there are any, does the code actually expect a month remainder in days instead of a total number of days?

jhannah-mm commented 2 years ago

714 times? That number is very precise! I'm guessing you actually scanned the codebase to get that count, am I right?

Yup! ✗ ack DateTime | wc -l

Just for the sake of curiosity, if you wouldn't mind scanning the codebase again, it would be very interesting to hear (1) how many times your codebase actually calls the in_units() method

14

(2) how many of those calls have "days" as the only argument

2

and (3) if there are any, does the code actually expect a month remainder in days instead of a total number of days?

In one case it's doing this:

sub flatten_to_seconds {
  my ($duration) = @_;
  my $years       = $duration->in_units('years');
  my $days        = $duration->in_units('days');
  my $minutes     = $duration->in_units('minutes');
  my $nanoseconds = $duration->in_units('nanoseconds');
  # Ha. DateTime is technically correct, that this CAN'T be calculated.
  # It depends on leap seconds and all kinds of craziness.
  # But we really don't care about those errors for this report.
  # So squash a duration, close enough.
  my $seconds =
    $years * 365 * 24 * 60 * 60 +
    $days * 24 * 60 * 60 +
    $minutes * 60 +
    $nanoseconds / 1000 / 1000000;
  ...

Is that a case where my horrible code has always been broken and I never realized it?

In the other case someone else wrote this in 2012:

my $days_rem = ($dt_final_midnight->delta_days($dt_most_recent_midnight)->in_units("days"));

I don't know, off-hand, whether or not changing the behavior of in_units("days") would "break", "fix", or be irrelevant for that line of code. I'd have to figure out the possible inputs that flow into that in our environment.

deven commented 2 years ago

Sadly, I think that most of the Perl code that will ever be written already exists. So because of that, I think backwards compatibility has a much higher priority than fixing poorly designed APIs (note that fixing bugs is something else entirely). The situation was different 10-20 years ago, when the benefit of a backwards incompatible change could be enjoyed by more people than were upset at the breakage.

That's a rather depressing thought! Sadly, you're probably right.

As for fixing bugs, this is what the documentation for in_units() says:

$dur->in_units( ... )

Returns the length of the duration in the units (any of those that can be passed to DateTime::Duration->new) given as arguments. All lengths are integral, but may be negative. Smaller units are computed from what remains after taking away the larger units given, so for example:

my $dur = DateTime::Duration->new( years => 1, months => 15 );

$dur->in_units('years');                # 2
$dur->in_units('months');               # 27
$dur->in_units( 'years', 'months' );    # (2, 3)
$dur->in_units( 'weeks', 'days' );      # (0, 0) !

The last example demonstrates that there will not be any conversion between units which don't have a fixed conversion rate. The only conversions possible are:

  • years <=> months
  • weeks <=> days
  • hours <=> minutes
  • seconds <=> nanoseconds

For the explanation of why this is the case, please see the How DateTime Math Works section of the DateTime documentation

The method call in question is in_units('days'). The documentation says that this method "Returns the length of the duration in the units (any of those that can be passed to DateTime::Duration->new) given as arguments" and "Smaller units are computed from what remains after taking away the larger units given" -- there are no "larger units given" in this case, so there should be no remainder. The return value should be the total length of the duration in days. Instead, the current implementation returns a month remainder in days. How is this not a bug? The return value isn't what the documentation says it is!

Yes, there is a caveat about which conversions are possible, which warns the user not to rely on unsupported conversions. That doesn't make it any less of a bug that the method call is returning a remainder in violation of the documented return value!

If you don't want to return a best-effort approximation of the total number of days, then there's one more option we haven't considered. You could change the implementation to always return zero for each of the requested units when an "impossible" conversion is attempted. The last example in the documentation could even be viewed as implying that result already! While undef would be more logical, returning 0 would be as backwards-compatible as possible, since it's still returning a number of some sort, just like an approximation would be.

deven commented 2 years ago

In one case it's doing this:

sub flatten_to_seconds {
  my ($duration) = @_;
  my $years       = $duration->in_units('years');
  my $days        = $duration->in_units('days');
  my $minutes     = $duration->in_units('minutes');
  my $nanoseconds = $duration->in_units('nanoseconds');
  # Ha. DateTime is technically correct, that this CAN'T be calculated.
  # It depends on leap seconds and all kinds of craziness.
  # But we really don't care about those errors for this report.
  # So squash a duration, close enough.
  my $seconds =
    $years * 365 * 24 * 60 * 60 +
    $days * 24 * 60 * 60 +
    $minutes * 60 +
    $nanoseconds / 1000 / 1000000;
  ...

Thanks, Jay! This is an excellent real-world code example, and it proves that there is code in the wild which is relying on the current API behavior. This is valuable input to the discussion.

Sorry, guys! I was wrong, underestimating the likelihood of anyone relying on this bug. This isn't the unrealistic situation that I described, but it's a real-world example that does change my opinion on how to proceed. I was thinking of the API call being made in isolation, but the usage of sequential calls like this didn't occur to me. My bad!

Unfortunately, it seems that any change in the behavior of this API call will indeed impact some of the existing code out there, and precautions like an INCOMPATIBLE CHANGE warning or a major version number change seem warranted. Perhaps it's not worth the hassle and potential disruption, especially given Dave's point about the sad state of Perl programming these days...

This is hardly ideal, but in lieu of fixing the current behavior of this API call to match the documentation, perhaps the best alternative would be to retcon the situation by changing the documentation to match the implementation instead. It would make the API a bit strange, but there would be no backwards compatibility risk at all. (Adding a new API call that works better might be worth considering too.)

Is that a case where my horrible code has always been broken and I never realized it?

Sorry, I'm afraid your code above has always been broken. While it does work fine for many values, it fails badly for many more.

You're asking the API for years, days, minutes and nanoseconds. The correct way to use this API call would be to ask for all of those units at once, so it knows the larger units to properly calculate the remainders for the smaller units. For example:

my ($years, $days, $minutes, $nanoseconds) = $duration->in_units(qw[years days minutes nanoseconds]);

However, because each of these units happens to be in a separate group, you're actually getting the same results back despite using the API call incorrectly, because of how limited the conversions are.

Look at the list of possible conversions listed in the documentation:

  • years <=> months
  • weeks <=> days
  • hours <=> minutes
  • seconds <=> nanoseconds

This is actually four independent groups of units that will convert values within a group, but not between groups. At first glance, you might think that "days <=> hours" and "minutes <=> seconds" are also fixed conversion rates, but technically they're not due to Daylight Saving Time and leap seconds.

Why is your code broken? You never asked the API for months. While there is a conversion rate between months and years, the API won't return fractional values, so you're losing the months entirely! (If you had asked for months instead of years, the years would've been added to the months and you'd be fine.)

I tested your code, and here's some examples. Take particular note of the 1-month example and the final example from subtracting two different DateTime objects (not strings):

(1 second) = 1 second
(1 minute) = 60 seconds
(60 minutes) = 3600 seconds
(1 day) = 86400 seconds
(7 days) = 604800 seconds
(1 month) = 0 seconds
(12 months) = 31536000 seconds

$date1 = 2022-08-27T00:00:00
$date2 = 2022-06-26T00:00:00
flatten_to_seconds($date1 - $date2) = (2 months, 1 day) = 86400 seconds

This final example returns the number of seconds in a day, completely omitting the other 2 months.

In the other case someone else wrote this in 2012:

my $days_rem = ($dt_final_midnight->delta_days($dt_most_recent_midnight)->in_units("days"));

This code is actually fine, because delta_days() specifically creates a duration that's entirely in days, not months. I added this comparison example to my test script and you can see the results are correct when using delta_days():

$date1 = 2022-08-27T00:00:00
$date2 = 2022-06-26T00:00:00
flatten_to_seconds($date1 - $date2) = (2 months, 1 day) = 86400 seconds
flatten_to_seconds($date1->delta_days($date2)) = (62 days) = 5356800 seconds

I don't know, off-hand, whether or not changing the behavior of in_units("days") would "break", "fix", or be irrelevant for that line of code. I'd have to figure out the possible inputs that flow into that in our environment.

None of the proposed changes would affect your delta_days example, but they could make your first code example much more broken than it already is. That's why I'm suggesting retconning the API docs instead.

jhannah-mm commented 2 years ago

Oh well good to know my code has always been broken AND it stopped others from fixing other broken code. :) Thanks for the run-through, I'll throw some time against it this coming week.

deven commented 2 years ago

Oh well good to know my code has always been broken AND it stopped others from fixing other broken code. :) Thanks for the run-through, I'll throw some time against it this coming week.

If you’re wanting to fix that function, you could try something like this if you want:

sub flatten_to_seconds {
  my ($duration) = @_;
  my ($months, $days, $minutes, $nanoseconds) = $duration->in_units(qw[months days minutes nanoseconds]);
  # Ha. DateTime is technically correct, that this CAN'T be calculated.
  # It depends on leap seconds and all kinds of craziness.
  # But we really don't care about those errors for this report.
  # So squash a duration, close enough.
  my $seconds = ($months / 12 * 365.2425 + $days) * 86400 + $minutes * 60 + $nanoseconds / 1000000000;
 …

I believe this should work correctly, and it uses the API the way it’s meant to be used.