spatie / period

Complex period comparisons
https://spatie.be/open-source
MIT License
1.59k stars 72 forks source link

Including or excluding boundaries #9

Closed brendt closed 5 years ago

brendt commented 5 years ago

Right now, the boundaries of a period are always included in calculations. We should have a discussion about what the desired default behaviour should be, whether we want to make this behaviour configurable, and how.

brendt commented 5 years ago

Including both boundaries

Excluding the end boundary

$dateA = new DateTimeImmutable('2018-01-01');
$dateB = new DateTimeImmutable('2018-02-01');
$dateC = new DateTimeImmutable('2018-03-01');

$january = Period::make($dateA, $dateB);
$february = Period::make($dateB, $dateC);

Configuration

We could make this configurable. But it should be in an easy to use way. The following seems rather verbose to write over and over again:

$period = Period::make(/* … */)->excludeEnd();

We'd rather have some kind of user-configurable default, but this implies that a factory for periods should be added. That in itself is also a rather verbose solution. We could rely on static state when constructing a period, but I don't feel like that's a clean solution.

Input is appreciated.

IwfY commented 5 years ago

If one sees this library as primary about dates and time is more or less an afterthought then the include boundaries option makes most sense to me.

If you think about time sensitivity the exclusion of end dates makes sense if you want to have sequential periods. Like one period starts at 11:30:00 and the second starts at 12:30:00. If you include boundaries and want not to have a gap between the two you will have to set fractions of seconds (e.g. first period ends at 12:29:59.999999). You are still not safe and have to consider floating point rounding and inaccuracy.

This variable boundaries issue will effect many other methods so I think it's up to the maintainer(s) to decide if they can afford to put this in.

nyamsprod commented 5 years ago

@brendt If I can help you here's the reason why in League\Period I exclude the ending datepoint. Also adding the ending datepoint is a possibility but then it all depend on your public API and how users will be able to easily detect and manage the range inclusivity of your object.

As far as I've seen in the wild... both decisions have consequences for the complexity or not of your library

brendt commented 5 years ago

Thanks for the input both @IwfY and @nyamsprod, really appreciate it! We've also been discussing this at the office in depth.

I agree there's benefit to both [] and [) intervals. My goal with this package is to reduce a developer's cognitive load when working with periods. I never want them having to stop and think "hang on, how should I do this in order for it to work correctly".

That's the reason I decided to always use DateTimeImmutable, there's 0% chance your input will be changed.

I'm leaning towards supporting both, and maybe even also supporting (] and () intervals.

zlikavac32 commented 5 years ago

@brendt Hi,

I'd like to share my comment.

If we think of time as being defined in a discrete manner, we can apply some set theory rules and say that overlapping two periods, means intersecting all discrete elements in that set (every discrete element). What comes out of this, is that overlapping is always defined in the same way and what changes is how we define our set. Currently it's defined with microsecond precision and with excluding first and last element.

But user of this library should be responsible for granularity on which it operates and how he defines his set. Different applications benefit from different definitions.

When defining vacations, we'd like to define them as both inclusive, but for example, bookings, we could have them defined as inclusive/exclusive because that helps with overlapping in that context.

But no matter how we define them, overlapping itself has the same semantics. So if the library user would also be responsible for defining granularity, there should not be any issue.

That also helps their domain model, for example:

<?php

declare(strict_types=1);

abstract class Period
{

    /**
     * @var DateTimeImmutable
     */
    private $start;
    /**
     * @var DateTimeImmutable
     */
    private $end;

    public function __construct(DateTimeImmutable $start, DateTimeImmutable $end)
    {
        $this->start = $start;
        $this->end = $end;
    }

    public function overlap(Period $period): Period
    {
        $start = max($this->start, $period->start);
        $end = min($this->end, $period->end);

        if ($start > $end) {
            throw new LogicException('No overlap');
        }

        return $this->prototype($start, $end);
    }

    public function length(): DateInterval {
        return $this->start->diff($this->end);
    }

    protected function prototype(DateTimeImmutable $start, DateTimeImmutable $end): self {
        return new static($start, $end);
    }
}

class VacationPeriod extends Period {

}

class BookingPeriod extends Period {

    public function __construct(DateTimeImmutable $start, DateTimeImmutable $end)
    {
        // check that it's rounded to a day
        parent::__construct($start, $end->sub(new DateInterval('P1D'))); // think of it as removing last element from the set
    }

    protected function prototype(DateTimeImmutable $start, DateTimeImmutable $end): Period
    {
        return new VacationPeriod($start, $end->add(new DateInterval('P1D'))); // think of it as adding last element to the set
    }
}

$vacationA = new VacationPeriod(new DateTimeImmutable('2018-01-02'), new DateTimeImmutable('2018-01-09'));
$vacationB = new VacationPeriod(new DateTimeImmutable('2018-01-05'), new DateTimeImmutable('2018-01-14'));

var_dump($vacationA->overlap($vacationB));

$bookingA = new BookingPeriod(new DateTimeImmutable('2018-01-02'), new DateTimeImmutable('2018-01-09'));
$bookingB = new BookingPeriod(new DateTimeImmutable('2018-01-05'), new DateTimeImmutable('2018-01-14'));
$bookingC = new BookingPeriod(new DateTimeImmutable('2018-01-08'), new DateTimeImmutable('2018-01-14'));

var_dump($bookingA->overlap($bookingB));
var_dump($bookingA->overlap($bookingC));

Now, there are all kinds of variations to this, and what defaults the library offers (for example one for each combination with third argument being interval), but this could perhaps be one way of thinking. At least that's how I think of periods, I also need measuring unit info.

brendt commented 5 years ago

Thanks for all the input everyone!

I wanted to drop this poll I did on Twitter here: https://twitter.com/brendt_gd/status/1068507386149777408

I found especially this answer useful: https://twitter.com/hruske/status/1069511313011953664

My conclusion right now: it depends.

I think the only right thing to do is support all options, make it configurable and easy to use. The next question is how we can make this is as intuitive as possible.

MrWeb commented 5 years ago

My opinion.

The start has to be inclusive of course as you don't think of a range saying "How much did I make from the 1st of Jan" and actually go by selecting "31st Dec", right?

While the end should be inclusive by default but with that ->excludeEnd(); flag and this is because a filter could have a button for "Last Month" or "March" and that has to go from the first day of that month to its last, both inclusive. And the ->excludeEnd() would be very much appreciated in many cases, cannot see them yet, but yeah.

nyamsprod commented 5 years ago

While the end should be inclusive by default but with that ->excludeEnd(); flag and this is because a filter could have a button for "Last Month" or "March" and that has to go from the first day of that month to its last, both inclusive. And the ->excludeEnd() would be very much appreciated in many cases, cannot see them yet, but yeah.

I would go the other way around excluding by default and adding a marker to make the period inclusive. As pointed in the twitter thread PostgreSQL by default and most reporting accounting tools by default uses a left close, right open interval (ie exclude the ending datepoint) and also it is easier to create an inclusive interval from and exclusive one than the other way around because of date time precision.

Bottom line whichever road you choose making this easier for the developer to choose will be the crucial part of your library πŸ˜‰

my 2 cents.

shalvah commented 5 years ago

Would it help if you added convenience methods like:

Period::from($start)->toEndOf($end);
Period::from($start)->toStartOf($end);
Period::from($start)->until($end);
Period::from($start)->upTo($end);

Or something similar?

brendt commented 5 years ago

@shalvah the problem there is that ::from constructs a period, even though no end date was given yet.

We could let ::from create another object, not related to period, which in its turn has the toEndOf and other methods. That's something to consider πŸ€”

nyamsprod commented 5 years ago

@shalvah @brendt the issue is datetime precision quoting a tweet from @brendt

https://twitter.com/brendt_gd/status/1069508338273128449

If you and I both stay at the same hotel, I from 2019-01-01 until 2019-01-05, and you from 2019-01-05 until 2019-01-10. There is a chance of us meeting, at 2019-01-05; so why is it better to say these periods don't overlap?

With a left-closed right open period definition the answer is clear .. both client won't meet, end of the story.

With a left and right closed interval, you are required to precise your endpoint does until 2019-01-05 means until 2019-01-05 at midnight, a 12 AM or do you include the full day ? What if we are now talking about hours span or year span ? How does your library manage this precision? On the fly or should it be explicitly set ? In both cases, what are the rules (preferably using a RFC type document to back up your decision) ?

That's the main reason most library chooses left-closed right-open interval to avoid these issues altogether.

Once you've figure how to manage in a meaningful way your interval boundaries then you can implement a solution and the API becomes the least of your issue.

@brendt Also you're not the first to tackle this issue you should look at how Jodatime in Java and TimePeriod using .NET solved this issue... if they even try to. This should give you some references on how to deal with all type of intervals.

brendt commented 5 years ago

@nyamsprod I get why you say the following:

With a left-closed right open period definition the answer is clear .. both client won't meet, end of the story.

But we shouldn't forget that that answer is wrong! πŸ˜…Given the question of the clients: they are able to meet on 2019-01-05.

I've been reading and talking with people about this, and I've come to the conclusion that there is no correct answer. In some use cases, inclusive boundaries make sense, in others they don't.

It's clear to me that this library should support both approaches and that both should never allow for any confusion. That's the main goal of this package: ease of use, so that developers shouldn't worry about "am I doing the right thing?".

nyamsprod commented 5 years ago

@brendt the answer is neither wrong or right, the answer depends on the context. The endpoint definition represents that context so I disagree. Also your initial question is too ambiguous from a scientific point of view. Depending on your clients cultural backgrounds your question has many answers. For instance:

Bottom line:

They may meet but from reading your sentence there is no guarantee at all that they will without more context.

What the clients says and what your program except are two distinct things and your library needs to translate the client inputs into logical et reproducible actions before proceeding.

By saying that my answer is wrong means that you've already made some assumptions without evidence that those assumptions are corrects. On the other hand before giving my answer I've first define the rule used to interpret the given interval before stating my answer. πŸ˜‰

zlikavac32 commented 5 years ago

@brendt @nyamsprod As I mentioned earlier, I don't think the issue is with definition of boundaries, but rather the definition of the set, unless you do want to think of time in a continuous terms.

If we think of time in discrete terms, it's easy to implement period. Here is a quick example inspired by PostgreSQL with a limited set of functionality that uses day as one discrete unit.

<?php

declare(strict_types=1);

class Period
{

    /**
     * @var DateTimeImmutable
     */
    private $start;
    /**
     * @var DateTimeImmutable
     */
    private $end;
    /**
     * @var string
     */
    private $boundaries;
    /**
     * @var DateTimeImmutable
     */
    private $realStart;
    /**
     * @var DateTimeImmutable
     */
    private $realEnd;
    /**
     * @var DateInterval
     */
    private $discreteUnit;

    public function __construct(DateTimeImmutable $start, DateTimeImmutable $end, string $boundaries = '[]')
    {
        $this->assertValidBoundaries($boundaries);

        // assume unit day for now, but can be generalized

        $this->start = $start;
        $this->end = $end;
        $this->boundaries = $boundaries;

        // @todo: assert that start/end are rounded on an day
        $this->discreteUnit = new DateInterval('P1D');

        // internally track closed intervals
        $this->realStart = $boundaries[0] === '[' ? $start : $start->add($this->discreteUnit);
        $this->realEnd = $boundaries[1] === ']' ? $end : $end->sub($this->discreteUnit);
    }

    public function overlap(Period $period): Period
    {
        $realStart = max($this->realStart, $period->realStart);
        $realEnd = min($this->realEnd, $period->realEnd);

        if ($realStart > $realEnd) {
            throw new LogicException('No overlap');
        }

        // For now, respect this interval definition
        $start = $this->boundaries[0] === '[' ? $realStart : $realStart->sub($this->discreteUnit);
        $end = $this->boundaries[1] === ']' ? $realEnd : $realEnd->add($this->discreteUnit);

        return new Period($start, $end, $this->boundaries);
    }

    public function contains(DateTimeImmutable $member): bool
    {
        return $member <= $this->realEnd && $member >= $this->realStart;
    }

    private function assertValidBoundaries(string $boundaries): void
    {
        if (!in_array($boundaries, ['[]', '[>', '<]', '<>'])) {
            throw new LogicException(sprintf('Boundaries %s is not valid', $boundaries));
        }
    }

    public function __toString(): string
    {
        return sprintf('%s%s, %s%s', $this->boundaries[0], $this->start->format('Y-m-d'), $this->end->format('Y-m-d'),
            $this->boundaries[1]);
    }
}

function runOverlap(Period $first, Period $second): void
{
    printf("%s ∩ %s = %s\n", $first, $second, $first->overlap($second));
}

function runContains(DateTimeImmutable $member, Period $period): void
{
    printf("%s in %s = %s\n", $member->format('Y-m-d'), $period, $period->contains($member) ? 'Yes' : 'No');
}

$period1 = new Period(
    new DateTimeImmutable('2018-01-01'),
    new DateTimeImmutable('2018-01-12')
);
$period2 = new Period(
    new DateTimeImmutable('2018-01-08'),
    new DateTimeImmutable('2018-01-18'),
    '<]'
);
$period3 = new Period(
    new DateTimeImmutable('2018-01-08'),
    new DateTimeImmutable('2018-01-12'),
    '<>'
);

runContains(new DateTimeImmutable('2018-01-01'), $period1);
runContains(new DateTimeImmutable('2018-01-05'), $period1);
runContains(new DateTimeImmutable('2018-01-12'), $period1);
runContains(new DateTimeImmutable('2018-01-12'), $period3);

echo "\n";

runOverlap($period1, $period2);
runOverlap($period2, $period1);
runOverlap($period3, $period1);

Produces

2018-01-01 in [2018-01-01, 2018-01-12] = Yes
2018-01-05 in [2018-01-01, 2018-01-12] = Yes
2018-01-12 in [2018-01-01, 2018-01-12] = Yes
2018-01-12 in <2018-01-08, 2018-01-12> = No

[2018-01-01, 2018-01-12] ∩ <2018-01-08, 2018-01-18] = [2018-01-09, 2018-01-12]
<2018-01-08, 2018-01-18] ∩ [2018-01-01, 2018-01-12] = <2018-01-08, 2018-01-12]
<2018-01-08, 2018-01-12> ∩ [2018-01-01, 2018-01-12] = <2018-01-08, 2018-01-12>

I think PostgreSQL work with days and seconds.

nyamsprod commented 5 years ago

@zlikavac32 at first glance your implementation seems to be in accordance with what I try to convey :+1: ie to determine how interval interact with each other you first need to defined the endpoints type (unbounded, closed or open) first then you may determine their behaviour. The only part where a discussion should be made is how and when the $discreteUnit property is seeded.

brendt commented 5 years ago

I feel safe to say that we all agree, sorry if I didn't make myself clear, guess I was caught in the moment.

@zlikavac32 I also like your example, it's clear that Period should be configurable to work exactly as the user want it to work.

It's my understanding that this package now lacks 2 things, please correct me if I'm wrong:

Implementing these is pretty straight forward, and I will do so. What I'm not certain of is how they should be configured. Could you give your feedback on this?

brendt commented 5 years ago

Here are some of my personal thoughts on the syntax question:

I'm afraid $boundaries = [] is a little cryptic for some users. Remember that not everyone using this package is familiar with this interval notation. I like the cleanliness though, so we might add it as a possibility, just not the only one.

Coming from a Laravel background myself, and having done some research on how code is read, I feel like a more eloquent API should also be provided.

Something like Period::make()->endExcluded()->startExcluded() or Period::make()->endIncluded()->startIncluded().

We could also add other static constructors like Period::makeInclusive(), Period::makeHalfOpen(), etc.

I also liked the idea suggested before: Period::from()->until() and Period::from()->to(), thought maybe that's a little overkill.

Regarding the discrete unit: I think the default should be 1 day, but it should also be configurable, probably via the constructor and make calls, but also via a fluent API.

nyamsprod commented 5 years ago

I'm afraid $boundaries = [] is a little cryptic for some users.

There are not thousand types ... you could define them using constants and be done with it.

I feel like a more eloquent API should also be provided.

Fluent interfaces are good for Builder or Factory objects which is not what Period is. Maybe you should offload Period creation to such object and make the constructor signature more obvious for people who do not come from a Laravel background. You get the best of both world like that.

<?php

interface RangeInclusivity
{
    public const LEFT_CLOSED_RIGHT_OPENED = '[)';
    public const LEFT_CLOSED_RIGHT_CLOSED = '[]';
    public const LEFT_OPENED_RIGHT_CLOSED = '(]';
    public const LEFT_OPENED_RIGHT_OPENED = '()';
}

class Period implements RangeInclusivity
{
    public function __construct(
        DateTimeImmutable $startDate,
        DateTimeImmutable $endDate,
        DateInterval $discretUnit,
        string $boundaries = self::LEFT_CLOSE_RIGHT_OPEN

    ) {
        ....
    }
}

$builder = new PeriodBuilder();
$period = $builder->build()->endIncluded()->startIncluded(); // and so on
brendt commented 5 years ago

Thanks for the input @nyamsprod!

zlikavac32 commented 5 years ago

@brendt @nyamsprod Honestly, I think that using anything more than '[)' (my mistake, I used <> in examples above) is to verbose. If things are properly documented and concepts from "general knowledge" are used (and I think we all used in math notation that resembles this), people should note have a hard time using it.

I know that I'm happy when I don't have to write to much code and when things are easy to read.

Using builder to build period seems like to much hassle. There should be written a lot of code, and thinks may get tricky with call order.

If we consider units like '1 hour and 2 minutes' or '5 days' invalid, we could define supported units 'second', 'minute', ..., 'year' and use .

One approach could then be to expose that argument through the constructor and use enum to define possible values.

Other approach could be to have one class like DayPeriod, etc for each unit. That could have better semantics when used, since, as we all agreed, period is defined with it's unit. With first approach, one could forget to check for unit and just check boundaries, which could mean that period of 7 days is period of one week.

nyamsprod commented 5 years ago

@zlikavac32 using constant is a proven good practice you'll find in other domain object for instance HTTP status code are often turned into constant for ease. That's does not mean that the average developer does not know what 302 or 404 means but using a expressive constant makes the code more readable. Last but not least in my POC, the constants just mirror the established convention I did not intend to add or create something new that does not exist πŸ˜‰

Also after some thoughts I still contain that adding a $discretUnit to the class should not be included because it's at best an implementation detail (should be internal to the class and never exposed to the public API) at worst it has nothing to do with your domain logic.
Going back to @brendt tweet with the booking system to determine if those clients will or will not meet depends not only on the interval (covered by the inclusivity types) but more importantly on the hotel check-in/checkout rules. If checkout is due at 10 AM and the other client arrives at 3 PM there's a good chance they won't meet regardless of the period boundaries. The $discretUnit is just another way to round the period boundaries according to business rules and thus should be independent of your Period definition otherwise your are tightly coupling 2 concepts into one domain.

By removing the $discretUnit variable, you get ride of the builder class and you covers all scenario as the rounding is now the responsibility of your business logic and not the Period class anymore.

sebastiandedeyne commented 5 years ago
class Period implements RangeInclusivity
{
    public function __construct(
        DateTimeImmutable $startDate,
        DateTimeImmutable $endDate,
        DateInterval $discreteUnit,
        string $boundaries = self::LEFT_CLOSE_RIGHT_OPEN

    ) {
        ....
    }
}

Should a range value object have knowledge of how it's going to be compared? Shouldn't the $boundaries be determined during the comparison, not before?

$a->overlaps($b, Period::LEFT_CLOSE_RIGHT_OPEN);
zlikavac32 commented 5 years ago

@nyamsprod Yes, you are right. At first I thought it would be like 1, 2, ..., 4. With [) both options are available.

As for other things regarding discrete unit and boundaries. It all depends in which way this library wants to go and what are the defaults :).

@sebastiandedeyne Wouldn't it be easy to misuse this? For example, someone uses wrong type of boundary and introduces subtle bug? Also, do you apply this on both intervals?

brendt commented 5 years ago

I believe @sebastiandedeyne is saying that the open/closed flag is applied to the comparison, instead of one or two periods.

@zlikavac32 If someone uses the wrong boundary during a comparison, what's stopping them from using the wrong boundary when creating the period.

It's my opinion that applying the boundary to the comparison, instead of two periods, will be much more clear for developers.

This approach doesn't allow comparing two periods with different boundary types, but I think that this is the point where we should ask ourselves if supporting those edge-cases is within the scope if this library.

assertchris commented 5 years ago

I like the flags to indicate inclusion/exclusion; but I think we can avoid the mathematical notation altogether. What about a set of flags that say exactly what the intention should be? INCLUDE_START_INCLUDE_END, INCLUDE_START_EXCLUDE_END etc.

sebastiandedeyne commented 5 years ago

Changed my mind about the boundaries on comparison thing for two reasons:

zlikavac32 commented 5 years ago

@brendt Once the period is created, boundaries stay the same. If you used wrong boundary, yes, it's wrong, but you can easily "enforce" that by having domain class which uses correct boundary.

Assumption is also that there are more places where you overlap than there are places where you create period.

Also, what about consecutive overlaps? How is the period defined then? How would you propagate information that newly generated period is closed on one side?

@sebastiandedeyne Yes, boundary is the property of the set. Operations like union/intersect that work on sets don't care about boundary

brendt commented 5 years ago

Okay, good points! Boundaries stay on the period!

I'll try to submit a PR next week for you to review.

brendt commented 5 years ago

PR with WIP implementation: https://github.com/spatie/period/pull/11 let's continue the discussion there.

brendt commented 5 years ago

Boundaries are now supported: https://github.com/spatie/period/releases/tag/0.5.0