wanasit / chrono

A natural language date parser in Javascript
MIT License
4.5k stars, 340 forks

Incorrect Recognition of "in [month]" pattern #550

Open ilDon opened 3 months ago

ilDon commented 3 months ago

The parser mishandles date strings that consist of "in" followed by a short month name (e.g., "in Jun"). It recognizes the month but strips the "in" from the result text, leaving only "Jun". That standalone month is then not recognized as a date when it is parsed on its own.

Environment

Steps to Reproduce

  1. Configure a new chrono instance with default settings:

    const configuration = chrono.casual.defaultConfig.createCasualConfiguration(false);
    const chronoInstance = new chrono.Chrono(configuration);
    const forwardFrom = new Date();

  2. Parse the string "in Jun" with the following settings:

    // parse(text, referenceDate, options): the options object is the third argument.
    const result = chronoInstance.parse('in jun', forwardFrom, {
      forwardDate: true,
      startDayHour: 8
    });

    Expected result: result[0].text should be "in jun".
    Actual result: result[0].text is "jun".

  3. Repeat the parsing without the "in" prefix:

    // parse(text, referenceDate, options): the options object is the third argument.
    const result = chronoInstance.parse('jun', forwardFrom, {
      forwardDate: true,
      startDayHour: 8
    });

    Expected result: "jun" is recognized as a date.
    Actual result: No date is returned.

Expected Behavior

The parser should retain the "in" prefix in the recognized text because its removal results in the standalone month not being recognized as a date. Ideally, both "in Jun" and "Jun" should be correctly parsed as dates with the context retained when necessary.

Actual Behavior

The parser outputs "Jun" instead of "in Jun" when parsing "in Jun", and subsequently fails to recognize "Jun" as a valid date in the absence of the "in" prefix.

Proposed Fix

Retain the "in" prefix in the result text whenever the remaining text would not be recognized as a date on its own.

wanasit commented 3 months ago

Hello. Thanks for reporting this.

I do not quite agree with the expected behavior.

> The parser should retain the "in" prefix in the recognized text because its removal results in the standalone month not being recognized as a date.

We do not have a precise definition of what makes up the result text (and its index location).

What has been the case so far is: the text is made up of the phrases the date/time is parsed/extracted from. It does not have to include other "clue" or "context" phrases that explain why we think the text is a date/time.

In your example, the word "in" (from "in jun") is only a clue. It is not part of the date/time.
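
For callers who still want the clue word in the span they store, one option is to widen the result on the caller's side rather than in the parser. The following is a minimal sketch, not part of chrono: `expandWithClue` is a hypothetical helper, and the stand-in object only mirrors the `index` and `text` fields of a chrono result. The index value 3 is an assumption for the input "in jun"; the issue does not state it.

```javascript
// Hypothetical helper (not a chrono API): widen a result's span so that it
// includes a clue word, such as "in", that directly precedes the matched text.
// Assumes the clue words are plain words with no regex metacharacters.
function expandWithClue(input, result, clues = ["in"]) {
  const before = input.slice(0, result.index);
  for (const clue of clues) {
    // Look for the clue word at the very end of the preceding text,
    // followed by at least one whitespace character.
    const match = before.match(new RegExp(`\\b${clue}\\s+$`, "i"));
    if (match) {
      const index = result.index - match[0].length;
      const end = result.index + result.text.length;
      return { index, text: input.slice(index, end) };
    }
  }
  // No clue word found: return the span unchanged.
  return { index: result.index, text: result.text };
}

// Stand-in for the result described in the issue (index is assumed).
const input = "in jun";
const reported = { index: 3, text: "jun" };
const expanded = expandWithClue(input, reported);
console.log(expanded.text);
```

This keeps the parser's notion of the result text unchanged and leaves the decision about which clue words matter to the application.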

> Ideally, both "in Jun" and "Jun" should be correctly parsed as dates with the context retained when necessary.

I don't think I agree with that assumption.

Unless you know the input domain/context, "Jun" or "jun" does not always mean "June". (If anything, I am not sure that "in jun" should be recognized as June either.)
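
When the input domain is known, a caller could opt in to treating standalone month abbreviations as months. The sketch below is an illustration under that assumption and not the library's default behavior. `standaloneMonthParser` and `MONTHS` are hypothetical names; the `{ pattern, extract }` object shape follows the custom parser example in chrono's README.

```javascript
// Assumption: in this input domain, a standalone abbreviation is always a month.
// Note that "may" is also an ordinary English word, which shows why this is
// unsafe as a general default.
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"];

// Hypothetical custom parser in the { pattern, extract } shape.
const standaloneMonthParser = {
  // Match one abbreviation as a whole word, case-insensitively.
  pattern: () => new RegExp(`\\b(${MONTHS.join("|")})\\b`, "i"),
  // match[1] is the captured abbreviation; the month component is 1-based.
  extract: (context, match) => {
    return { month: MONTHS.indexOf(match[1].toLowerCase()) + 1 };
  },
};

// Registering it on a cloned instance (requires chrono-node to be installed):
//   const chrono = require("chrono-node");
//   const custom = chrono.casual.clone();
//   custom.parsers.push(standaloneMonthParser);
//   custom.parse("jun");

// The parser object can be exercised on its own, without chrono:
const sampleMatch = standaloneMonthParser.pattern().exec("see you in Jun");
const sampleComponents = standaloneMonthParser.extract(null, sampleMatch);
console.log(sampleComponents);
```

Keeping this in a cloned instance leaves the default parsers untouched, so the broader matching applies only where the caller has chosen it.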

--

Could you share the use cases where the current behavior is inconvenient for you?