wanasit / chrono

A natural language date parser in Javascript
MIT License

New: Introduce Parsing Tags #534

Closed wanasit closed 1 year ago

wanasit commented 1 year ago

Parsing Tags

A parsing tag or just "tag" is a string attached to parsed components (and results) by parsers and refiners for additional information about the location and context where components were created or modified.

import * as chrono from "chrono-node";

const custom = chrono.casual.clone();

// Each custom parser tags the components it creates with a "USHoliday/..." tag.
custom.parsers.push({
    pattern: () => { return /\bChristmas\b/i },
    extract: (context, match) => {
        return context.createParsingComponents({day: 25, month: 12 })
                .addTag("USHoliday/christmas");
    }
});
custom.parsers.push({
    pattern: () => { return /\bNew Year\b/i },
    extract: (context, match) => {
        return context.createParsingComponents({day: 1, month: 1 })
                .addTag("USHoliday/new_year");
    }
});

....

const result = custom.parse("I'll arrive at 2.30AM on Christmas night....")[0];

// To recognize the result from a specific parser...
result.tags().has("USHoliday/christmas")

// To recognize the result from a specific parser group...
Array.from(result.tags()).some((t) => t.startsWith("USHoliday/"))
Array.from(result.tags()).some((t) => t.match(/EN\w+Parser/))

// To recognize the special content/info from the parser
const theHolidayName = Array.from(result.tags())
    .find(t => t.startsWith("USHoliday/"))
   ?.replace("USHoliday/", "");
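The prefix checks above could be wrapped in small reusable helpers. This is only a sketch: `hasTagWithPrefix` and `tagSuffix` are hypothetical names, not part of Chrono, and the tag values in `exampleTags` are made-up stand-ins for whatever `result.tags()` would return.

```javascript
// Hypothetical helpers, not part of Chrono. They accept any iterable of
// strings, such as the Set returned by result.tags() in the example above.
function hasTagWithPrefix(tags, prefix) {
    for (const tag of tags) {
        if (tag.startsWith(prefix)) return true;
    }
    return false;
}

// Returns the part of the first matching tag after the prefix,
// or undefined when no tag starts with that prefix.
function tagSuffix(tags, prefix) {
    for (const tag of tags) {
        if (tag.startsWith(prefix)) return tag.slice(prefix.length);
    }
    return undefined;
}

// Stand-in data (assumed values); a real call would pass result.tags().
const exampleTags = new Set(["Example/other_tag", "USHoliday/christmas"]);
const isHoliday = hasTagWithPrefix(exampleTags, "USHoliday/");
const holidayName = tagSuffix(exampleTags, "USHoliday/");
```

Keeping the prefix in one place avoids repeating the `"USHoliday/"` string in both the `find` and the `replace` call.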

Potential Usage

Debugging Extraction:

Inside Chrono, multiple parsers and refiners work together to produce the result, so identifying which component produced an incorrect result is challenging.

By tagging the result with involved parsers and refiners, it should be easier to identify the area to fix from the bug report or debug log:

Error: Expected date to be: Sat Sep 02 2023 14:29:21 GMT+0900 (Japan Standard Time) Received: Sun Sep 03 2023 14:29:21 GMT+0900 (Japan Standard Time) ([ParsingComponents {
            tags: ["ENCasualDateParser/extract/tomorrow"], 
            knownValues: {"day":3,"month":9,"year":2023}, 
            impliedValues: {"hour":14,"minute":29,"second":21,"millisecond":371}}, 
            reference: {"instant":"2023-09-02T05:29:21.371Z"}])
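When reading a log like the one above, it may help to reduce the tags to the components that produced them. The sketch below assumes tags follow a `Source/detail` shape, as in `ENCasualDateParser/extract/tomorrow`. `tagSources` is a hypothetical helper and `ExampleRefiner/merge` is a made-up tag used only for illustration.

```javascript
// Hypothetical helper, not part of Chrono: collect the first path segment
// of each tag, i.e. the parser or refiner name under the assumed
// "Source/detail" convention.
function tagSources(tags) {
    const sources = new Set();
    for (const tag of tags) {
        sources.add(tag.split("/")[0]);
    }
    return Array.from(sources).sort();
}

// The first tag comes from the log above; the second is invented.
const loggedTags = [
    "ENCasualDateParser/extract/tomorrow",
    "ExampleRefiner/merge",
];
const sources = tagSources(loggedTags);
console.log(sources.join(", "));
```

A bug report that includes this list points directly at the parsers and refiners worth inspecting first.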

Recognizing result "types" (or "categories")

Here is a list of example ideas:

wanasit commented 1 year ago

One limitation is that the parsers don't provide the translated casual reference. Translation would need to be done in user-land.

Yes. I'm afraid that is the case.

Providing translation or date formatting (back into localized, human-readable text) is not what I am willing to support inside Chrono for now. I hope there are other libraries that are more specialized in doing that; otherwise you will have to add it on your application side.

That said, if you still think that feature fits better inside Chrono after you try adding the translation to your application, I'm open to the discussion.

Another option would be to turn tags into an object instead of an array.

I am also considering having a value associated with each tag (i.e., defining tags as a Map, not a Set), but that feels like it could get over-complicated. If different tags from different parsers/refiners can have different values, that would be difficult to keep track of.
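The trade-off can be sketched with plain JavaScript collections. This is an illustration of the concern only, not Chrono's implementation, and the Map-based shape is a hypothetical design.

```javascript
// Tags as a Set: adding the same tag twice is harmless, and tags from
// different parsers/refiners simply accumulate.
const tagSet = new Set();
tagSet.add("USHoliday/christmas");
tagSet.add("USHoliday/christmas");
const setSize = tagSet.size;

// Tags as a Map (hypothetical design): two writers can disagree about
// the value for the same key, and the later write silently wins.
const tagMap = new Map();
tagMap.set("USHoliday", "christmas"); // written by one parser
tagMap.set("USHoliday", "new_year");  // overwritten by another component
const finalValue = tagMap.get("USHoliday");
```

With a Set, the same information can still be carried in the tag string itself, as the `USHoliday/...` tags do.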


Let me try a few more ideas and fix the issues. I plan to release the feature next week.

zhouzi commented 1 year ago

Agreed.

Honestly, I think tags as currently implemented are already enough (though I think the naming inconsistencies discussed in the comments need to be addressed).

Thanks for the discussion and your help 🙏