tc39 / proposal-intl-segmenter-v2

Version 2 of Intl Segmenter. Adding line break support.
https://tc39.github.io/proposal-intl-segmenter-v2/
MIT License

Intl Segmenter v2 Proposal

Intl.Segmenter v2: Unicode segmentation in JavaScript

A repository template for ECMA402 proposals.

You can browse the ecmarkup output or browse the source.

Stage 1

Proposed Spec Text (in green) as diff of Intl.Segmenter

Stage Advancement Slide

Staff

Motivation

This proposal aims to improve the Intl.Segmenter API with:

Use Cases

Batch Mode

Note: "Batch Mode" was removed from the proposal after careful discussion with some of its intended users.

Line Break Granularity

Examples

d8> s = new Intl.Segmenter("en", {granularity: "line"})
[object Intl.Segmenter]
d8> ss = s.segment("飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動")
{}
d8> for (seg of ss) { print(JSON.stringify(seg)) }
{"segment":"飛","index":0,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"虎","index":1,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"隊","index":2,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"正","index":3,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"式","index":4,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"名","index":5,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"稱","index":6,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"為","index":7,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"「中","index":8,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"華","index":10,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"民","index":11,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"國","index":12,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"空","index":13,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"軍","index":14,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"美","index":15,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"籍","index":16,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"志","index":17,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"願","index":18,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"大","index":19,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"隊」,\n","index":20,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":true}
{"segment":"1940 ","index":24,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"年","index":29,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"代","index":30,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"早","index":31,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"期","index":32,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"在","index":33,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"緬","index":34,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"甸","index":35,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"展","index":36,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"開","index":37,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"行","index":38,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
{"segment":"動","index":39,"input":"飛虎隊正式名稱為「中華民國空軍美籍志願大隊」,\n1940 年代早期在緬甸展開行動","isHardBreak":false}
d8> ss = s.segment("The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s")
{}
d8> for (seg of ss) { print(JSON.stringify(seg)) }
{"segment":"The ","index":0,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"Flying ","index":4,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"Tigers ","index":11,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"is ","index":18,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"officially ","index":21,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"called ","index":32,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"the ","index":39,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"\"Republic ","index":43,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"of ","index":53,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"China ","index":56,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"Air ","index":62,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"Force ","index":66,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"American ","index":72,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"Volunteer ","index":81,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"Brigade\",\n","index":91,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":true}
{"segment":"and ","index":101,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"it ","index":105,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"launched ","index":108,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"operations ","index":117,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"in ","index":128,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"Myanmar ","index":131,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"in ","index":139,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"the ","index":142,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"early ","index":146,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}
{"segment":"1940s","index":152,"input":"The Flying Tigers is officially called the \"Republic of China Air Force American Volunteer Brigade\",\nand it launched operations in Myanmar in the early 1940s","isHardBreak":false}

The above results are based on the real output of the V8 prototype CL.
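As an illustration only, and not part of the proposal text, the sketch below shows how a caller might consume segment records shaped like the output above: it packs segments greedily into lines and always ends a line where isHardBreak is true. The names wrapSegments, createLineSegmenter and maxWidth are hypothetical, and measuring width in UTF-16 code units is a simplifying assumption; real layout code would measure rendered width. Engines that do not implement the proposed "line" granularity reject it with a RangeError, so the sketch feature-detects before use.

```javascript
// Greedy line wrapping over records shaped like the output above.
// `segments` is any iterable of { segment, isHardBreak } objects.
// `maxWidth` is counted in UTF-16 code units (a simplifying assumption).
function wrapSegments(segments, maxWidth) {
  const lines = [];
  let current = "";
  for (const { segment, isHardBreak } of segments) {
    // Trailing whitespace may hang past the margin, so measure without it.
    const visible = (current + segment).trimEnd();
    if (current !== "" && visible.length > maxWidth) {
      lines.push(current.trimEnd());
      current = "";
    }
    current += segment;
    if (isHardBreak) {
      // A mandatory break ends the line even if there is room left.
      lines.push(current.trimEnd());
      current = "";
    }
  }
  if (current !== "") lines.push(current.trimEnd());
  return lines;
}

// Hypothetical helper: returns a "line" segmenter, or null when the
// proposed granularity (or Intl.Segmenter itself) is unavailable.
function createLineSegmenter(locale) {
  if (typeof Intl === "undefined" || typeof Intl.Segmenter !== "function") {
    return null;
  }
  try {
    return new Intl.Segmenter(locale, { granularity: "line" });
  } catch (e) {
    // Unsupported granularity values are rejected with a RangeError.
    if (e instanceof RangeError) return null;
    throw e;
  }
}

const lineSegmenter = createLineSegmenter("en");
if (lineSegmenter !== null) {
  const text = "The Flying Tigers\nlaunched operations in Myanmar";
  console.log(wrapSegments(lineSegmenter.segment(text), 12));
}
```

Because wrapSegments only reads the segment and isHardBreak fields, it can be exercised with hand-written records in engines that lack the prototype.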

TODO before Stage 1

Entrance Criteria to Stage 1

Acceptance Signifies for Stage 1

The committee expects to devote time to examining the problem space, solutions, and cross-cutting concerns.

Purpose During Stage 1

TODO before Stage 2

Entrance Criteria to Stage 2

...

Acceptance Signifies for Stage 2

...

Purpose During Stage 2

...

Analysis of TG2 Requirements

Prior Art (for Stage 2)

TBW

Expensive to Implement in Userland (for Stage 2)

TBW

Broad Appeal (for Stage 2)

TBW

TODO before Stage 3

Entrance Criteria to Stage 3

...

Acceptance Signifies for Stage 3

...

Purpose During Stage 3

...

Analysis of TG2 Requirements

Payload Mitigation (for Stage 3)

TBW

TODO of the repo

  1. Avoid merge conflicts with build process output files by running:
      git config --local --add merge.output.driver true
  2. Add a post-rewrite git hook to auto-rebuild the output on every commit:
      cp hooks/post-rewrite .git/hooks/post-rewrite
      chmod +x .git/hooks/post-rewrite
  3. Make your changes to spec.emu (ecmarkup uses HTML syntax, but is not HTML, so I strongly suggest not naming it ".html")
  4. For any commit that makes meaningful changes to the spec, run npm run build and commit the resulting output.
  5. Whenever you update ecmarkup, run npm run build and commit any changes that come from that dependency.