Bug Report

I am implementing code coloring (semantic highlighting) through TSServer, but when the text contains Chinese characters, the token offsets it returns are wrong. Without Chinese characters, everything works as expected. When the code does contain Chinese, each Chinese character makes the reported token offset 1 greater than the correct value.

🔎 Search Terms

TsServer Semantic Token Classifications Parsing Chinese error

🕗 Version & Regression Information

TypeScript Version: 4.4.3

This is a crash

⏯ Playground Link

This is a problem with the parser as seen through TSServer, so I cannot provide a Playground example.

💻 Code

These are the JSON messages exchanged with TSServer.

Case 1: the text contains Chinese

documentChange:

{
    "arguments": {
        "insertString": "/*中文*/\nclass TestClass{\n\n}",
        "endLine": 4,
        "endOffset": 2,
        "line": 1,
        "offset": 1,
        "file": "C:\\Users\\appeon\\source\\repos\\angularproject22\\angularproject22\\src\\app\\app.component.ts"
    },
    "command": "change",
    "seq": 1511,
    "type": "request"
}

Request-encodedSemanticClassifications-full:

{
    "command": "encodedSemanticClassifications-full",
    "arguments": {
        "start": 0,
        "length": 26,
        "format": "2020",
        "file": "C:\\Users\\appeon\\source\\repos\\angularproject22\\angularproject22\\src\\app\\app.component.ts"
    },
    "seq": 1523,
    "type": "request"
}

Response-encodedSemanticClassifications-full:

{
    "request_seq": 1524,
    "Success": true,
    "Command": "encodedSemanticClassifications-full",
    "Message": null,
    "Metadata": null,
    "PerformanceData": null,
    "Body": {
        "spans": [
            15,
            9,
            257
        ],
        "endOfLineState": 0
    },
    "Seq": 0,
    "Type": "response"
}

Case 2: the text does not contain Chinese

documentChange:

{
    "arguments": {
        "insertString": "/*CN*/\nclass TestClass{\n\n}",
        "endLine": 4,
        "endOffset": 2,
        "line": 1,
        "offset": 1,
        "file": "C:\\Users\\appeon\\source\\repos\\angularproject22\\angularproject22\\src\\app\\app.component.ts"
    },
    "command": "change",
    "seq": 1526,
    "type": "request"
}

Request-encodedSemanticClassifications-full:

{
    "command": "encodedSemanticClassifications-full",
    "arguments": {
        "start": 0,
        "length": 26,
        "format": "2020",
        "file": "C:\\Users\\appeon\\source\\repos\\angularproject22\\angularproject22\\src\\app\\app.component.ts"
    },
    "seq": 1534,
    "type": "request"
}

Response-encodedSemanticClassifications-full:

{
    "request_seq": 1558,
    "Success": true,
    "Command": "encodedSemanticClassifications-full",
    "Message": null,
    "Metadata": null,
    "PerformanceData": null,
    "Body": {
        "spans": [
            13,
            9,
            257
        ],
        "endOfLineState": 0
    },
    "Seq": 0,
    "Type": "response"
}
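To make the two responses easier to compare, here is a minimal sketch that splits a `spans` array into triples of start, length and encoded classification. It assumes the "2020" format packs the classification as `((tokenType + 1) << 8) | modifierSet`, which is how I understand it; `DecodedSpan` and `decodeSpans` are hypothetical names and not part of TSServer.

```typescript
// Hypothetical shape for one decoded entry of the `spans` array.
interface DecodedSpan {
    start: number;       // offset of the token in the file
    length: number;      // length of the token
    tokenType: number;   // assumed: (encoded >> 8) - 1
    modifierSet: number; // assumed: low 8 bits of the encoded value
}

// Split the flat array into [start, length, encoded] triples.
function decodeSpans(spans: number[]): DecodedSpan[] {
    const result: DecodedSpan[] = [];
    for (let i = 0; i + 2 < spans.length; i += 3) {
        const encoded = spans[i + 2];
        result.push({
            start: spans[i],
            length: spans[i + 1],
            tokenType: (encoded >> 8) - 1,
            modifierSet: encoded & 255,
        });
    }
    return result;
}

// The two responses from this report.
const decodedWithChinese = decodeSpans([15, 9, 257]);
const decodedWithoutChinese = decodeSpans([13, 9, 257]);
```

Under that assumption, both responses describe the same token (length 9, type index 0, modifier set 1), and only the start offset differs: 15 with Chinese versus 13 without.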

As the data above shows, the inserted text has the same length in both cases and the two classification requests are identical. However, when the text contains Chinese, the returned span starts at 15 instead of 13, that is, 2 too far for the 2 Chinese characters.
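As a sanity check on these numbers, the sketch below computes where `TestClass` starts in each inserted string. It assumes the offsets are 0-based positions in UTF-16 code units, the same units JavaScript string indices use; `utf16Offset` is a hypothetical helper name.

```typescript
// Hypothetical helper: position of `needle` in `text`, in UTF-16 code units.
function utf16Offset(text: string, needle: string): number {
    return text.indexOf(needle);
}

// The two insertString values from the change requests in this report.
const withChinese = "/*中文*/\nclass TestClass{\n\n}";
const withoutChinese = "/*CN*/\nclass TestClass{\n\n}";

// Both strings have the same length, matching "length": 26 in the requests.
const textLengths = [withChinese.length, withoutChinese.length];

// Expected start of the class name in each case.
const expectedWithChinese = utf16Offset(withChinese, "TestClass");
const expectedWithoutChinese = utf16Offset(withoutChinese, "TestClass");
```

Both expected offsets come out as 13, so the 15 returned for the Chinese case is off by exactly the number of Chinese characters in the comment.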

🙁 Actual behavior

[Screenshot: hasCN]

🙂 Expected behavior

[Screenshot: nothasCN]

microsoft / TypeScript

When the text contains Chinese, the semantic mark returned by TsServer is deviated #46062