microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0

When the text contains Chinese, the semantic mark returned by TsServer is deviated #46062

Open pengsongkun741 opened 2 years ago

pengsongkun741 commented 2 years ago

Bug Report

I want to implement code coloring through TSServer, but I found that when the text contains Chinese characters, the offsets returned are shifted. Without Chinese characters, everything works as expected. When the code contains Chinese, each Chinese character pushes the token's start offset 1 past the correct value.

🔎 Search Terms

TsServer Semantic Token Classifications Parsing Chinese error

🕗 Version & Regression Information

TypeScript Version: "4.4.3"

⏯ Playground Link

This is a problem with the grammar parser, so I cannot provide a Playground example.

💻 Code

This is the JSON data exchanged with TsServer.

Containing Chinese

documentChange:

{
    "arguments": {
        "insertString": "/*中文*/\nclass TestClass{\n\n}",
        "endLine": 4,
        "endOffset": 2,
        "line": 1,
        "offset": 1,
        "file": "C:\\Users\\appeon\\source\\repos\\angularproject22\\angularproject22\\src\\app\\app.component.ts"
    },
    "command": "change",
    "seq": 1511,
    "type": "request"
}

Request-encodedSemanticClassifications-full:

{
    "command": "encodedSemanticClassifications-full",
    "arguments": {
        "start": 0,
        "length": 26,
        "format": "2020",
        "file": "C:\\Users\\appeon\\source\\repos\\angularproject22\\angularproject22\\src\\app\\app.component.ts"
    },
    "seq": 1523,
    "type": "request"
}

Response-encodedSemanticClassifications-full:

{
    "request_seq": 1524,
    "Success": true,
    "Command": "encodedSemanticClassifications-full",
    "Message": null,
    "Metadata": null,
    "PerformanceData": null,
    "Body": {
        "spans": [
            15,
            9,
            257
        ],
        "endOfLineState": 0
    },
    "Seq": 0,
    "Type": "response"
}
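For readers following the numbers, here is a minimal sketch of how the `spans` array can be read. It assumes that the array is a flat list of `[start, length, encodedClassification]` triples and that, for format "2020", the token type is stored as `(typeIndex + 1) << 8` with modifier bits in the low byte. The name tables and the `decodeSpans` helper are illustrative assumptions, not taken from this report.

```typescript
// Assumed order of token types and modifiers for format "2020".
const TOKEN_TYPES: readonly string[] = [
  "class", "enum", "interface", "namespace", "typeParameter", "type",
  "parameter", "variable", "enumMember", "property", "function", "member",
];
const TOKEN_MODIFIERS: readonly string[] = [
  "declaration", "static", "async", "readonly", "defaultLibrary", "local",
];

interface DecodedSpan {
  start: number;       // offset of the token from the start of the file
  length: number;      // length of the token
  tokenType: string;   // name looked up in TOKEN_TYPES
  modifiers: string[]; // names looked up in TOKEN_MODIFIERS
}

// Hypothetical helper: splits the flat array into triples and decodes each one.
function decodeSpans(spans: readonly number[]): DecodedSpan[] {
  const result: DecodedSpan[] = [];
  for (let i = 0; i + 2 < spans.length; i += 3) {
    const start = spans[i];
    const length = spans[i + 1];
    const encoded = spans[i + 2];
    const typeIndex = (encoded >> 8) - 1; // high bits hold typeIndex + 1
    const modifierBits = encoded & 0xff;  // low byte holds modifier flags
    const modifiers = TOKEN_MODIFIERS.filter(
      (_name, bit) => (modifierBits & (1 << bit)) !== 0,
    );
    result.push({
      start,
      length,
      tokenType: TOKEN_TYPES[typeIndex] ?? "unknown",
      modifiers,
    });
  }
  return result;
}
```

Under these assumptions, `decodeSpans([15, 9, 257])` describes one token of length 9 (the length of `TestClass`) starting at offset 15, classified as a `class` with the `declaration` modifier. Only the start offset is in question in this report.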

Does not contain Chinese

documentChange:

{
    "arguments": {
        "insertString": "/*CN*/\nclass TestClass{\n\n}",
        "endLine": 4,
        "endOffset": 2,
        "line": 1,
        "offset": 1,
        "file": "C:\\Users\\appeon\\source\\repos\\angularproject22\\angularproject22\\src\\app\\app.component.ts"
    },
    "command": "change",
    "seq": 1526,
    "type": "request"
}

Request-encodedSemanticClassifications-full:

{
    "command": "encodedSemanticClassifications-full",
    "arguments": {
        "start": 0,
        "length": 26,
        "format": "2020",
        "file": "C:\\Users\\appeon\\source\\repos\\angularproject22\\angularproject22\\src\\app\\app.component.ts"
    },
    "seq": 1534,
    "type": "request"
}

Response-encodedSemanticClassifications-full:

{
    "request_seq": 1558,
    "Success": true,
    "Command": "encodedSemanticClassifications-full",
    "Message": null,
    "Metadata": null,
    "PerformanceData": null,
    "Body": {
        "spans": [
            13,
            9,
            257
        ],
        "endOfLineState": 0
    },
    "Seq": 0,
    "Type": "response"
}

As the data above shows, the text sent and the range in the semantic classification request have the same length (26) in both cases. However, when the text contains Chinese, the returned span starts at offset 15 instead of the expected 13.
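The expected value can be checked independently of TsServer. The sketch below computes where `TestClass` starts in both test strings, once in UTF-16 code units (what JavaScript string indices count) and once in UTF-8 bytes. The helper names `utf8ByteLength` and `offsetsOf` are made up for this illustration, and nothing here establishes where the shift actually originates.

```typescript
// Number of bytes the text would occupy when encoded as UTF-8.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      bytes += 4; // surrogate pair: one code point, four bytes
      i++;        // skip the low surrogate
    } else {
      bytes += 3; // rest of the Basic Multilingual Plane, including CJK
    }
  }
  return bytes;
}

// Start of `needle` in `text`, measured in code units and in UTF-8 bytes.
function offsetsOf(text: string, needle: string): { codeUnits: number; utf8Bytes: number } {
  const codeUnits = text.indexOf(needle);
  const utf8Bytes = codeUnits < 0 ? -1 : utf8ByteLength(text.slice(0, codeUnits));
  return { codeUnits, utf8Bytes };
}

// The two insertString values from the requests above.
const withChinese = "/*中文*/\nclass TestClass{\n\n}";
const asciiOnly = "/*CN*/\nclass TestClass{\n\n}";

const chineseOffsets = offsetsOf(withChinese, "TestClass"); // { codeUnits: 13, utf8Bytes: 17 }
const asciiOffsets = offsetsOf(asciiOnly, "TestClass");     // { codeUnits: 13, utf8Bytes: 13 }
```

Both strings are 26 code units long and place `TestClass` at code unit 13, which matches the request length and the expected span start. The reported start of 15 matches neither the code-unit offset (13) nor the UTF-8 byte offset (17) of the Chinese sample; it is what would result if each of the two Chinese characters were counted as two units somewhere along the way, but the data in this report does not show where that happens.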

🙁 Actual behavior

[Screenshot: hasCN]

🙂 Expected behavior

[Screenshot: nothasCN]

pengsongkun741 commented 2 years ago

Excuse me, VSCode also uses TSServer, so why does it not have this problem? I see that this issue has been postponed to the 4.6.0 milestone. Could you share some information about it, so that we can see whether it can be worked around temporarily in some other way? Thank you!