microsoft / TypeScript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
https://www.typescriptlang.org
Apache License 2.0

Don't escape valid Unicode characters in strings #36174

Open sonacy opened 4 years ago

sonacy commented 4 years ago

TypeScript Version: 3.7.4

Code

const sf = createSourceFile(
  'aaa',
  'const a: string = "哈哈"',
  ScriptTarget.Latest
)
// Try to do something in a transform (no transformers are registered here).
const result = transform(sf, [])
const printer = createPrinter()
const printed = printer.printNode(
  EmitHint.SourceFile,
  result.transformed[0],
  sf
)
console.log(printed)

Expected behavior: const a: string = "哈哈"

Actual behavior: const a: string = "\u54C8\u54C8";

I am trying to use the compiler API to do some transforms, but the printer does not seem to be able to emit the unescaped Unicode characters. How can I do this correctly?

sonacy commented 4 years ago

I am looking at the API here.

const realPath = path.resolve(__dirname, './utf8.ts')
const program = createProgram([realPath], {
  target: ScriptTarget.ES2017,
  module: ModuleKind.ES2015,
  allowJs: true,
  jsx: JsxEmit.Preserve,
})
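// Uncommenting the next line makes the printed output come out as expected:
// program.getTypeChecker()
// Assumption: sf is taken from the program; the original snippet does not show where it comes from.
const sf = program.getSourceFile(realPath)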
const result = transform(sf, [])
const printer = createPrinter()
const printed = printer.printNode(
  EmitHint.SourceFile,
  result.transformed[0],
  sf
)
console.log(printed)

Same here when using the program API. The file content is simply: 'const a: string = "哈哈"', but the result is: const a: string = "\u54C8\u54C8";. However, when I call program.getTypeChecker() first, I get the expected output: const a: string = "哈哈". Why does this happen?

DanielRosenwasser commented 4 years ago

It's not that you're doing anything wrong - our implementation just escapes any characters outside of the printable range of ASCII characters. Nowadays we might be equipped to do a little better given that we have the set of valid Unicode identifier characters.

Is there a reason this emit is a problem for you?
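
To make the behavior described in this comment concrete, here is a minimal sketch of that kind of escaping. The helper name `escapeNonAscii` is hypothetical and this is not the compiler's actual implementation; it only covers code units above 0x7F and ignores control characters, which a real emitter would also need to handle.

```typescript
// Hypothetical helper for illustration only; not TypeScript's real emitter code.
function escapeNonAscii(text: string): string {
  // Without the "u" flag the regex works on UTF-16 code units,
  // so each half of a surrogate pair is escaped separately.
  return text.replace(/[\u0080-\uFFFF]/g, (ch: string) => {
    const hex = ch.charCodeAt(0).toString(16).toUpperCase();
    // Left-pad to four hex digits, e.g. "E9" becomes "00E9".
    return "\\u" + ("0000" + hex).slice(-4);
  });
}

console.log(escapeNonAscii('const a: string = "哈哈"'));
// prints: const a: string = "\u54C8\u54C8"
```

The escaped form is equivalent at runtime, which is why this only becomes a problem when the printed source is meant to be read or searched by people.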

GilbertSun commented 4 years ago


We use the transform API to process our source code, for example:

const a:string = '哈哈' => const a: string = i18n('哈哈'), so that we can search our codebase and replace all the Chinese strings with i18n calls. But if TypeScript escapes every character outside the printable ASCII range, our codebase ends up looking weird.

Is there any solution that lets me keep my Chinese strings? Thanks.

RyanCavanaugh commented 4 years ago

I don't think we should escape these unless there's some hard necessity.

DanielRosenwasser commented 4 years ago

No, it was strictly ease of implementation at the time. I'm marking this as Difficult because any contribution needs very thorough test code.

git9am commented 4 years ago

Hitting same issue. Our workaround:

    let content = printer.printFile(file);
    content = unescape(content.replace(/\\u/g, "%u"));
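
A possible variant of this workaround, sketched under some assumptions: the global `unescape` is deprecated and also decodes any literal `%XX` sequences that happen to appear in the source, so a dedicated decoder may be safer. The helper name `decodeUnicodeEscapes` is hypothetical. It leaves escaped backslashes and escapes of ASCII characters alone, but it does not distinguish escapes the printer produced from escapes the author wrote by hand, and it does not recombine anything beyond what `String.fromCharCode` gives per code unit.

```typescript
// Hypothetical post-processing helper; not part of the TypeScript API.
function decodeUnicodeEscapes(source: string): string {
  // First alternative: an escaped backslash, consumed so that "\\u54C8" is not misread.
  // Second alternative: a \uXXXX escape with exactly four hex digits.
  return source.replace(/\\\\|\\u([0-9a-fA-F]{4})/g, (match: string, hex?: string) => {
    if (hex === undefined) return match; // escaped backslash: keep as-is
    const code = parseInt(hex, 16);
    // Keep escapes for ASCII (e.g. \u000A) so string literals stay valid.
    return code > 0x7f ? String.fromCharCode(code) : match;
  });
}

console.log(decodeUnicodeEscapes('const a: string = "\\u54C8\\u54C8";'));
// prints: const a: string = "哈哈";
```

In the snippet above this would be used as `content = decodeUnicodeEscapes(printer.printFile(file))`.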

Grawl commented 5 months ago

backlog since 2020

RyanCavanaugh commented 5 months ago

Backlog = PRs accepted, be the change you want to see in the world 😇

KevinWang15 commented 2 months ago

I'm now using recast to work around this issue:

import ts from "typescript";
import { parse, print, types } from "recast";

const output = ts.transpileModule("`你好`", {});
console.log("typescript output:\n", output.outputText);

let ast = parse(output.outputText);

types.visit(ast, {
  visitLiteral(path) {
    const node = path.node;

    if (typeof node.value === "string") {
      path.replace(types.builders.stringLiteral(node.value));
    }

    this.traverse(path);
  },
});

console.log("recast output:\n", print(ast).code);

outputs

typescript output:
 "\u4F60\u597D";

recast output:
 "你好";