i18next / i18next-parser

Parse your code to extract translation keys/values and manage your catalog files
MIT License
466 stars 189 forks source link

Use utf-8 as default encoding to read files #992

Open dobesv opened 3 months ago

dobesv commented 3 months ago

The encoding set on the transform is technically supposed to be used for the chunks going through the transform itself, and it's not necessarily set for a Transform, especially in object mode. This falls back on utf-8 as the encoding when loading a file from disk.

I suspect this should just always use utf-8 always but I can't say for sure there isn't a use case for using the encoding parameter.

dobesv commented 3 months ago

Fixes https://github.com/i18next/i18next-parser/issues/993

karellm commented 3 months ago

Based on the docs, passing the encoding option changes the output from buffer to string. Can you better describe what has changed in node 20.12 ?

dobesv commented 3 months ago

To be honest I don't know what exactly they changed in node v20.12. I couldn't figure it out from their release notes.

However, by switching versions and running in the debugger I found that in the earlier version, the encoding parameter to this function would be set to utf-8 but in node v20.12 and later the encoding was undefined.

I suppose that seems right, in a way, since the stream is in object mode there's no reason for it to have an encoding.

If you look a few lines up from my change you can see that content is expected to be a string here, e.g.

      content = file.contents.toString('utf8')

When it is not a string, I get errors in the typescript parser:

s.codePointAt is not a function
    TypeError: s.codePointAt is not a function
        at codePointAt (xxx/.yarn/cache/typescript-patch-32ada147aa-5659316360.zip/node_modules/typescript/lib/typescript.js:12414:81)
        at Object.scan (xxx/.yarn/cache/typescript-patch-32ada147aa-5659316360.zip/node_modules/typescript/lib/typescript.js:11509:26)

That's because s was passed as a Buffer instead of a string, and Buffer doesn't have a codePointAt method on it. By ensuring that we always provide an encoding, we ensure that s is a string and thus it will implement the interface expected by the TypeScript parser.

dobesv commented 3 months ago

As I mentioned in the description I suspect the change maybe should be:

      content = fs.readFileSync(file.path, 'utf8')

I'm not sure why exactly the incoming encoding is relevant to the source file encoding. But maybe there's some use case for that, like in gulp or something. I don't know.