Open stevenchalem opened 1 year ago
Were the Unicode characters in the source encoded? E.g., Markdown expects U+201C
to become `&#8220;`
(decimal instead of hex value).
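To make the decimal form concrete, here is a minimal Python sketch that rewrites every non-ASCII character as a decimal numeric character reference. It relies on the standard `xmlcharrefreplace` error handler; the helper name `to_decimal_char_refs` and the sample string are hypothetical, and nothing here is claimed to be what the bundling tool does.

```python
def to_decimal_char_refs(text: str) -> str:
    """Replace each non-ASCII character with a decimal reference such as &#8220;."""
    # xmlcharrefreplace emits &#NNNN; using the decimal code point.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Hypothetical sample text containing U+201C and U+201D.
sample = "\u201cgist\u201d"
print(to_decimal_char_refs(sample))  # &#8220;gist&#8221;
```

U+201C is 0x201C in hex, which is 8220 in decimal, hence `&#8220;`.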
@sa-bpelakh see this PR for an example with an apostrophe that didn't convert correctly: https://github.com/semanticarts/gist/pull/920/files
It looks like it is a 3-byte character: e2 80 99 (the UTF-8 encoding of U+2019, the right single quotation mark).
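The byte sequences quoted in this thread can be reproduced with a short Python check. This is only an illustration; the helper name `utf8_hex` is hypothetical.

```python
def utf8_hex(char: str) -> str:
    """Return the UTF-8 bytes of a character as space-separated hex."""
    return char.encode("utf-8").hex(" ")


# Apostrophe (U+2019) and the two double quotes (U+201C, U+201D).
for code_point in (0x2019, 0x201C, 0x201D):
    print(f"U+{code_point:04X} -> {utf8_hex(chr(code_point))}")
# U+2019 -> e2 80 99
# U+201C -> e2 80 9c
# U+201D -> e2 80 9d
```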
Here is the PR to fix the characters that this issue was originally reported on: https://github.com/semanticarts/gist/pull/868/files
This seemed to use 3-byte quote characters of e2 80 9c and e2 80 9d. You can get a copy of the old version of the file and see the output with:
'''
git checkout bcba46dbda88c6a0eb02e2441725bcf63cecd3e9
head -55 docs/Namespace.md | tail -10 | od -xacb --endian=big
'''
Look for the 'nul' in the output.
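Why 'nul' is the thing to look for: as I understand the GNU `od` documentation, the `-a` format prints named characters while ignoring the high-order bit, so the middle byte 0x80 of these sequences shows up as `nul`. The Python sketch below imitates that behavior under that assumption; `named_chars` and the name table are written for this illustration and are not part of `od`.

```python
# Names for bytes 0x00-0x20, in the style od -a uses (assumed from its docs).
CONTROL_NAMES = (
    "nul soh stx etx eot enq ack bel bs ht nl vt ff cr so si "
    "dle dc1 dc2 dc3 dc4 nak syn etb can em sub esc fs gs rs us sp"
).split()


def named_chars(data: bytes) -> list:
    """Imitate od -a: name each byte after dropping its high-order bit."""
    names = []
    for byte in data:
        low = byte & 0x7F  # drop the high-order bit
        if low <= 0x20:
            names.append(CONTROL_NAMES[low])
        elif low == 0x7F:
            names.append("del")
        else:
            names.append(chr(low))
    return names


print(named_chars("\u201c".encode("utf-8")))  # ['b', 'nul', 'fs']
print(named_chars("\u201d".encode("utf-8")))  # ['b', 'nul', 'gs']
```

So a `b nul fs` or `b nul gs` run in the `-a` row marks a curly double quote, while plain ASCII text never produces `nul`.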
Oh, I understand what happened. The way I see it, our Markdown inputs should be UTF-8 compliant. However
&#...
encoding I described above.
.md files containing certain Unicode characters (e.g., the left and right double quotation marks, U+201C and U+201D) cause the bundling process to fail. For example, when the new Namespace.md file was added to gist, it contained such characters and the bundling process failed.
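One way to catch such characters before bundling is to scan the Markdown text and report each non-ASCII character with its position. The following is a minimal sketch, not the project's tooling: `find_non_ascii` is a hypothetical helper and the sample text is invented for the example.

```python
import unicodedata


def find_non_ascii(text: str) -> list:
    """Return (line, column, code point, Unicode name) for each non-ASCII character."""
    findings = []
    for line_number, line in enumerate(text.split("\n"), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 0x7F:
                findings.append((
                    line_number,
                    column,
                    f"U+{ord(char):04X}",
                    unicodedata.name(char, "UNKNOWN"),
                ))
    return findings


# Invented sample; a real check would read the .md file as UTF-8 instead.
sample = "# Namespaces\nThe prefix \u201cgist\u201d is used.\n"
for finding in find_non_ascii(sample):
    print(finding)
# (2, 12, 'U+201C', 'LEFT DOUBLE QUOTATION MARK')
# (2, 17, 'U+201D', 'RIGHT DOUBLE QUOTATION MARK')
```

Reporting positions rather than silently replacing characters leaves the decision of how to fix each occurrence to the author.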