holman / ama

Ask @holman anything!

Raw View - Unsupported Characters? #820

Closed coder4589 closed 7 years ago

coder4589 commented 7 years ago

Hi guys,

I just noticed that in "Raw View" some characters are not rendered correctly?

Blob view: https://github.com/coder4589/jQuery/blob/master/%23_%20First%20notes.txt
Raw view: https://raw.githubusercontent.com/coder4589/jQuery/master/%23_%20First%20notes.txt

In this example, the Unicode character 8217 (U+2019, " ’ ") is rendered as " � " in Raw view!

Is it a bug? Can it be fixed?

holman commented 7 years ago

You should probably ask GitHub. :)

coder4589 commented 7 years ago

We've heard back from the team and this has to do with the encoding setting on your file. The file is in a non-Unicode encoding, namely something compatible with ISO Latin 1/ISO-8859-1. (Windows will often produce documents in this encoding.)

In GitHub blobs, we try to guess what the correct encoding is and display the file that way. However, for the raw file, we don't modify the encoding at all. By default, your browser is interpreting the file as UTF-8, which is causing the issue you see here.
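To make the mismatch concrete, here is a minimal Python sketch of what probably happens to that one character. It assumes the file was saved as Windows-1252 (`cp1252`), a common superset of ISO-8859-1; the thread does not confirm the exact encoding, so treat that choice and the variable names as illustrative only.

```python
# The right single quotation mark, U+2019 (decimal code point 8217).
apostrophe = "\u2019"

# Assumption: the file was written as Windows-1252, where this
# character is stored as the single byte 0x92.
raw_bytes = apostrophe.encode("cp1252")

# A UTF-8 decoder treats a lone 0x92 as an invalid sequence and
# substitutes U+FFFD, the replacement character shown in Raw view.
as_utf8 = raw_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the file was written in recovers the character.
as_cp1252 = raw_bytes.decode("cp1252")

print(raw_bytes, repr(as_utf8), repr(as_cp1252))
```

The bytes on disk are the same in both cases; only the encoding the reader assumes differs.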

If you change the encoding in your browser, you can view the file correctly.

The best solution here would be to make your file UTF-8 encoded.
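One way to do that conversion is sketched below in Python. The helper name `reencode_to_utf8` and the default source encoding `cp1252` are assumptions made for illustration, not something stated in the reply; pass whatever encoding the file was really saved in.

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding="cp1252"):
    """Rewrite the file at `path` as UTF-8.

    `source_encoding` is an assumption about how the file is currently
    stored; decoding raises UnicodeDecodeError if it does not fit.
    """
    p = Path(path)
    # Work on raw bytes so line endings are left exactly as they were.
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))
    return text
```

Calling it on a copy of the file first is the safer option, since the function overwrites the file in place.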

We've also started a discussion internally about our raw endpoint automatically doing the same kind of encoding detection that our regular blob view does, so that we can serve the correct MIME type in Content-Type and this works on the raw URL as well.

We don't have any specifics on when or if that change will happen, so modifying the encoding of your file is the best path forward for now.

Shawna