upleveled / system-setup

Set up a PERN development environment on Windows, macOS and Linux
28 stars 19 forks source link

psql Client encoding doesn't allow Umlaute / other UTF8 special characters #76

Open karlhorky opened 2 months ago

karlhorky commented 2 months ago

Reporting on behalf of @Petzlc, text from @Petzlc:

What happened that was unexpected

Hyper somehow cannot process äöü so it always throws this Error message:

FEHLER:  Zeichen mit Byte-Folge 0x81 in Kodierung »WIN1252« hat keine Entsprechung in Kodierung »UTF8«

What were your expectations

That i can create my table with the values that contain äöü in the spelling.

Your guess of what the problem is

So on my journey of researching i ran into this message that also appears when i start psql and which seems to be a problem for other people too and seems to be the root of this problem:

WARNING: Console code page (850) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
Type "help" for help.

(mine is in german though)

with the following advice:

You should know what encoding is used in your database.

SHOW server_encoding;

When you connect to your database you can specify what encoding should your client use:

SET client_encoding TO 'UTF8';

If server and client encoding differ, the database driver tries to translate between those two encoding. When it can not find an equivalent character, the error is thrown.

A list of things you have tried to solve it

So this seems to me like a pretty forward advice of what to do when someone runs into this error. I didn't try to apply it though and just changed the äöü to ae and oe and ue. Also i'm wondering if this affects é è and so on. I just changed everything to uncomplicated and common letters.

ProchaLu commented 1 month ago

After some research, it seems that there are multiple options that we need to test on @Petzlc machine.

Solution 1

We need to find out what encoding the machine uses for the database by entering SHOW server_encoding;. We can follow this Stack Overflow post to change the encoding to UTF-8.

Solution 2

You can also manually change the locale system language to use UTF-8. By following this guide, we can ensure that Windows now uses UTF-8 for language support.

karlhorky commented 1 month ago

Ok nice, go ahead with more investigation - prefer choices that:

  1. Does not change the language in the whole OS to English - only UTF-8 support for psql
    • 73 is for changing the psql language to English

  2. Works cross-platform (Windows, macOS, Linux)
ProchaLu commented 1 month ago

This is the output of @Petzlc when checking the encoding with SHOW client_encoding;.

 client_encoding
-----------------
 WIN1252
(1 Zeile)

and when checking the locale:

$ locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"   
LC_NUMERIC="de_DE.UTF-8" 
LC_TIME="de_DE.UTF-8"    
LC_COLLATE="de_DE.UTF-8" 
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_ALL=

Maybe the missing LC_ALL causes this issue?

Setting the locale to en-US.UTF-8 with setx LC_ALL en_US.UTF-8, this will also change the system's default and will affect the full system and not only PostgreSQL.


To change the encoding to UTF-8 we can use SET client_encoding TO 'UTF8'; but this only sets the encoding to UTF-8 for this one Database.

To ensure that we start creating a DB with UTF-8 we can use this PostgreSQL command.

CREATE DATABASE <name> ENCODING 'UTF8'

but this command will not set the default encoding to UTF-8. To set the default encoding for PostgreSQL to UTF-8, the only solution I found so far is setting the encoding UTF-8 in the .psqlrc file for Linux/macOS, and psqlrc.conf for Windows.

Maybe I can find a way to use the Control Functions from PostgreSQL.

karlhorky commented 1 month ago

Setting the locale to en-US.UTF-8 with setx LC_ALL en_US.UTF-8, this will also change the system's default and will affect the full system and not only PostgreSQL.

what exactly is affected here, from the "full system"? is it only the command line? if so, that may be acceptable

Maybe the missing LC_ALL causes this issue?

keep in mind that we already set LC_ALL on macOS:

if it makes sense to also do this for the other operating systems (first figure out what is affected by it in the "full system") then we can go ahead with such a PR

(and make sure that we do it all the same way in all of Windows, macOS, Linux)

cc @Eprince-hub in case this also affects your #73 issue

Eprince-hub commented 1 month ago

What LC_ALL changes

I don't think that the LC_ALL would modify the entire system or anything other than the command line. I even think psql only (not sure about this). My system language was English but my psql was German until I set.

Read some info about Locale here. https://docs.oracle.com/cd/E19253-01/817-2521/overview-39/index.html

karlhorky commented 1 month ago

I don't think that the LC_ALL would modify the entire system or anything other than the command line. I even think psql only (not sure about this). My system language was English but my psql was German until I set.

yeah check into this a bit more, we should have A) a list of things that we are relatively certain that LC_ALL affects and B) a list of what it does not affect