Possibility to modify NLS_LANG

PreHCM-diba commented 10 months ago

Would be nice to have a possibility to set the NLS_LANG parameter, to also have the possibility to import data that was exported from another database, correctly. Curently it's fixes to AL32UTF8, butwould be nice to make it configurable to set it also to e.g. WE8ISO8859P1

akurone commented 9 months ago

hi @gvenzl, first of all thanks for the great work!

i am also struggling with this: i want to utilize gvenzl/oracle-free:23.3-slim-faststart in testcontainers with a baseline schema/data from a WE8ISO8859P9 DB which has varchar2(50) columns storing chars like Ş, Ü, İ with length of 48 (with help of these 2-byte chars, total byte length for the column is around 51/52 on some rows in AL32UTF8).

i tried to alter length semantics but could not find a way to change it on DB level (i managed to set it on instance or session but my insert statements are still failing with "...too small..." issue).

i understand the value of AL32UTF8 but i can't change the prod DB ¯\(ツ)/¯, being able to adjust the test setup in-line with prod would be great.

and to be clear: i am not expert on oracle, i assumed my problem is related with this but i could probably got it all wrong. so, would love to hear your feedback on it.

gvenzl commented 9 months ago

Hi @PreHCM-diba, hi @akurone,

Thanks to the both of you for using these images!

@PreHCM-diba,

The NLS_LANG environment variable is only defaulted to .AL32UTF8. This can always be overwritten via the -e option when running the container:

[gvenzl@localhost ~]$ podman run -d --name test \
> -e ORACLE_PASSWORD=LetsTest1 \
> -e NLS_LANG=.WE8ISO8859P1 \
> gvenzl/oracle-free:slim-faststart
[gvenzl@localhost ~]$ podman exec -ti test bash
bash-4.4$ echo $NLS_LANG
.WE8ISO8859P1

@akurone,

The database character set, unfortunately, is a different beast. It can only be defined during database creation and not easily changed afterward. Hence, the only logical path was to pick the since 11gR2 (2007) recommended default of AL32UTF8 that will be able to store any character in today's existence.

The only viable path here would be to build your own, customized docker image, or provide the image with a volume that contains such a customized Oracle Free database.

Fortunately, I think, there is another possibility for you here that hopefully does the trick. Oracle Database allows to specify the character length semantics to be either in bytes (default) or characters. The default can be controlleded via the NLS_LENGTH_SEMANTICS parameter. If you were to set this parameter to the value CHAR, the table containing the varchar2(50) column will have enough space to store 50 UTF-8 characters, instead of 50 bytes. Logically, nothing should change for your app, functions like LENGTH(), etc. will still report a length of one character, not the byte, for example:

The insert below fails because the Ş character requires 2 bytes rather than one:

SQL> create table foo (t varchar2(1));

Table created.

SQL> insert into foo values ('Ş');
insert into foo values ('Ş')
                        *
ERROR at line 1:
ORA-12899: value too large for column "SYSTEM"."FOO"."T" (actual: 2, maximum:
1)
Help: https://docs.oracle.com/error-help/db/ora-12899/

But when creating the table with a length of 1 CHAR, the character now fits:

SQL> create table foo1 (t varchar2(1 char));

Table created.

SQL> insert into foo1 values ('Ş');

1 row created.

When you do a describe on the table, you will see the different length semantics in the output:

SQL> desc foo
 Name                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 T                          VARCHAR2(1)

SQL> desc foo1
 Name                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 T                          VARCHAR2(1 CHAR)

And when you do a select length() over that one character, Oracle will still return 1:

SQL> select length(t) from foo1;

 LENGTH(T)
----------
     1

Instead of changing your DDL scripts to type CHAR everywhere, you can just set NLS_LENGTH_SEMANTICS=CHAR:

SQL> alter session set nls_length_semantics=char;

Session altered.

SQL> create table foo2 (t varchar2(1));

Table created.

SQL> desc foo2
 Name                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 T                          VARCHAR2(1)

SQL> insert into foo2 values ('Ş');

1 row created.

SQL> select length(t) from foo2;

 LENGTH(T)
----------
     1

SQL> desc foo;
 Name                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 T                          VARCHAR2(1 BYTE)

I hope this helps!

akurone commented 9 months ago

thanks @gvenzl, both for sharing these images and for the detailed response! i am currently using the NLS_LENGTH_SEMANTICS trick; meanwhile we are exploring official dockerfiles (also made by you i guess:)) with my dba colleagues to build a custom image.

gvenzl commented 9 months ago

Hey @akurone,

You are very welcome!

Ha, yes, once upon a time but now they are in the hands of my capable colleagues. :) The official EE/SE2 images do have an environment variable for the character set -e ORACLE_CHARACTERSET= as they are creating the database at first startup.

You should be able to have a custom Dockerfile that creates a database with the character set you are looking for and then commits a new image for you.

gvenzl / oci-oracle-free

Possibility to modify NLS_LANG #39