testcontainers / testcontainers-java

Testcontainers is a Java library that supports JUnit tests, providing lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.
https://testcontainers.org
MIT License
7.99k stars 1.64k forks

[Bug]: init script UTF8 conversion breaks encoding used in database when not in UTF8 #8776

Open dcdh opened 3 months ago

dcdh commented 3 months ago

Module

Core

Testcontainers version

1.19.8

Using the latest Testcontainers version?

Yes

Host OS

Windows

Host Arch

x86

Docker version

Client:
 Cloud integration: v1.0.35+desktop.13
 Version:           26.0.0
 API version:       1.45
 Go version:        go1.21.8
 Git commit:        2ae903e
 Built:             Wed Mar 20 15:18:56 2024
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Desktop 4.29.0 (145265)
 Engine:
  Version:          26.0.0
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.8
  Git commit:       8b79278
  Built:            Wed Mar 20 15:18:01 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

What happened?

I am using the init script feature to create my tables and insert data after my container has started and before my application starts. It is a very convenient way to initialize my Oracle database for testing purposes.

My database is very old and uses the CP1252 encoding to store data.

To respect this encoding requirement, my init script is encoded in windows-1252.

However, it fails to start because the accented characters in my script are not preserved and take more space than the column definition allows. In CP1252 an accented character is stored in one byte, but when I run the script I suspect it occupies more than one byte.
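To illustrate the size difference, here is a minimal sketch using only the JDK. The class name `AccentByteLength` and the helper `byteLength` are made up for this example, and `é` stands in for whatever accented characters the real script contains.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AccentByteLength {

    // Hypothetical helper: number of bytes needed to encode the text in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "\u00e9" is 'é'; the escape avoids depending on the source file encoding.
        String accent = "\u00e9";
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.println("windows-1252: " + byteLength(accent, cp1252) + " byte(s)");
        System.out.println("UTF-8:        " + byteLength(accent, StandardCharsets.UTF_8) + " byte(s)");
    }
}
```

With a column sized in bytes, a value that fits when encoded in windows-1252 can therefore overflow when the same characters are encoded in UTF-8.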

I guess it occurs in this part:

    public static void runInitScript(DatabaseDelegate databaseDelegate, String initScriptPath) {
        try {
            URL resource = Thread.currentThread().getContextClassLoader().getResource(initScriptPath);
            if (resource == null) {
                resource = ScriptUtils.class.getClassLoader().getResource(initScriptPath);
                if (resource == null) {
                    LOGGER.warn("Could not load classpath init script: {}", initScriptPath);
                    throw new ScriptLoadException(
                        "Could not load classpath init script: " + initScriptPath + ". Resource not found."
                    );
                }
            }
            String scripts = IOUtils.toString(resource, StandardCharsets.UTF_8);
            executeDatabaseScript(databaseDelegate, initScriptPath, scripts);
        } catch (IOException e) {
            LOGGER.warn("Could not load classpath init script: {}", initScriptPath);
            throw new ScriptLoadException("Could not load classpath init script: " + initScriptPath, e);
        } catch (ScriptException e) {
            LOGGER.error("Error while executing init script: {}", initScriptPath, e);
            throw new UncategorizedScriptException("Error while executing init script: " + initScriptPath, e);
        }
    }

and specifically here:

    String scripts = IOUtils.toString(resource, StandardCharsets.UTF_8);

The script's contents are decoded using the UTF_8 charset, so my accented characters end up stored using 2 bytes when my INSERT statements run.
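As a rough way to see what decoding with the wrong charset does on the Java side, the sketch below decodes windows-1252 bytes once as UTF-8 and once as windows-1252. It uses only the JDK; the class name `Cp1252DecodedAsUtf8`, the helper `decode`, and the sample text are made up for this example. What the database finally stores also depends on the JDBC driver and the database character set, which this sketch does not cover.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1252DecodedAsUtf8 {

    // Hypothetical helper: decode raw bytes with the given charset.
    // new String(byte[], Charset) replaces malformed input with the
    // charset's replacement character instead of throwing.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");

        // Simulate script content saved as windows-1252: 'é' is the single byte 0xE9.
        byte[] raw = "caf\u00e9 au lait".getBytes(cp1252);

        String asUtf8 = decode(raw, StandardCharsets.UTF_8);
        String asCp1252 = decode(raw, cp1252);

        // A lone 0xE9 is not valid UTF-8, so the accent is lost when decoding as UTF-8.
        System.out.println("decoded as UTF-8 contains U+FFFD: " + asUtf8.contains("\uFFFD"));
        System.out.println("decoded as windows-1252 keeps the accent: " + asCp1252.contains("\u00e9"));
    }
}
```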

Would it be possible to change the code so that it does not enforce the charset?
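For illustration only, one possible shape for this would be a loader that takes the charset as a parameter. The sketch below is not Testcontainers code: `InitScriptReader` and `readScript` are hypothetical names, and it uses the JDK's `InputStream.readAllBytes()` (Java 9+) rather than `IOUtils`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.Charset;

public class InitScriptReader {

    // Hypothetical helper: read the whole stream and decode it with the caller's charset.
    static String readScript(InputStream in, Charset charset) {
        try {
            return new String(in.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical overload for a classpath resource URL, as used by runInitScript.
    static String readScript(URL resource, Charset charset) {
        try (InputStream in = resource.openStream()) {
            return readScript(in, charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // Stand-in for a script file saved as windows-1252.
        byte[] raw = "INSERT INTO t VALUES ('caf\u00e9');".getBytes(cp1252);
        String script = readScript(new ByteArrayInputStream(raw), cp1252);
        System.out.println(script.contains("\u00e9")); // accent survives decoding
    }
}
```

Keeping UTF-8 as the default and letting callers override it would leave existing users unaffected, but that is a design suggestion rather than something the current API offers.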

Regards,

Damien

Relevant log output

No response

Additional Information

No response