gradle / gradle

Adaptable, fast automation for all
https://gradle.org
Apache License 2.0

Tests unrunnable when in directory with characters not encodable by system encoding #30391

Open Vampire opened 2 weeks ago

Vampire commented 2 weeks ago

Current Behavior

If a project is located in a directory whose name contains Cyrillic characters, such as Рабочий стол/Проекты, on a Windows system with windows-1252 as the system encoding, then trying to run a test ends with a ClassNotFoundException, as described in https://discuss.gradle.org/t/classnotfoundexception-for-custom-main-test-class-while-gradlew-build/49286.

The reason is that the characters are not encodable in windows-1252, but the org.gradle.process.internal.worker.child.ApplicationClassesInSystemClassLoaderWorkerImplementationFactory#writeOptionsFile method writes the classpath to an args file, and args files only support ASCII, or the system encoding if it is ASCII-friendly.

It does not even fail at that point, though. It just writes question marks to the args file in place of the unencodable characters. The resulting classpath entry does not exist and is therefore ignored, so when Jupiter later tries to load the test class, it cannot find it because it is not on the classpath.
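The silent substitution described above can be illustrated with a minimal Java sketch. This is not Gradle's code: the class name ArgsFileEncodingDemo, the roundTrip helper, and the sample path are hypothetical, and the sketch assumes the JDK in use ships the windows-1252 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Gradle code: shows what happens to a path
// when it is written in an encoding that cannot represent its characters.
final class ArgsFileEncodingDemo {

    // Encodes the text to bytes and decodes it again with the same charset,
    // which mimics writing a line to a file and reading it back.
    // String.getBytes(Charset) replaces unmappable characters instead of failing.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Assumption: the running JDK provides the windows-1252 charset.
        Charset cp1252 = Charset.forName("windows-1252");
        // Hypothetical classpath entry containing Cyrillic characters.
        String entry = "C:\\Users\\me\\Рабочий стол\\Проекты\\build\\classes";

        System.out.println("windows-1252: " + roundTrip(entry, cp1252));
        System.out.println("UTF-8 keeps the path intact: "
                + roundTrip(entry, StandardCharsets.UTF_8).equals(entry));
    }
}
```

In the windows-1252 round trip, each Cyrillic letter comes back as a question mark and no exception is thrown, so the path that is read back no longer points at an existing directory.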

Expected Behavior

At least throw some meaningful error early, or better detect that characters that cannot be encoded are present and then either do not use an args file at all, or only use it for encodable args and not for others.

Gradle version

8.8

Build scan URL (optional)

https://scans.gradle.com/s/fawi3qzrehpeq

ov7a commented 2 weeks ago

This issue will be closed as a duplicate of

Please add your use case and 👍 to that issue.

If you think our analysis is wrong, please provide us with more detailed information explaining why.


We have an issue with args file encoding, see the linked issue.

IIUC, if that issue is fixed, then this issue would also be resolved (given that the environment is correct) since Рабочий стол is a system location encoded with a default system encoding (Windows-1251).

In the original forum thread, there was also a potential issue with environment:

The other thing is, that you are using a path that is potentially problematic in three aspects. It has a space in it, it has cyrillic characters in it, and it is a OneDrive synced folder. All these individually I know can break some things, independent of whether it is Gradle or Jupiter related or not. So I suggest you eradicate all three of these aspects and try again

So, if the system is misconfigured and uses a mix of different encodings, there is no easy way for Gradle to detect/fix that.

Vampire commented 2 weeks ago

Ah, I didn't find that issue. It is closely related, yes, though I'm not sure it really is a duplicate or that fixing it will fix this too.

Here the system native encoding was used for the args file and nothing crashed. As the characters are not encodable in windows-1252 (not 1251), the args file just contained question marks, and the resulting entries were then ignored as non-existent classpath entries, as usual.

So the behavior here was already as if that other issue were fixed, and the problem will persist.

If by the potential environment issue you mean my OneDrive suspicion, that's not related. I was able to create a local folder with those Cyrillic characters, put the MCVE in it, and reproduce the problem.

Vampire commented 2 weeks ago

So no matter which encoding is used, the algorithm must make sure that only characters encodable in that encoding are written to the file, and otherwise at least throw a meaningful error, not use an args file at all, or put only the encodable options into the args file.
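A rough sketch of such a check, using the standard CharsetEncoder#canEncode method, might look as follows. The class ArgsPartition and its methods are hypothetical names for illustration and are not part of Gradle. The sketch only covers detection: how the unencodable options would then reach the child JVM is left open.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Gradle code: separates options that can be written
// to an args file in the given charset from those that cannot.
final class ArgsPartition {
    final List<String> encodable = new ArrayList<>();
    final List<String> unencodable = new ArrayList<>();

    // Splits the options by whether every character is representable in the charset.
    static ArgsPartition partition(List<String> options, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        ArgsPartition result = new ArgsPartition();
        for (String option : options) {
            if (encoder.canEncode(option)) {
                result.encodable.add(option);
            } else {
                result.unencodable.add(option);
            }
        }
        return result;
    }

    // Fail-fast variant: throws a descriptive error instead of silently
    // writing replacement characters.
    static void requireEncodable(List<String> options, Charset charset) {
        ArgsPartition split = partition(options, charset);
        if (!split.unencodable.isEmpty()) {
            throw new IllegalArgumentException("Cannot encode these options in "
                    + charset.name() + ": " + split.unencodable);
        }
    }
}
```

With windows-1252 as the charset, an option such as -Xmx512m lands in the encodable list, while a classpath entry under Рабочий стол/Проекты lands in the unencodable list, and requireEncodable reports it by name instead of letting it turn into question marks.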

So I indeed think this is a separate issue.

ov7a commented 2 weeks ago

Ok, reopening then. It would be great if you could provide a small self-contained reproducer to test this once the other issue is fixed.

Vampire commented 2 weeks ago

It can never be self-contained, as it depends on the native system encoding in which argsfiles have to be encoded. But the reproducer is simply the one from the linked forum discussion, or any project with any test, when cloned to a directory with characters that are not encodable in the system native encoding where you test. So as said, for example in the directory Рабочий стол/Проекты on any system with windows-1252 as native system encoding.

shartte commented 1 week ago

Java is in a very sad state w.r.t. encoding on Windows. There is native functionality to fully support Unicode command-line arguments, but Java chooses not to use it :-(