Closed midgleyc closed 3 months ago
One way to solve this locally without any code changes is overriding the content-types.properties file the JDK uses, by setting the `content.types.user.table` system property to point to a custom file.
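As a rough sketch, you can probe which content type the JDK's file name map assigns to `.js` files, then launch the JVM with `-Dcontent.types.user.table=/path/to/custom.properties` (the path is hypothetical) to see the override take effect:

```java
// Probe the JDK's built-in file-name-to-content-type map.
// Run once plainly, then again with -Dcontent.types.user.table pointing
// at a custom content-types.properties file, to compare the results.
import java.net.FileNameMap;
import java.net.URLConnection;

public class ContentTypeProbe {
    public static void main(String[] args) {
        FileNameMap map = URLConnection.getFileNameMap();
        // Prints "text/javascript" on JDK builds containing JDK-8273655;
        // older builds may print a different value (or null).
        System.out.println("content type for .js: " + map.getContentTypeFor("module.js"));
    }
}
```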
Filed #1233 to make `getCharacterEncoding` protected.

Thinking about it more, I think Rhino can fix the problem on its end: the issue only occurs for local files, and `url.getProtocol()` can be used to determine whether the file is local (for local files, the protocol is `file`). The `text/` adjustment should likely only be done when the file comes over the HTTP protocol, so
```java
if (contentType != null && contentType.startsWith("text/")) {
    return "8859_1";
}
```
could be replaced with
```java
String protocol = url.getProtocol();
boolean isHttp = protocol != null
        && (protocol.equals("http") || protocol.equals("https"));
if (isHttp && contentType != null && contentType.startsWith("text/")) {
    return "8859_1";
}
```
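A quick standalone sanity check (not Rhino code) that `java.net.URL` reports the protocol strings the condition above relies on:

```java
import java.net.URL;

public class ProtocolCheck {
    public static void main(String[] args) throws Exception {
        // Local files use the "file" protocol...
        System.out.println(new URL("file:///tmp/module.js").getProtocol());
        // ...while remote modules use "http" or "https".
        System.out.println(new URL("https://example.com/module.js").getProtocol());
    }
}
```

This prints `file` and then `https`, so the `isHttp` guard would correctly skip the `8859_1` fallback for local modules.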
Does this seem reasonable?
I haven't found any documentation saying that local files in CommonJS ought to be stored in UTF-8. However, it seems to me that Node.js expects this.
@botic, would you have more insight, given that RingoJS supports CommonJS as well?
So IMHO defaulting to UTF-8 for local files might be the easiest fix.
Note that defaulting to 8859_1 is basically wrong as well: ISO-8859-1 used to be the default charset for `text/*` content in HTTP, but that default has since been removed from the spec.
Nowadays the BOM ought to be checked too (even for UTF-8, where a BOM is optional).
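A minimal sketch of the kind of BOM check meant here (standalone, not Rhino's implementation; the method name and fallback parameter are assumptions for illustration):

```java
public class BomSniff {
    // Return a charset name based on a leading byte-order mark,
    // or the given fallback when no BOM is present.
    static String charsetFromBom(byte[] head, String fallback) {
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return fallback;
    }

    public static void main(String[] args) {
        byte[] utf8Bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'v'};
        byte[] noBom = {'v', 'a', 'r'};
        System.out.println(charsetFromBom(utf8Bom, "8859_1")); // UTF-8
        System.out.println(charsetFromBom(noBom, "8859_1"));   // 8859_1 (fallback)
    }
}
```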
In Ringo, the module loading process uses the charset specified in the Ringo config; every module load enforces this charset. The default is UTF-8 for all text-based resources.
Anyone have any idea when this will be fixed?
Fix (workaround) released with 1.7.15.
In https://bugs.openjdk.java.net/browse/JDK-8273655, a change was made so that ".js" files are now seen as having content type "text/javascript".
This leads to a change in the UrlModuleSourceProvider, where local ".js" files (or any ".js" files without a Content-Type encoding) are interpreted as having a character encoding of "8859_1", whereas previously they were interpreted as having a character encoding of "utf-8".
This has led to a problem with KoLMafia's ability to parse JavaScript scripts, as discussed in this thread.
I've put together a repo at java-probe-content-type which shows the difference for local files.
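To illustrate why the charset matters (a standalone demonstration, not taken from that repo): the same UTF-8 bytes decode differently under ISO-8859-1, which is what corrupts non-ASCII characters in scripts:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // "\u00e9" is "é"; its UTF-8 encoding is the two bytes 0xC3 0xA9.
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);        // "é" again
        String asLatin1 = new String(bytes, StandardCharsets.ISO_8859_1); // "Ã©" (mojibake)
        System.out.println(asUtf8.length());   // 1
        System.out.println(asLatin1.length()); // 2
    }
}
```

Every multi-byte UTF-8 sequence in a script read as 8859_1 turns into two or more wrong characters, so the parse sees garbage.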
I don't know what a good fix to this would look like (since assuming files without an explicit encoding are cp-1252 is what the spec says), but it might be a quicker fix if `getCharacterEncoding` were `protected` instead of `private static`. That way we could override it on our end to expect `utf-8` instead of `8859_1` without affecting other uses of this library.
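To make the proposal concrete, here is a self-contained sketch of the override pattern a `protected` method would enable. The class names and the method signature are simplified stand-ins, not Rhino's actual API:

```java
public class OverrideSketch {
    // Stand-in for UrlModuleSourceProvider with the method made protected.
    static class BaseProvider {
        protected String getCharacterEncoding(String contentType) {
            if (contentType != null && contentType.startsWith("text/")) {
                return "8859_1";
            }
            return null;
        }
    }

    // An embedder's subclass forcing UTF-8 for its own module loads.
    static class Utf8Provider extends BaseProvider {
        @Override
        protected String getCharacterEncoding(String contentType) {
            return "utf-8";
        }
    }

    public static void main(String[] args) {
        System.out.println(new BaseProvider().getCharacterEncoding("text/javascript")); // 8859_1
        System.out.println(new Utf8Provider().getCharacterEncoding("text/javascript")); // utf-8
    }
}
```

Because the subclass lives in the embedder's code, other users of the library keep the default behavior.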