jmporterog / maven-replacer-plugin

Automatically exported from code.google.com/p/maven-replacer-plugin
MIT License

Character encoding problem #33

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

When I use maven-replacer-plugin to replace values in configuration files that contain accented characters such as é, è, à, the replacement itself works. However, when I open the files packaged in the WAR, every accented character has been replaced by a question mark ("?").

Of course, only the files rewritten by maven-replacer-plugin are affected.

It seems the plugin writes the files using its own charset.
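For context, here is a minimal Java sketch (not taken from the plugin) of two common ways accented characters end up as "?": encoding text with a charset that cannot represent them, and decoding bytes with the wrong charset. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {

    // e-acute, e-grave, a-grave written as Unicode escapes so this source
    // file compiles the same way under any source encoding.
    static final String ACCENTED = "\u00e9 \u00e8 \u00e0";

    // Encoding with a charset that cannot represent the letters: String.getBytes
    // substitutes the charset's replacement byte, which is '?' for US-ASCII.
    static String encodeAsAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Decoding with the wrong charset: ISO-8859-1 bytes such as 0xE9 followed by a
    // space are not valid UTF-8, so the decoder substitutes U+FFFD, which many
    // terminals and viewers display as '?'.
    static String decodeLatin1BytesAsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encodeAsAscii(ACCENTED)); // prints "? ? ?"
        System.out.println(decodeLatin1BytesAsUtf8(ACCENTED).indexOf('\uFFFD') >= 0); // prints "true"
    }
}
```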

I've set the charset to UTF-8, but with no luck:
<encoding>UTF-8</encoding>

I'm using version 1.3.1 of the plugin on Debian Lenny.

Thanks

PS: It's very nice to be the first person other than baker.steven.83 to report an issue. Keep up the good work ;)

Original issue reported on code.google.com by reda.abdi on 16 May 2010 at 9:55

GoogleCodeExporter commented 9 years ago
Hi reda.abdi,

Thanks for reporting this issue. I will have this fixed in the next release.

Sincerely,
baker.steven.83

Original comment by baker.st...@gmail.com on 16 May 2010 at 9:43

GoogleCodeExporter commented 9 years ago
Hi reda.abdi,

I cannot replicate this issue, even when running war:war on the project used to test this plugin.
I am using Windows XP and cygwin to test.

If you wish to see how I am testing it, check out https://maven-replacer-plugin.googlecode.com/svn/test-plugin-use, run verifyReplacement.sh, and inspect the output files.
You can also run: mvn clean test war:war, which creates a WAR file whose contents have been replaced by maven-replacer-plugin. Those contents appear as expected in my local environment.

You should also check your JVM arguments and your local environment/shell settings in case something there is interfering with the encoding.
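To see which charset a JVM falls back on when code does not name one, a small check like the following could be run with the same JVM and environment (locale, MAVEN_OPTS) as the build. This is a hypothetical diagnostic class, not part of the plugin or Maven. Note that on JDK 18 and later the default charset is UTF-8 regardless of locale (JEP 400), so results differ between JDK versions.

```java
import java.nio.charset.Charset;

class EncodingReport {

    // Report the settings that decide which charset is used when code does not
    // pass one explicitly (e.g. FileReader/FileWriter without a charset argument).
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```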

I hope this helps and thanks for your patience,
Steven

Original comment by baker.st...@gmail.com on 19 May 2010 at 12:01

GoogleCodeExporter commented 9 years ago
Hi Steven,

You should run your tests in the same environment as mine, because Windows + Cygwin is different from Debian.

My system is configured with UTF-8 "everywhere": shell, locale, ...

I think maven-replacer-plugin is writing with the default Windows charset, which can be windows-1252, and this may be the cause of the problem.

Does maven-replacer-plugin support a charset specified by the user with the <encoding> tag?

Thanks

Original comment by reda.abdi on 21 May 2010 at 2:45

GoogleCodeExporter commented 9 years ago
Hi Reda.abdi,

There is currently no support for specifying a character encoding within this plugin.

This plugin does not specify a character set explicitly at all, so it should default to whatever is configured in Maven or the JVM.

I will investigate adding the ability to set the charset at runtime within this plugin.
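As a minimal sketch only (not the plugin's actual code), a replacement step that honors a user-supplied encoding might read, replace, and write with one explicit charset. The class ExplicitCharsetReplacer, the method replaceInFile, and the idea of feeding it from an <encoding> setting are hypothetical; a user-supplied name such as "UTF-8" could be turned into a Charset with Charset.forName.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetReplacer {

    // Hypothetical helper: decode, replace, and re-encode with the same explicit
    // charset, so the result does not depend on the JVM's default charset.
    static void replaceInFile(Path file, String token, String value, Charset encoding)
            throws IOException {
        String content = new String(Files.readAllBytes(file), encoding);
        String replaced = content.replace(token, value); // literal replacement, not regex
        Files.write(file, replaced.getBytes(encoding));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("replacer-demo", ".properties");
        // "caf\u00e9" is "café"; escapes keep this source file ASCII-only.
        Files.write(tmp, "label=@LABEL@ caf\u00e9".getBytes(StandardCharsets.UTF_8));
        replaceInFile(tmp, "@LABEL@", "Caf\u00e9", Charset.forName("UTF-8"));
        System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

Reading and writing with the same charset is the key design choice: mixing an explicit charset on one side with the platform default on the other would reintroduce the problem.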

Thanks for your feedback,
Steven

Original comment by baker.st...@gmail.com on 22 May 2010 at 9:57

GoogleCodeExporter commented 9 years ago
Hi Reda.abdi,

Can you try specifying the "-Dfile.encoding=UTF8" argument when you run mvn?

Thanks,
Steven

Original comment by baker.st...@gmail.com on 23 May 2010 at 3:35

GoogleCodeExporter commented 9 years ago
I have installed Ubuntu 10 on a local virtual machine to test with.
The file encodings appear to be preserved already.

Here are the source files compared with the target files from the test suite:
steven@steven-vb:~/development/workspace/test-plugin-use$ file src/main/resources/*
src/main/resources/excludefile1:        ASCII text
src/main/resources/excludefile2:        ASCII text
src/main/resources/file1:               ASCII text
src/main/resources/file2:               ASCII text
src/main/resources/include1:            ASCII text
src/main/resources/include2:            ASCII text
src/main/resources/largefile.txt:       ASCII text, with very long lines, with CRLF line terminators
src/main/resources/multiline.txt:       ASCII text, with CRLF line terminators
src/main/resources/non-ascii-char-file: UTF-8 Unicode (with BOM) text
src/main/resources/regex.txt:           ASCII text, with CRLF line terminators
src/main/resources/simple.txt:          ASCII text, with CRLF line terminators

steven@steven-vb:~/development/workspace/test-plugin-use$ file target/classes/*
target/classes/excludefile1:                    ASCII text
target/classes/excludefile2:                    ASCII text
target/classes/file1:                           ASCII text
target/classes/file2:                           ASCII text
target/classes/include1:                        ASCII text
target/classes/include2:                        ASCII text
target/classes/largefile.txt:                   ASCII text, with very long lines, with CRLF line terminators
target/classes/multiline.txt:                   ASCII text, with CRLF line terminators
target/classes/non-ascii-char-file:             UTF-8 Unicode (with BOM) text
target/classes/regex-flags-outputfile.txt:      ASCII text, with CRLF line terminators
target/classes/regex.txt:                       ASCII text, with CRLF line terminators
target/classes/simple-outputfile-tokenfile.txt: ASCII text, with CRLF line terminators
target/classes/simple-outputfile.txt:           ASCII text, with CRLF line terminators
target/classes/simple-outputfile-valuefile.txt: ASCII text, with CRLF line terminators
target/classes/simple.txt:                      ASCII text, with CRLF line terminators

When I view the contents with 'cat', the ? characters appear in both the source and target files. However, when viewing them with gedit, the text appears correct (no ? characters).

Can you verify that your source files are in UTF-8 format?
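One way to check this programmatically is a strict UTF-8 decode plus a BOM check. The Utf8Check class below is a hypothetical sketch, not part of the test suite. Passing the check does not prove the file was written as UTF-8 (plain ASCII also passes), but ISO-8859-1 text containing accented letters typically fails it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8Check {

    // Strict decode: REPORT makes invalid byte sequences throw instead of
    // being silently replaced with U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // UTF-8 byte order mark EF BB BF, which `file` reports as "with BOM".
    static boolean hasUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": validUtf8=" + isValidUtf8(bytes)
                    + ", bom=" + hasUtf8Bom(bytes));
        }
    }
}
```

After compiling, it could be run as, e.g., java Utf8Check src/main/resources/*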

Thanks,
Steven

Original comment by baker.st...@gmail.com on 23 May 2010 at 4:42

GoogleCodeExporter commented 9 years ago
Closing this issue: there appears to be no problem, and I am unable to replicate it.
If the issue persists, please raise it again.

Original comment by baker.st...@gmail.com on 1 Jun 2010 at 4:52

GoogleCodeExporter commented 9 years ago
Hi and thanks for this effort,

The problem came from my source files, which were encoded in ISO-8859-1; converting them to UTF-8 solved it.
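For anyone hitting the same problem, here is a minimal sketch of re-encoding a file from ISO-8859-1 to UTF-8 in Java. The class name is hypothetical, and it assumes the input really is ISO-8859-1: running it on a file that is already UTF-8 would double-encode its non-ASCII characters.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Latin1ToUtf8 {

    // Decode the bytes as ISO-8859-1 (every byte maps to exactly one character,
    // so this never fails) and write the same text back encoded as UTF-8.
    static void convert(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            convert(Paths.get(name));
        }
    }
}
```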

I'm really embarrassed about this :\ because it should have been the first thing to check in such a situation.

Best regards

Original comment by reda.abdi on 1 Jun 2010 at 6:25