rvaughn / cvs-fast-export

CVSNT-to-Git conversion utility

Encoding of German umlauts in Commit messages #3

Closed by robotroll 9 years ago

robotroll commented 9 years ago

Hi,

I have a problem with the encoding the converter uses for commit messages: German umlauts are not converted correctly. For example, the commit message "Übersetzung" is converted to "Ãœbersetzung". When I change the encoding to UTF-8, everything works as expected. Is there a reason the converter uses windows-1252?

Regards, Robin
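
The symptom described above is what you get when UTF-8 bytes are decoded as windows-1252. A minimal Python sketch of the effect follows; it only reproduces the mismatch and is not the converter's own code, and the variable names are made up for illustration.

```python
# Illustrative only: reproduces the reported symptom, not the converter's code.
original = "\u00dcbersetzung"         # "Übersetzung"
raw_bytes = original.encode("utf-8")  # b"\xc3\x9cbersetzung", as stored in the log

# Wrong: reading the UTF-8 bytes as windows-1252 turns the two bytes of "Ü"
# into two separate characters (U+00C3 and U+0153).
garbled = raw_bytes.decode("cp1252")

# Right: decoding with the encoding the bytes were actually written in.
correct = raw_bytes.decode("utf-8")
```

Here `garbled` holds the mangled text from the report, while `correct` round-trips to the original message.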

rvaughn commented 9 years ago

Yes, it's the default encoding on Windows (under en-US, at least) so it's what Windows coders are likely to use... and more specifically it's what I encountered in the repos I had to convert. I really need to add an option to control this, since comments may be in different encodings and CVSNT apparently does not attempt to normalize them.

robotroll commented 9 years ago

The question is whether a default of UTF-8 suits all local encodings and converts them correctly. I will open a pull request with my change. Please go ahead and test it with your encoding.

rvaughn commented 9 years ago

It won't work correctly with win-1252: the converter doesn't detect the encoding, so it needs to know exactly which encoding the raw bytes are in to convert them to UTF-16 internally. I'll accept your PR as a new default, though, and work on making it more flexible.
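
To illustrate why the encoding has to be supplied rather than guessed, here is a rough Python sketch of a decoder with an explicit encoding parameter. The function `decode_commit_message` and its `encoding` argument are hypothetical and are not part of the converter; the sketch also relies on Python's strict decoding, which may behave differently from the converter's runtime.

```python
def decode_commit_message(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode raw log-message bytes with a caller-chosen encoding."""
    return raw.decode(encoding)

# A message written on a windows-1252 system: "Ü" is the single byte 0xDC.
cp1252_bytes = "\u00dcbersetzung".encode("cp1252")

# Supplying the matching encoding recovers the text.
ok = decode_commit_message(cp1252_bytes, encoding="cp1252")

# Relying on the UTF-8 default fails: 0xDC starts a two-byte UTF-8 sequence,
# but the next byte ("b") is not a valid continuation byte.
try:
    decode_commit_message(cp1252_bytes)
    utf8_default_failed = False
except UnicodeDecodeError:
    utf8_default_failed = True
```

No single default handles both kinds of repository, which is why a per-run option for the encoding seems like the more flexible design.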