Closed YSmetana closed 11 years ago
I had to do a bit of research on this one to see where the disconnect lies.
According to the MediaWiki manual entry for the $wgDBTableOptions configuration option, MediaWiki writes its data in UTF-8 by default, and all recent versions of DokuWiki do the same. I created a page in MediaWiki on Google's philosophy, using Cyrillic for the page title and text.
When I ran MediaWiki2DokuWiki from the command line, the page showed up as:
Processing Десять_базовых_принципов_Google...
That was after checking that PuTTY was using UTF-8 encoding. Running it under PuTTY's default encoding of ISO-8859-1:1998 (Latin-1, West Europe) produced the following output from the conversion process:
Processing ÐеÑÑÑÑ_базовÑÑ
_пÑинÑипов_Google...
Whichever encoding PuTTY used, the page displayed correctly when I checked DokuWiki afterward.
So I'm not sure where the problem may be. Did the characters of the page in DokuWiki display correctly?
No, DokuWiki shows the same "????_??????".
This is a common problem in PHP. In my own projects I often have to set the default charset right after the MySQL connection is established.
It probably depends on the MySQL collation settings, which are not UTF-8 by default. Some people recommend changing the MySQL settings to:
[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
character-set-server = utf8
collation-server = utf8_unicode_ci
[client]
default-character-set = utf8
But in my case (Ubuntu, standard MySQL settings from ports) I did not change the settings; instead I adjusted the PHP script to work correctly (as described in post #1).
In your case:
Processing ÐеÑÑÑÑ_базовÑÑ
_пÑинÑипов_Google...
it looks like the database collation was correct (UTF-8). It is just a console encoding problem.
But a standard MySQL install assumes that you send Latin-1 rather than UTF-8 queries.
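The two symptoms in this thread differ in whether data is lost. A rough Python sketch of the question-mark case, assuming that converting text to a Latin-1 connection character set behaves like Python's lossy `encode(..., errors="replace")` (an analogy for illustration, not MySQL's actual code path):

```python
# Illustration only: why a Latin-1 connection can turn Cyrillic into "?".
title = "Десять_базовых"

# Latin-1 has no Cyrillic letters, so a lossy conversion substitutes "?"
# for each one; the ASCII underscore survives unchanged.
lossy = title.encode("latin-1", errors="replace").decode("latin-1")
print(lossy)

# The original letters cannot be recovered from the question marks.
print(lossy == title)
```

Unlike console garbling, this substitution is irreversible, which is why the charset has to be set on the connection before the titles are read.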
Thanks.
Interesting. I see no reason why forcing the UTF-8 character set would be a bad thing -- for most it will have no effect, and for some it will fix issues such as this one. I noticed no difference in my testing. Please give the latest code a try.
I can confirm that the problem is gone now:
Processing Дзеркалювання_томів_резервних_копій_Bacula...
Processing Додавання_нового_VPN_сервера...
Thank you!
Hi!
All Cyrillic characters in the titles were converted to question marks:
My database collation is "utf8_general_ci". All tables are "utf8_general_ci".
My dirty hack was (in Environment.php):
According to this: http://stackoverflow.com/questions/4361459/php-pdo-charset-set-names?answertab=votes#tab-top