jpatokal / mediawiki-gateway

Ruby framework for MediaWiki API manipulation

UTF-8 support @xml parsing in Ruby 1.9.2 #12

balor closed 13 years ago

balor commented 13 years ago

When create() is called, the Gateway first checks whether a document with the given name already exists. If the name contains Polish characters (e.g. "Więzadło"), the XML response from the MediaWiki API cannot be parsed, because the page title attribute in the response contains those characters too. As a result, REXML throws Encoding::CompatibilityError. That exception is rescued, so the Gateway tells me that the response is not XML. Calling force_encoding('UTF-8') on the response before creating the REXML document helped.
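For illustration, the workaround described above might look roughly like the sketch below. It is not the Gateway's actual code: parse_utf8_xml and raw_body are hypothetical names, and the response body is assumed to arrive tagged as ASCII-8BIT.

```ruby
require 'rexml/document'

# Hypothetical response body. The \u escapes are the Polish letters from the
# page title; the string is then tagged as binary (an assumption about how
# the HTTP body arrives).
raw_body = "<api><page title=\"Wi\u0119zad\u0142o\" /></api>".dup.force_encoding('ASCII-8BIT')

# Hypothetical helper: retag the bytes as UTF-8 before handing them to REXML.
# force_encoding changes only the encoding label, not the bytes themselves,
# so the result is checked with valid_encoding? before parsing.
def parse_utf8_xml(body)
  utf8 = body.dup.force_encoding('UTF-8')
  raise ArgumentError, 'response body is not valid UTF-8' unless utf8.valid_encoding?
  REXML::Document.new(utf8)
end

doc = parse_utf8_xml(raw_body)
title = doc.root.elements['page'].attributes['title']
puts title.encoding
```

Working on a dup leaves the caller's string untouched, which matters if the same body is logged or reused elsewhere.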

jpatokal commented 13 years ago

You're getting this error because you're using Ruby 1.9 and your default encoding is not set to UTF-8. You can fix this on the OS level with export LC_CTYPE='UTF-8', or in Ruby with Encoding.default_external = 'UTF-8'. I'm checking with Stack Overflow for good ideas for how to make the code work nicely for both 1.8 and 1.9:

http://stackoverflow.com/questions/5386920/handling-string-encoding-with-the-same-code-in-ruby-1-8-and-1-9
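One possible shape for code that runs on both versions is sketched below. It rests on two assumptions: the Encoding class and String#force_encoding exist only on 1.9 and later, and as_utf8 is a hypothetical helper name, not part of the Gateway.

```ruby
# Ruby 1.8 has no Encoding class, so this guard skips the setup there.
if defined?(Encoding)
  Encoding.default_external = Encoding::UTF_8
end

# Hypothetical helper: tag a string as UTF-8 where the method exists and
# return it untouched on Ruby 1.8, where strings carry no encoding.
def as_utf8(str)
  str.respond_to?(:force_encoding) ? str.dup.force_encoding('UTF-8') : str
end

# The bytes C3 A9 are the UTF-8 encoding of an accented "e".
body = as_utf8("caf\xC3\xA9")
```

Feature detection with defined? and respond_to? avoids comparing RUBY_VERSION strings, so the same file loads on either interpreter.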

jnv commented 13 years ago

Maybe this belongs in a separate report, but I've found a similar issue when uploading files with names containing non-ASCII characters. The funny thing is that the exception is caused by the debug output of the request:

incompatible character encodings: ASCII-8BIT and UTF-8
[...]/lib/media_wiki/gateway.rb:678:in `inspect'
[...]/lib/media_wiki/gateway.rb:678:in `make_api_request'
[...]/lib/media_wiki/gateway.rb:431:in `upload'

The line number corresponds to my fork; in the current official tree it's line 572.

The exception is raised by form_data.inspect. I tried calling #force_encoding("UTF-8") on all form_data items with no success, but I think the problem is somehow related to the 'file' item (which, being a File object for a binary file, is ASCII-8BIT).

Currently the easiest workaround is to comment out the offending line.
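A less drastic alternative might be to log a summary instead of calling inspect on the whole hash. The sketch below is an assumption-heavy illustration: summarize_form_data is a hypothetical helper, the sample hash only mimics form_data, and the concatenation merely reproduces the same class of error rather than the exact code path inside inspect on 1.9.2.

```ruby
# Mixing a non-ASCII UTF-8 string with non-ASCII binary data raises the same
# exception class as in the trace above.
begin
  "Wi\u0119zad\u0142o" + "\xFF".dup.force_encoding('ASCII-8BIT')
rescue Encoding::CompatibilityError => e
  error = e
end

# Hypothetical helper: describe each value without embedding raw binary bytes.
def summarize_form_data(form_data)
  form_data.map do |key, value|
    described =
      if value.is_a?(String) && value.encoding == Encoding::ASCII_8BIT
        "<binary, #{value.bytesize} bytes>"
      elsif value.is_a?(String)
        # Transcode other text to UTF-8, replacing anything unconvertible.
        value.encode('UTF-8', invalid: :replace, undef: :replace)
      else
        # File objects and other non-strings are reported by class only.
        "<#{value.class}>"
      end
    "#{key}=#{described}"
  end.join(', ')
end

# Sample data standing in for the real form_data (an assumption).
form_data = {
  'filename' => "Wi\u0119zad\u0142o.png",
  'file'     => "\x89PNG\xFF".dup.force_encoding('ASCII-8BIT')
}
summary = summarize_form_data(form_data)
puts summary
```

Because binary values are replaced by an ASCII-only placeholder, every piece joined into the log line is compatible with UTF-8.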

MRI 1.9.2-p290

jpatokal commented 13 years ago

I've implemented your fix for the original issue. Can you raise the uploading file name issue separately?