Closed dpc22 closed 2 weeks ago
Hi @dpc22 ,
This is correctly encoded using the 2 byte UTF-8 sequence: "0xc2 0xa3" in my SOAP client.
Please provide a sample of the input data, the client script you used and how you made sure the client encoded it correctly.
I attach my example Python script, which fails (.txt extension required by GitHub).
The equivalent Perl script seems to work:
The only obvious difference is:
$soap->default_ns('urn:sympasoap');
I can't find a direct equivalence to "$soap->default_ns()" in the Zeep library that I am using in Python.
There is: "zeep.set_ns_prefix()", but that takes two arguments.
| set_ns_prefix(self, prefix, namespace)
| Set a shortcut for the given namespace.
The following didn't help:
zeep.set_ns_prefix(None, 'urn:sympasoap');
I'm afraid I don't know what SOAP namespaces do, so I'm rather blundering around in the dark.
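For anyone else unsure what the namespace setting changes, here is a minimal sketch using only Python's standard library. It assumes (this is not confirmed anywhere in this thread) that the Perl client's default_ns() call results in the method element carrying a default xmlns declaration, while Zeep uses an "ns0" prefix as seen in its debug output. To an XML parser the two spellings name the same element, so the prefix style alone should not affect how the payload is decoded.

```python
import xml.etree.ElementTree as ET

# Prefixed form, as in the request Zeep logs.
prefixed = '<ns0:add xmlns:ns0="urn:sympasoap"/>'
# Default-namespace form (assumed to be roughly what default_ns() yields).
defaulted = '<add xmlns="urn:sympasoap"/>'

tag_prefixed = ET.fromstring(prefixed).tag
tag_defaulted = ET.fromstring(defaulted).tag

# Both parse to the same namespace-qualified name.
print(tag_prefixed)
print(tag_defaulted)
```

Only the method element is compared here; child elements such as list and email would inherit a default namespace but not a prefixed one, so the two forms are not identical for a whole request.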
I'm pretty sure that my Python code was originally derived from: https://pypi.org/project/sympasoap/.
That doesn't seem to do anything with namespaces either.
(Edit to add)
It also has a normalize method which just discards any non-ASCII characters on the GeCOS field before invoking the SOAP add method. Presumably the author ran into the same issue, but didn't come up with a more sensible fix.
https://docs.python-zeep.org/en/master/transport.html#debugging
tells me how to dump the raw XML which is sent to the sympasoap server.
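For reference, the logging setup described on that page looks roughly like the sketch below. It only configures Python's standard logging module for the "zeep.transports" logger; the formatter and handler names are arbitrary labels, and it has to run before the Zeep client makes its call.

```python
import logging.config

# Send DEBUG output from zeep.transports (the raw request and response
# bodies) to the console, prefixed with the logger name.
logging.config.dictConfig({
    "version": 1,
    "formatters": {
        "verbose": {"format": "%(name)s: %(message)s"},
    },
    "handlers": {
        "console": {
            "level": "DEBUG",
            "class": "logging.StreamHandler",
            "formatter": "verbose",
        },
    },
    "loggers": {
        "zeep.transports": {
            "level": "DEBUG",
            "propagate": True,
            "handlers": ["console"],
        },
    },
})
```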
The raw HTTP POST request was:
zeep.transports: HTTP Post to https://test.lists.cam.ac.uk/sympasoap:
<?xml version='1.0' encoding='utf-8'?>
<soap-env:Envelope xmlns:soap-env="http://schemas.xmlsoap.org/soap/envelope/"><soap-env:Body><ns0:add xmlns:ns0="urn:sympasoap"><list>test-dpc22</list><email>dpc99@cam.ac.uk</email><gecos>Test £</gecos><quiet>true</quiet></ns0:add></soap-env:Body></soap-env:Envelope>
We have <xml ... encoding='utf-8'>
The <gecos> field appears to be correctly encoded as UTF-8: if I send the output to a file and use "od -c", I see the two byte sequence: "0xc2 0xa3" sent by the SOAP client.
0000440 l > < g e c o s > T e s t 302 243
0000460 < / g e c o s > < q u i e t > t
>>> hex(0o302)
'0xc2'
>>> hex(0o243)
'0xa3'
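The same check can be done in one step. This small sketch converts the two octal values printed by "od -c" back into bytes and decodes them as UTF-8; the variable names are just for illustration.

```python
# Octal byte values as printed by "od -c" for the two bytes after "Test ".
od_tokens = ["302", "243"]

raw = bytes(int(tok, 8) for tok in od_tokens)
print(raw.hex(" "))          # the two bytes in hex
print(raw.decode("utf-8"))   # the character they encode as UTF-8

# Cross-check from the other direction: encoding the pound sign as UTF-8
# gives exactly these two bytes.
assert "£".encode("utf-8") == raw
```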
I have a dedicated test server if I can add useful debugging at the server end. The normal Sympa verbose logging didn't tell me anything.
@dpc22, could you please apply #1592 and check if the problem will be solved?
Thank you.
That seems to have fixed the problem on my test server.
I did need to add a patch for src/lib/Makefile.in in order to backport your fix from the GIT repository to the 6.2.72 release tarball given:
rename from src/lib/Sympa/WWW/SOAP/Transport.pm
rename to src/lib/Sympa/WWW/SOAP/FastCGI.pm
I will apply the fix to the live system either tomorrow morning or Monday morning.
Duplicate of #1541.
Okay, that seems to have worked on the live system as well. Thanks for your help here!
Version
6.2.72
Installation method
My own rpm, derived from "official" RHEL 9 rpm.
Expected behavior
If someone calls the SOAP "add" method with a GeCOS value which contains non-ASCII characters, the data should be processed as UTF-8.
Actual behavior
The PostgreSQL database back end throws an exception:
Jul 8 09:19:27 lists-2 sympasoap[298198]: err main::#85 > Sympa::WWW::SOAP::Transport::handle#118 > SOAP::Transport::HTTP::CGI::handle#627 > SOAP::Transport::HTTP::Server::handle#459 > SOAP::Server::handle#2844 > (eval)#2878 > (eval)#2893 > Sympa::WWW::SOAP::add#812 > Sympa::Spindle::spin#95 > Sympa::Request::Handler::add::_twist#80 > Sympa::List::add_list_member#3291 > Sympa::DatabaseDriver::PostgreSQL::do_prepared_query#112 > Sympa::Database::do_prepared_query#383 Unable to execute SQL statement "INSERT INTO subscriber_table (subscribed_subscriber, reception_subscriber, update_epoch_subscriber, number_messages_subscriber, date_epoch_subscriber, visibility_subscriber, user_subscriber, comment_subscriber, list_subscriber, robot_subscriber) SELECT ?, ?, ?, ?, ?, ?, ?, ?, ?, ? FROM dual WHERE NOT EXISTS ( SELECT 1 FROM subscriber_table WHERE user_subscriber = ? AND list_subscriber = ? AND robot_subscriber = ? )": (22021) ERROR: invalid byte sequence for encoding "UTF8": 0xa3
"0xa3" is the single byte ISO-8859-1 character "£".
This is correctly encoded using the 2 byte UTF-8 sequence: "0xc2 0xa3" in my SOAP client.
Something has transcoded UTF-8 to ISO-8859-1, but the database backend is expecting UTF-8.
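To illustrate the suspected failure mode, here is a small Python sketch. It is only a model of the hypothesis, not of what the Perl code in Sympa or SOAP::Lite actually does: the client's two UTF-8 bytes are decoded to a character, re-encoded as ISO-8859-1, and the resulting single byte is no longer valid UTF-8, which matches what PostgreSQL complains about. The helper name is_valid_utf8 is made up for this example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


client_bytes = "£".encode("utf-8")        # the two bytes 0xc2 0xa3 sent by the client
text = client_bytes.decode("utf-8")       # the character, once decoded
downgraded = text.encode("iso-8859-1")    # the single byte 0xa3

print(client_bytes.hex(" "), is_valid_utf8(client_bytes))
print(downgraded.hex(" "), is_valid_utf8(downgraded))
```

A lone 0xa3 is a UTF-8 continuation byte with no lead byte in front of it, so a strict decoder has to reject it.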
Steps to reproduce
SOAP client script (written in Python) available on request.
Additional information
I have an unpleasant feeling that this is in some way related to:
https://github.com/sympa-community/sympa/issues/1407
"This behavior seems due to bug (or buggy behavior) of SOAP::Lite".
(We are using the version of SOAP-Lite which ships with RHEL 9, which is: perl-SOAP-Lite-1.27-8.el9.noarch).
If I add an "Encode::_utf8_off($gecos);" call to lib/Sympa/WWW/SOAP.pm, then things start to work in the way that I would expect. However, it isn't clear to me whether this is a safe or sensible thing to do.