thephpleague / omnipay-sagepay

Sage Pay driver for the Omnipay PHP payment processing library
MIT License
54 stars 78 forks source link

SagePay is strictly ISO8859-1 #13

Closed judgej closed 8 years ago

judgej commented 10 years ago

SagePay does not parse UTF-8 characters. A city of "Foo£" will appear on the SagePay site as "Foo£".

All data sent to SagePay should be converted to ISO8859-1/latin-1 if it is in a UTF-8 or other format to start with. Where is it converted, and what has control of that, is another question.

Yes, they are a little bit behind the times. And no, you won't find this in any documentation.

My recommendation would be to check if the data is valid UTF-8, and is not simple ASCII (7-bits). If it is, then do a utf8_decode(). It it is not valid UTF-8, then just pass it over as-is. That is probably the best that can be done.

judgej commented 10 years ago

Just to note, every shop and his dog that use SagePay, send UTF-8 to it. This plays havoc with "smart-quotes" in addresses and international names, and look awful when the addresses are displayed and emailed out from SagePay.

I don't know if there is a smarter encoding converter than utf8_decode() available, but if characters were intelligently converted, so "èéêë" becomes "eeee" and not "????", that would be great.

judgej commented 9 years ago

I'm still hoping someone will prove to me that I am wrong on this one, because not supporting UTF-8, and remaining totally silent about it in all documentation and articles, feels just a little crazy to me.

judgej commented 9 years ago

I read somewhere that UTF-16BE works with SagePay. I'll investigate and see if that is true.

If this does work, then an approach could be that if the data to send to SagePay validates as UTF-8, then it can be converted to a UTF-16BE encoding before POSTing it. If it does not validate as UTF-8 then it must be assumed to be ISO8859-1 and sent as-is.

Whether the field lengths that SagePay accepts are characters or bytes, is something to see.

There is also a language field in protocol 3.00 that accepts a two-character language code. I'll try throwing in some encoding into that field too, e.g. "en;utf8" and see if that goes through. If there is any i18n support in SagePay, it is certainly undocumented.

judgej commented 9 years ago

Going to the SagePay Server payment screen (where you enter the credit card details and confirm your address) the encoding of the page is "Western". This is in the header:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

The latest documentation has a section entitled "Character Sets and Encoding" and fails entirely to mention anything about the subject of character sets or encoding. SagePay is expanding into Europe, but so far has hit only countries that are covered by this encoding. It will have to change at some point, but I have no idea when.

greydnls commented 9 years ago

I'm okay with utf8_decode()'ing before we send it if that will fix the issues that are happening with the ugly decoding. I don't know of a better way to do it than that.

judgej commented 9 years ago

Just started doing some work on this, and noticed that SagePay may have actually made a change here.

Submitting UTF-8 address details, SagePay seems now to be converting it into ISO-8859-1. Any character outside of the Western extended ASCII range is converted to a "?". Any invalid UTF-8 sequence is simply stripped out. That is very different to earlier in the year, where it would be treated as Western regardless of what characterset Guzzle was posting. I'll run a few more tests to make sure it is really doing this, and it is not some artefact of my editor. If this an update, then it's great news, and shows the API is moving forward, and that the gateway is at least recognising the characterset being posted, even if it is still Western extended ASCII across its back-end (I can see that changing TBH). It has certainly been slipped in quietly though.

Another thing I noticed when posting a PAYMENT against my test account, is that SagePay was returning a status of PENDING - "2014 : The Transaction was Registered Successfully." That's a but worrying, because "PENDING" is not documented in this context. It should be "OK". Now, I personally set the status at this stage to "PENDING" in my applications (and always have), because IMO "OK" is not the right thing to store - it can get mixed up with the "OK" in the notification callback when the transaction is finally paid. I wonder if SagePay have finally realised this and have changed the returned status, but it ain't documented. I'll check the SagePay site to see if any additional documentation has been released. It could be they are experimenting in the test systems for now.

judgej commented 9 years ago

The auto-convert of charactersets is not in SagePay production yet. Submitting this as an address1:

Další věcí, přemýšlet

gives me this in the payment screen:

Další vÄ?cí, pÅ?emýšlet

Yuk. While against the SagePay test systems I see the same string as I post (not the same as live), albeit now as ISO-8859-1. In both cases I am using the latest OmniPay, OmniPay-SagePay and v3.00 of the SagePay protocol. So we have protocol v3.00 and protocol v3.00 and they behave differently. Great :-/

judgej commented 9 years ago

I'll hold off development on this for a while, to see what happens. One thing that have come out of this, is that SagePay have been (and still are on live) ignoring the charset of the POST being made. But on test, it IS taking the POST charset into account and doing a conversion. If we do a utf8_decode() after checking that the input data really is UTF-8, then we MUST also set the charset header in guzzle to indicate that ISO-8859-1 is being sent (though not a clue how to do that - it is not as though Guzzle has a "set charset" option, and I can find no clues in the docs), so SagePay does not attempt to do a double convert, resulting in the fields being truncated or blanked out due to the data not really being UTF-8.

judgej commented 8 years ago

I'm going to close this, as it looks like Sage Pay now uses UTF-8 transport (over HTTP). They still use an extended ASCII encoding internally, even for their newest REST API, but at least they now do a sensible conversion if sent as UTF-8. That moves the problem on, and out of OmniPay.