qbwc / qbxml

QBXML Parser and Validation Tool
MIT License

Guidance on handling non-ASCII characters #8

Closed rchekaluk closed 8 years ago

rchekaluk commented 10 years ago

Do you have guidance on how to properly handle qbXML that contains non-ASCII and/or reserved characters, as described in "How does qbXML handle special characters?" This is easy to run into, for example, in the Name, FirstName, LastName, and CompanyName fields.

Is there a readily available Nokogiri::HTML or Nokogiri::XML incantation that will cleanse a string before it is submitted to QBWC (to avoid the error "QuickBooks found an error with the XML")?

Could Qbxml handle this cleansing automatically, or at least provide a function to do so?

If I understand the process correctly, this test string:

nonascii = 'Ray "you can call mé" <J. johnson>'

Must be transformed into this string prior to submitting it to QBWC:

Ray "you can call m&#233;" &lt;J. johnson&gt;
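To make that expected output concrete, here is a minimal plain-Ruby sketch of the transformation: escape the XML-reserved characters, then replace each non-ASCII character with a decimal character reference. The helper name qbxml_escape is hypothetical (it is not part of the qbxml gem), and it assumes the value ends up in element text rather than in an attribute.

```ruby
# Hypothetical helper (not part of the qbxml gem): escapes a value for use as
# XML element text and replaces every non-ASCII character with a decimal
# character reference such as &#233;.
def qbxml_escape(text)
  # Escape the XML-reserved characters in a single pass so "&" is never
  # escaped twice. Double quotes are left alone: they are legal in element
  # text (attribute values would also need &quot;).
  escaped = text.gsub(/[&<>]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;')
  # Replace each character outside 7-bit ASCII with &#<code point>;
  escaped.gsub(/[^\x00-\x7F]/) { |char| "&##{char.ord};" }
end

nonascii = 'Ray "you can call mé" <J. johnson>'
puts qbxml_escape(nonascii)
# prints: Ray "you can call m&#233;" &lt;J. johnson&gt;
```

One caution: if the escaped value is later handed to an XML builder that escapes text on its own, the "&" of each reference would be escaped a second time, so a helper like this only fits when the XML is assembled by hand.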
ruckus commented 9 years ago

I ran into this problem and ended up just using the I18n gem:

https://github.com/svenfuchs/i18n

to transliterate accented characters into their ASCII equivalents:

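require 'i18n'
I18n.available_locales = [:en] # else transliterate may raise I18n::InvalidLocale outside Rails

def sanitize_text(text)
  I18n.transliterate(text) # ASCII approximations; unmapped characters become "?"
end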
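For comparison, here is a rough standard-library approximation of what transliteration does to this kind of data: decompose accented letters, drop the combining marks, and replace whatever is still outside ASCII with "?". This is an assumption-laden stand-in, not I18n's implementation (which uses its own replacement table), and ascii_fold is a hypothetical name.

```ruby
# Hypothetical stdlib-only approximation of transliteration (NOT how the
# I18n gem does it): strip combining accents, then replace anything still
# outside ASCII with "?".
def ascii_fold(text)
  decomposed = text.unicode_normalize(:nfd)   # "é" -> "e" + U+0301 (combining acute)
  unaccented = decomposed.gsub(/\p{Mn}/, '')  # drop the combining marks
  unaccented.gsub(/[^\x00-\x7F]/, '?')        # non-Latin characters -> "?"
end

puts ascii_fold('téstusér thé 空手道')
# prints: testuser the ???
```

This makes the trade-off visible: accented Latin text survives in a degraded form, while non-Latin scripts are lost entirely.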
rchekaluk commented 9 years ago

Using I18n.transliterate does not appear to convert into XML character entities as requested. For example, this international string:

qbxml_nonascii = "<?xml version=\"1.0\"?><?qbxml version=\"7.0\"?><QBXML><QBXMLMsgsRq onError = \"stopOnError\"><CustomerAddRq><CustomerAdd><Name>téstusér</Name><Salutation>Mr</Salutation><FullName>téstusér thé 空手道 and يحي٦٦ي</FullName><Email>чшка@ик-с-апми.рф</Email></CustomerAdd></CustomerAddRq></QBXMLMsgsRq></QBXML>"

Transliterates into:

<?xml version="1.0" ?><?qbxml version="7.0"?><QBXML><QBXMLMsgsRq onError = "stopOnError"><CustomerAddRq><CustomerAdd><Name>testuser</Name><Salutation>Mr</Salutation><FullName>testuser the ??? and ??????</FullName><Email>????@??-?-????.??</Email></CustomerAdd></CustomerAddRq></QBXMLMsgsRq></QBXML>

However, thanks to http://minimul.com/encoding-a-quickbooks-qbxml-request.html, this code appears to do the job for a Ruby string:

Nokogiri::XML(qbxml_nonascii).to_xml(encoding: 'US-ASCII')

And for a Ruby hash:

Nokogiri::XML(Qbxml.new(:qb).to_qbxml(hash_nonascii)).to_xml(encoding: 'US-ASCII')

Caveat: although the output of these calls appears to conform to the guidelines in "How does qbXML handle special characters?", I have not yet gotten the international characters to appear in QuickBooks.
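For a qbXML string that is already built, a similar effect to the Nokogiri round trip can be sketched without Nokogiri: qbXML markup is plain ASCII, so every non-ASCII character in the document is text that can be swapped for a character reference. asciify_qbxml is a hypothetical helper; it assumes &, < and > in the text are already escaped and that no non-ASCII text sits inside comments or CDATA sections, where character references are not recognized. Unlike Nokogiri's serializer, it leaves the XML declaration unchanged.

```ruby
# Hypothetical helper: converts an already-built qbXML string to pure ASCII by
# replacing each non-ASCII character with a hexadecimal character reference.
# Assumes tag names and attribute names are ASCII and that reserved
# characters in the text are already escaped.
def asciify_qbxml(xml)
  xml.gsub(/[^\x00-\x7F]/) { |char| format('&#x%X;', char.ord) }
end

qbxml = '<?xml version="1.0"?><?qbxml version="7.0"?><QBXML><Name>téstusér</Name></QBXML>'
puts asciify_qbxml(qbxml)
# prints: <?xml version="1.0"?><?qbxml version="7.0"?><QBXML><Name>t&#xE9;stus&#xE9;r</Name></QBXML>
```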

JasonBarnabe commented 9 years ago

transliterate just strips the accents in an effort to make things look English-y. I imagine that encoding non-ASCII characters may just turn them into ASCII that QuickBooks accepts but doesn't understand.

Can you enter non-ASCII characters into the QuickBooks UI?

ruckus commented 9 years ago

@JasonBarnabe that's correct: it just does a quick-and-dirty job of turning accented characters into unaccented ones. It does not convert accented characters into their entity equivalents.

Yes, a user can enter non-ASCII characters into the QB UI. I just created an account in the QB UI named Cafe Émo, and in the XML for a CustomerQueryRq it came back as:

<Name>Cafe &#200;mo</Name>

So you're right, transliterate doesn't handle entity conversion. I probably jumped into this thread prematurely, so my apologies. My use case was running CustomerAdd jobs with data entered into my UI, and when I generated the requests, those accents were causing QB to barf, so I quickly needed a way to fix the issue. In my case it was OK to replace those characters with their ASCII equivalents.

But I realize this is not a solution for everyone.

rchekaluk commented 9 years ago

I cannot seem to paste non-ASCII, non-Latin (e.g. Chinese or Arabic) characters into the QuickBooks UI, even after installing various Windows language packs (pasting merely accented characters appears to work). QuickBooks turns these non-Latin characters into question marks, similar to these reports:

For the QuickBooks Web Connector (import) case, these reports echo my experience:

I'm guessing that the QuickBooks Web Connector does import non-ASCII characters into the QuickBooks database when they are encoded as above, but QuickBooks itself cannot display them.

I think this means that I need to cleanse or disallow user input of non-Latin (?) characters so they don't pollute the QuickBooks database. Ugh. More pessimism:
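The "disallow" option mentioned above can be sketched as a validator that reports which characters would not survive. This sketch assumes the safe repertoire is ISO-8859-1 (Latin-1), which matches the later test results in this thread; non_latin1_chars and latin1_safe? are hypothetical names.

```ruby
# Hypothetical validator, assuming QuickBooks can only keep ISO-8859-1 text.
# Returns the distinct characters in +text+ that Latin-1 cannot represent.
def non_latin1_chars(text)
  # ISO-8859-1 covers exactly the code points U+0000..U+00FF.
  text.each_char.select { |char| char.ord > 0xFF }.uniq
end

def latin1_safe?(text)
  non_latin1_chars(text).empty?
end

name = 'téstusér thé 空手道'
unless latin1_safe?(name)
  # Report the offending characters back to the user instead of sending them.
  puts "Rejected characters: #{non_latin1_chars(name).join(' ')}"
end
# prints: Rejected characters: 空 手 道
```

Returning the offending characters rather than a bare boolean lets a form tell the user exactly what to change, which is friendlier than silently turning them into question marks.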

JasonBarnabe commented 9 years ago

I would not put too much stock into what those people say. They sound very confused as to the relationship between character sets and fonts.

My testing with QuickBooks 15, Canadian edition:

ISO-8859-1 characters (tested with English, accents as in French or Spanish) are fine. You can enter these in the QuickBooks UI, send requests containing them, and receive responses containing them with no issues.

Non-ISO-8859-1 characters (tested with Chinese, Greek, Cyrillic) don't work in the QuickBooks UI, despite the OS supporting them. In fact, I can enter these characters in the UI and they show up... until I focus another text box, and then they turn into question marks.

Passing non-ISO-8859-1 characters with qbxml results in "QuickBooks found an error with the XML".

Changing qbxml to do builder.instruct!(:xml, :encoding => "ISO-8859-1") (which ends up entity-encoding non-ASCII) results in QuickBooks accepting the request, but the characters again show up as question marks in the QuickBooks UI. When this data is used in responses, it's still question marks (it doesn't go back to its original form).
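The entity-encoding side effect described above can be reproduced with Ruby's standard library, without touching qbxml: String#encode with the xml: :text option escapes &, < and >, and turns characters the target encoding lacks into hexadecimal character references. This illustrates the mechanism only and is not necessarily what qbxml or Builder does internally; per the results above, QuickBooks still displays such characters as question marks.

```ruby
# Encode a UTF-8 value to ISO-8859-1 for use as XML text: Latin-1 characters
# such as "é" become single bytes, characters outside Latin-1 become &#x...;
# references, and &, <, > are escaped.
value  = 'Café 空手道 & co'
latin1 = value.encode(Encoding::ISO_8859_1, xml: :text)

p latin1.encoding                    # ISO-8859-1
p latin1.bytes.include?(0xE9)        # true: "é" kept as the Latin-1 byte 0xE9
puts latin1.encode(Encoding::UTF_8)  # convert back to UTF-8 just for display
```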

I would conclude the following:

JasonBarnabe commented 8 years ago

Closing per my previous comment.