Strip non-ascii in QueryBuilder

jaredmoody commented 7 years ago

I've noticed that the Quickbooks API chokes on non-ascii characters - I was thinking it would be nice if the QueryBuilder either stripped those characters by default or contained some option to do that so I didn't have to manually do that everywhere I'm passing a string to the api.

I'd be willing to contribute a PR if you think that would be valuable.

minimul commented 7 years ago

Absolutely a PR on this would be valuable. Thanks.

drewish commented 7 years ago

I'm not sure stripping makes sense. If you're searching for LastName = 'Włodarski', dropping the ł would get a successful response but would not return the results you want.

For reference here's an example error when searching for that:

Quickbooks::IntuitRequestException: Error parsing query:
    QueryParserError: Invalid content. Lexical error at line 1, column 88.  Encountered: "\u0142" (322), after : "\'W"

drewish commented 7 years ago

Boy, their Query API just doesn't seem to like UTF-8. I threw in a Stack Overflow question in case anyone else has any ideas: http://stackoverflow.com/questions/42590136/query-quickbook-online-with-non-ascii-characters

ruckus commented 7 years ago

Yes, good points. I have been thinking about this issue since it was first posted. One thought is that it might be a bad idea for the library to change data out from underneath the user, which could lead to unexpected results.

For instance, suppose the user attempts to save a new Customer whose name contains a non-ASCII character, say an umlaut or some other diacritic, and the gem sanitizes it by replacing á with a behind the scenes. Meanwhile, the user expects that Customer to retain its original name. This is not clear and can be confusing.

I guess I'm leaning towards making the user make all the decisions themselves (thus, at a higher layer) and not doing any smartiness ourselves, just because it could lead to confusion.

jaredmoody commented 7 years ago

I agree that automatically doing something the user doesn't expect is a bad thing.

However, if stripping UTF-8 is the only way to get a query through, it seems like every user is going to have to implement it themselves - so I think either an option (that defaults to off) or a utility method would be of a lot of value here.
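A minimal sketch of what such an opt-in utility could look like. The `QueryAscii` module and its method names are hypothetical and are not part of quickbooks-ruby; the only point is that stripping happens when the caller asks for it and never by default.

```ruby
# Hypothetical helper, NOT part of quickbooks-ruby.
module QueryAscii
  # True when the value contains any character outside 7-bit ASCII.
  def self.non_ascii?(value)
    !value.to_s.ascii_only?
  end

  # Returns a copy of the value with every non-ASCII character removed.
  def self.strip(value)
    value.to_s.gsub(/[^[:ascii:]]/, "")
  end

  # Leaves the value untouched unless the caller explicitly opts in.
  def self.prepare(value, strip_non_ascii: false)
    strip_non_ascii ? strip(value) : value.to_s
  end
end

name = "W\u0142odarski" # "Włodarski"
puts QueryAscii.prepare(name)                        # unchanged by default
puts QueryAscii.prepare(name, strip_non_ascii: true) # the ł is dropped
```

With the default left off, a caller can check `QueryAscii.non_ascii?` first and decide whether a stripped search term is acceptable for that query.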

drewish commented 7 years ago

I submitted a question to Intuit's support and here's their response. I haven't had a chance to try it out yet:

If you want to query the whole string, you can URL encode the whole query without the non-ASCII character and then add the URL encoded character to this query. An example of this is:

    select%20%2A%20from%20Customer%20where%20GivenName%3D%27m%C3%A5na
    select * from Customer where GivenName='måna'

%C3%A5 is URL encoding of my non-ascii character å.

You can also do something like this: select * from Customer where GivenName like 'm%na' and then encode the whole query which will ignore the second character and search for the rest of the characters in that order.
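For anyone who wants to try the two suggestions above, here is a rough Ruby sketch. The helper names `encode_query` and `wildcard_non_ascii` are made up for illustration and are not gem APIs. It uses `ERB::Util.url_encode` from the standard library, which percent-encodes spaces as `%20` and non-ASCII characters as their UTF-8 bytes. Whether the Query API accepts the result is exactly what is in question in this thread.

```ruby
require "erb"

# Hypothetical helper: percent-encode an entire query string.
# Multi-byte characters are encoded byte by byte as UTF-8.
def encode_query(query)
  ERB::Util.url_encode(query)
end

# Hypothetical helper: replace every non-ASCII character with the
# LIKE wildcard "%" so the remaining query text is plain ASCII.
def wildcard_non_ascii(value)
  value.gsub(/[^[:ascii:]]/, "%")
end

name = "m\u00E5na" # "måna"

exact = "select * from Customer where GivenName='#{name}'"
like  = "select * from Customer where GivenName like '#{wildcard_non_ascii(name)}'"

puts encode_query(exact)
puts encode_query(like)
```

Note that the wildcard form matches any run of characters in that position, so it can return more records than the exact name.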

drewish commented 7 years ago

Did a little testing. Looking at it more, it seems like the example they chose works because it's an extended ASCII character. I asked for clarification on how to handle UTF-8 characters like 민준, which I'm able to set via the API and view in the UI.

drewish commented 7 years ago

I'd replied with:

I'm able to replicate the 'å' character you suggested but it looks like it's part of the extended ASCII set and therefore encoded differently. Could you provide an example with a UTF-8 character like the one in my initial query? Or perhaps '민준'? I'm able to submit those values in the XML and API but have not been able to query for them.

The suggestion to replace the character with the % wildcard is interesting but would return additional records such as "Mona" or "Mina".

And yesterday they finally got back to me:

Hello Andrew, We don't support queries with these characters. You will have to encode on your end and then compare characters in a for loop for all customers.

So that's delightful.
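For what it's worth, the "compare in a loop" approach they describe might look roughly like this. It is only a sketch: `CustomerRecord` is a stand-in Struct for records that have already been fetched from the API, and `filter_by_family_name` is a hypothetical helper, not something the gem provides. The names are NFC-normalized before comparing so that composed and decomposed forms of the same character are treated as equal.

```ruby
# Stand-in for customer records already fetched from the API
# (hypothetical Struct, not a quickbooks-ruby model).
CustomerRecord = Struct.new(:id, :family_name)

# Hypothetical helper: match names locally instead of in the query.
# Both sides are NFC-normalized before the equality check.
def filter_by_family_name(customers, wanted)
  target = wanted.unicode_normalize(:nfc)
  customers.select do |customer|
    name = customer.family_name
    next false if name.nil? # skip records without a family name

    name.unicode_normalize(:nfc) == target
  end
end

customers = [
  CustomerRecord.new(1, "W\u0142odarski"), # "Włodarski"
  CustomerRecord.new(2, "Wodarski"),
  CustomerRecord.new(3, nil)
]

matches = filter_by_family_name(customers, "W\u0142odarski")
puts matches.map(&:id).inspect
```

The obvious cost is that every customer has to be pulled down first, so this only makes sense when no server-side filter can narrow the set.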

I've only been trying it with US companies. On one page in the docs that I can't seem to find again, I'd seen it say that they use different encodings for US vs non-US companies.

ruckus / quickbooks-ruby

Strip non-ascii in QueryBuilder #362