mechatroner / RBQL

🦜RBQL - Rainbow Query Language: SQL-like query engine for (not only) CSV file processing. Supports SQL queries with Python and JavaScript expressions.
https://rbql.org
MIT License

fix an exception which happens while using unicode characters as delimiter #12

Closed veysiertekin closed 5 years ago

veysiertekin commented 5 years ago

When a Unicode character is used as the delimiter, the following exception is raised. This PR fixes the issue.

➜  ~ rbql-py --delim $(echo "\u2063") --encoding utf-8 --policy simple --query "select top 10 a1" --input /<path-to>/test-out.csv.txt 
/usr/local/lib/python2.7/site-packages/rbql/rbql_csv.py:37: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if delim == 'TAB':
/usr/local/lib/python2.7/site-packages/rbql/rbql_csv.py:39: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if delim == r'\t':
/usr/local/lib/python2.7/site-packages/rbql/rbql_csv.py:382: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if input_delim == '"' and input_policy == 'quoted':
/usr/local/lib/python2.7/site-packages/rbql/rbql_csv.py:384: UnicodeWarning: Unicode unequal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if input_delim != ' ' and input_policy == 'whitespace':
Error [unexpected]: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

(this file uses an invisible character, \u2063, as the separator) test-out.csv.txt
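
For context, here is a minimal sketch of the kind of fix involved. It is not the actual patch: `normalize_delim` is a hypothetical helper, not part of rbql. The idea is that under Python 2 the delimiter arrives from the command line as a byte string (U+2063 is the UTF-8 bytes `e2 81 a3`, which matches the `0xe2` in the error above), so it has to be decoded to text before it is compared with or joined to Unicode strings.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def normalize_delim(delim, encoding='utf-8'):
    """Hypothetical helper (not rbql's API): return the delimiter as text.

    On Python 2, command-line arguments are byte strings, so a non-ASCII
    delimiter must be decoded before it is mixed with unicode objects.
    On Python 3, arguments are already text and are returned unchanged.
    """
    if isinstance(delim, bytes):
        return delim.decode(encoding)
    return delim


# U+2063 (INVISIBLE SEPARATOR) encoded as UTF-8, as Python 2 would receive it.
raw_delim = b'\xe2\x81\xa3'
delim = normalize_delim(raw_delim)

# Comparisons like the ones in the log are now text-to-text,
# so no implicit ASCII decoding is attempted.
is_tab = delim == 'TAB'
```

Decoding once, right where the argument is read, keeps every later comparison and split working on a single string type.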

veysiertekin commented 5 years ago

Hi @mechatroner ,

You are right, this error happens with python2, not python3. We use the invisible separator a lot, since the other fields in our CSV files may contain any visible character!
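
To illustrate why an invisible separator is convenient, here is a rough sketch. It assumes that a "simple" policy amounts to a plain split on the delimiter with no quoting rules; `split_simple` is a hypothetical function, not rbql code.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

INVISIBLE_SEPARATOR = '\u2063'


def split_simple(line, delim=INVISIBLE_SEPARATOR):
    """Hypothetical splitter: strip the line ending, then split on delim.

    No quoting or escaping is applied, so commas, tabs, and quotes
    inside a field are kept verbatim.
    """
    return line.rstrip('\r\n').split(delim)


# Fields contain a comma, double quotes, a semicolon, and a tab,
# none of which collide with the invisible separator.
line = INVISIBLE_SEPARATOR.join(['id,1', 'say "hi"; then\tleave', '42']) + '\n'
fields = split_simple(line)
```

Because the separator never appears in visible text, the fields need no quoting or escaping.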

I have changed how the delimiter is handled in the run_with_python method, as you suggested 👍

mechatroner commented 5 years ago

Merging as is, since it fixes the problem. I will adjust the code later to better conform to the current architecture. Thank you @veysiertekin !