jfernandz / pyst2

A fork of the famous python pyst library for Asterisk.
136 stars 102 forks source link

agi.verbose("string") bug #19

Open Scinawa opened 8 years ago

Scinawa commented 8 years ago

File "/opt/agi/unknow", line 13, in agi.verbose("python agi started") File "/usr/local/lib/python3.4/dist-packages/pyst2-0.4.7-py3.4.egg/asterisk/agi.py", line 606, in verbose self.execute('VERBOSE', self._quote(message), level) File "/usr/local/lib/python3.4/dist-packages/pyst2-0.4.7-py3.4.egg/asterisk/agi.py", line 121, in _quote return ''.join(['"', string.encode('ascii', 'ignore'), '"']) TypeError: sequence item 1: expected str instance, bytes found

tuxpowered commented 8 years ago

What is the variable (data) you are passing into it? You need to pass a string to the function. The _quote function will convert int and float to strings automatically, any other objects passed must be of a type that can process string.encode() meaning they need to be a string.

Is it possible you are passing an object reference ?

eill commented 8 years ago

you are trying to join together string and byte objects. Oops, you've just dropped the python3 support.

In [6]: ''.join(['"', 'teststr'.encode('ascii', 'ignore'), '"'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-9e7ac0978072> in <module>()
----> 1 ''.join(['"', 'teststr'.encode('ascii', 'ignore'), '"'])

TypeError: sequence item 1: expected str instance, bytes found

did you ever test your own patches?

tuxpowered commented 8 years ago

Eill, Asterisk AGI works by standard in/out. And this stdin/out is expected to be plain text. The previous code converted everything passed to it to text str(). This had the side effect of if you passed a string that contained string encodings, as shown below, that the data would not be translated

r = '\xc2\xae' # The ® trademark symbol.

Because python can not determine the actual encoding used on the string, this can not be converted. Furthermore it can cause your asterisk application to crash. The code change assures that only valid 'ascii' text is sent back to asterisk to be processed, thus preventing dialplan errors and crashes.

Enter Python 3. Unfortunately strings in python 3 are not really strings. they are byte strings, in Python 2 we have str and unicode, and unicode is not the same as byte strings. There is many more articles about the differences documented elsewhere on the web, which is better at explaining them.

The issue comes from python3 when it performs the encoding it returns a byte string, and in python 2 it returns an ascii string.

Finally, if you care to look at the first line of the agi.py file. this module is designed to work with python2 not python3.

I am happy to review alternative methods, if you present them. BUT do keep in mind that things should remain compatible with python2.

As to your question "did you ever test your own patches?", yes, obviously I have. Under the conditions that the script is designed to work, using the correct python interpreter, this functions correctly. Did you ever run them under the supported interpreter?

Is PyST2 compatible with Python3? No. Not fully. It's getting there.

Can you submit a test condition and code to be reviewed? Absolutely, and I would be happy to run tests against it. Keep in mind the code should remain backward compatible with python2 installations.

eill commented 8 years ago

this module is designed to work with python2 not python3

that's wrong, I've added python3 support a year ago (and the module works with python3 fine). Yes, I forgot to change the shebang, but it does not matter here (you almost never directly execute the agi.py file but import it).

for all, could you tell me what's the purpose of function _quote? Let's see the code (I've omitted the isinstance checks for clarity):

Python 2.7.5 (default, Mar  9 2014, 22:15:05) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> string='®'
>>> def quote(string):
...     return ''.join(['"', string.encode('ascii', 'ignore'), '"'])
... 
>>> quote(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in quote
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

if you want to dispose of exceptions (according to your commit cef87f4241fb19789a36f45df2f8ebdd51f835e2 comment) you've done it wrong, because exceptions are still here.

so code needs to be rewritten here.

eill commented 8 years ago

ok, I've read again your comment and I see that you just want not to pass the unicode strings to asterisk. I wish I finally understood your logic and I'll try to rewrite this code as soon as possible.

tuxpowered commented 8 years ago

The purpose of _quote() is to prevent illegal characters are passed back over AGI.

Python 2.7.9 (v2.7.9:648dcafa7e5f, Dec 10 2014, 10:10:46)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> string='®'
>>> escaped_string = string.decode('utf-8')
>>> def _quote(string):
...     return ''.join(['"', string.encode('ascii', 'ignore'), '"'])
...
>>> _quote(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in _quote
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

string here can not be encoded to 'ascii' because the string is encoded. You would need to decode this first.

>>> _quote(escaped_string)
'""'

Passing the decoded string to _quote functions correctly.

>>> string
'\xc2\xae'

Python 2 automatically provides the output of string as the escaped characters. While Python3 does not.

>>> str(string)
'\xc2\xae'
>>>

The old code would have taken executed the above returning escaped characters under Python2 and a utf-8 (or whatever encoded character) under Python3. There is also no .decode() method on the string in python3 which adds to the problem.

Under python2 converting of values such as '®' do not have an ascii equivalent and therefore are ignored. It would be considered reasonable to expect that a user would not want '\xc2\xae' to be the data returned to asterisk.

The problem becomes more interesting when add things like JSON. Take for example the following under python3

>>> import json
>>> string
'®'
>>> j = { 'char': string}
>>> j
{'char': '®'}
>>> json.dumps(j)
'{"char": "\\u00ae"}'

and under python2

>>> string='®'
>>> import json
>>> j = {'char': string}
>>> j
{'char': '\xc2\xae'}
>>> json.dumps(j)
'{"char": "\\u00ae"}'

No shock here as python transforms each as expected. But note what happens when it is stored as JSON. We get yet another encoding. \u00ae

So lets assume for example that your AGI application has made a call to some web service, or maybe a DB that is returning a JSON payload, something that is a rather common task. If the data contains our sample string, it would have been converted, to the cryptic \u00ae value.

The new code change will ignore this char.

>>> json.loads(json.dumps(j))['char'].encode('ascii', 'ignore')
''

While the older code, converting all values to strings would have crashed.

>>> str(json.loads(json.dumps(j))['char'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 0: ordinal not in range(128)

This is what the code change is intended to prevent.

Moving Forward We can not just convert everything to str() this would error as seen above on python2. while python3 understands it. So we need a better solution, I agree. We want to support both.

Admittedly I did not do a lot of testing on python3, as python2 is still the default on distributions. (I am at least not aware of a python3 as default on any distro).

I think the only way to make this work is to do some additional condition checking. Perhaps we attempt to convert the value first and capture the exception error, and then try to convert to str() in Python3. In that way it can support python2/3 as it appears with python3 if you force it to str() you can pass .encode() to it.

>>> str(json.loads(json.dumps(j))['char'])
'®'
>>> str(json.loads(json.dumps(j))['char']).encode('ascii', 'ignore')
b''

Okay I hope that gave enough examples and use case to help paint a better picture. I will likely do some testing this weekend on this as well and welcome your feedback.

ghost commented 7 years ago

Just wondering if there were any more thoughts on this? Given that Python 3 is almost a decade old now... also this is the subject of several issues and pull requests, eg:

https://github.com/rdegges/pyst2/commit/5f37f3ee6132b69e9502e601dc9a9b556d9dfb74 https://github.com/rdegges/pyst2/pull/28 https://github.com/rdegges/pyst2/issues/30

In my case, I just ended up doing this:

def _quote(self, string):
    joined = '"' + string + '"'
    return joined`

So far, it SEEMS to be OK. What might bite me here?

In fact, I even tried a little test, pushing some weird characters back from pyst2 to Asterisk. Output looks OK...

<PJSIP/6001-00000009>AGI Tx >> 200 result=1
<PJSIP/6001-00000009>AGI Rx << VERBOSE "ʞsᴉɹǝʇsɐ uᴉ ǝuᴉɟ sʞɹoʍ puɐ uʍop ǝpᴉsdn sᴉ sᴉɥʇ" 1
 chartest.py: ʞsᴉɹǝʇsɐ uᴉ ǝuᴉɟ sʞɹoʍ puɐ uʍop ǝpᴉsdn sᴉ sᴉɥʇ
<PJSIP/6001-00000009>AGI Tx >> 200 result=1
    -- <PJSIP/6001-00000009>AGI Script chartest.py completed, returning 0
kevin-olbrich commented 5 years ago

This indeed works but better make sure the other types are converted:

    def _quote(self, string):
        """ provides double quotes to string, converts int/bool to string """
        if isinstance(string, int):
          string = str(string)
        if isinstance(string, float):
          string = str(string)
        joined = '"' + string + '"'
        return joined

Or better !isinstance(string, str) which will fail in cases, where casting will not make sense.