songhongji / androguard

Automatically exported from code.google.com/p/androguard
Apache License 2.0

Error in decoding byte array into utf-8 and utf-16 strings #168

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
In class androguard.core.bytecodes.apk.StringBlock, the methods decode() and 
decode2() do not correctly decode the byte array into a unicode string. 
The code currently converts each byte into a unicode character individually, 
and only if the byte is within the ascii range.  Any byte value outside the 
ascii range is ignored.  Consequently, the resulting unicode string contains 
only ascii characters.  

This is incorrect because the byte array is encoded in either utf-8 or utf-16, 
which may contain legitimate code points outside the ascii range.  Each byte 
within the byte array should be appended to the 'data' string without 
modification.  Then the 'data' string can be decoded into a unicode string 
using the appropriate encoding.

A simple solution would be to replace the following lines in both methods:

   t_data = pack("=b", self.m_strings[offset + i])
   data += unicode(t_data, errors='ignore')

with:

   data += pack("=b", self.m_strings[offset + i])
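
The difference between the two approaches can be illustrated outside of androguard. The following is only a minimal sketch, written for Python 3 rather than the Python 2 code above. The helper names (to_signed, decode_per_byte, decode_whole) are hypothetical and not part of androguard, and it assumes the byte values are stored as signed integers, which is what the "=b" format implies.

```python
from struct import pack

def to_signed(data):
    # Hypothetical helper: represent each byte as a signed value (-128..127),
    # assumed here to match what pack("=b", ...) expects.
    return [b - 256 if b > 127 else b for b in data]

def decode_per_byte(signed_bytes):
    # Mimics the reported behaviour: each byte is decoded on its own and
    # anything outside the ascii range is silently dropped.
    out = ""
    for b in signed_bytes:
        out += pack("=b", b).decode("ascii", errors="ignore")
    return out

def decode_whole(signed_bytes, encoding):
    # Mimics the proposed fix: collect the raw bytes unmodified, then decode
    # the complete buffer once with the appropriate encoding.
    raw = b"".join(pack("=b", b) for b in signed_bytes)
    return raw.decode(encoding)

text = "Привет, OK"
utf8_values = to_signed(text.encode("utf-8"))

print(repr(decode_per_byte(utf8_values)))          # Cyrillic letters are lost
print(decode_whole(utf8_values, "utf-8") == text)  # full string is recovered
```

For utf-8 input the per-byte version drops every multi-byte character, which matches the omitted characters described in this report. For utf-16 input, decoding the whole buffer is equally necessary, since individual bytes of a utf-16 code unit are not meaningful characters on their own.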

What steps will reproduce the problem?
1. Get an ARSCParser object from an android application containing a 
resources.arsc file with resource string values containing unicode code points 
outside of the ascii range (e.g. Russian, Chinese, Arabic code blocks).
2. Write the results of a call to the method get_strings_resources() to a local 
xml file
3. Open the xml file in a web browser and examine the string values

What is the expected output? What do you see instead?
The expected output should display the unicode characters outside of the ascii 
range correctly. Instead, unicode characters outside of the ascii range were 
omitted altogether.

What version of the product are you using? On what operating system?
androguard-1.9 on xubuntu 14.04

Please provide any additional information below.

I applied the proposed solution to the source, re-built and re-installed 
androguard. I re-ran my test android apk and verified that the output xml file 
contains resource string values with the appropriate unicode characters.

Original issue reported on code.google.com by kristo...@gmail.com on 11 Sep 2014 at 2:16