radareorg / radare2

UNIX-like reverse engineering framework and command-line toolset
https://www.radare.org/
GNU Lesser General Public License v3.0

Revise ps commands and kill r_str_utf16_encode() #12260

Open thestr4ng3r opened 5 years ago

thestr4ng3r commented 5 years ago

r_str_utf16_encode() looks very wrong to me. All it does is go through the string byte by byte and print non-printable bytes as \u00xx, which is nonsense, because it corresponds to no real encoding and especially not to utf-16.
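To illustrate the mismatch, here is a minimal sketch, not the actual radare2 code. `escape_bytewise()` is a hypothetical stand-in for the byte-by-byte behavior described above, and `escape_utf16()` is a hypothetical alternative that decodes the input as UTF-8 first and then escapes real UTF-16 code units. The UTF-8 decoder is deliberately small and does not reject overlong forms.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the behavior described above: every byte outside
 * printable ASCII becomes \u00xx, whatever the input encoding is.
 * Requires outsz >= 1; output is truncated if the buffer is too small. */
static void escape_bytewise(const unsigned char *in, char *out, size_t outsz) {
    size_t o = 0;
    for (; *in && o + 7 <= outsz; in++) {
        if (*in >= 0x20 && *in < 0x7f) {
            out[o++] = (char)*in;
        } else {
            o += (size_t)snprintf(out + o, outsz - o, "\\u%04x", (unsigned)*in);
        }
    }
    out[o] = '\0';
}

/* Decode one UTF-8 sequence. Returns bytes consumed, or 0 if malformed. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp) {
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xe0) == 0xc0 && (s[1] & 0xc0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x1f) << 6) | (s[1] & 0x3f);
        return 2;
    }
    if ((s[0] & 0xf0) == 0xe0 && (s[1] & 0xc0) == 0x80 && (s[2] & 0xc0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x0f) << 12) |
              ((unsigned long)(s[1] & 0x3f) << 6) | (s[2] & 0x3f);
        return 3;
    }
    if ((s[0] & 0xf8) == 0xf0 && (s[1] & 0xc0) == 0x80 &&
        (s[2] & 0xc0) == 0x80 && (s[3] & 0xc0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x07) << 18) |
              ((unsigned long)(s[1] & 0x3f) << 12) |
              ((unsigned long)(s[2] & 0x3f) << 6) | (s[3] & 0x3f);
        return 4;
    }
    return 0;
}

/* Hypothetical alternative: decode UTF-8, then escape UTF-16 code units,
 * using a surrogate pair for code points above U+FFFF. */
static void escape_utf16(const unsigned char *in, char *out, size_t outsz) {
    size_t o = 0;
    while (*in && o + 13 <= outsz) {
        unsigned long cp;
        size_t n = utf8_decode(in, &cp);
        if (n == 0) { cp = 0xfffd; n = 1; } /* replacement character */
        in += n;
        if (cp >= 0x20 && cp < 0x7f) {
            out[o++] = (char)cp;
        } else if (cp < 0x10000) {
            o += (size_t)snprintf(out + o, outsz - o, "\\u%04lx", cp);
        } else {
            cp -= 0x10000;
            o += (size_t)snprintf(out + o, outsz - o, "\\u%04lx\\u%04lx",
                                  0xd800 + (cp >> 10), 0xdc00 + (cp & 0x3ff));
        }
    }
    out[o] = '\0';
}
```

For the UTF-8 input `caf\xc3\xa9` ("café"), the byte-wise version yields `caf\u00c3\u00a9`, which names two unrelated code points, while the decoding version yields `caf\u00e9`.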

This function is used mainly in some ps commands, in particular psu, whose purpose I don't understand. If you know, please tell me.

Also, from just quickly looking at r_print_string(), I think there is a lot wrong here too. The seek parameter is not used at all and if R_PRINT_STRING_WIDE is passed, the only difference is that every second byte is skipped. What kind of an encoding is that supposed to be?

For ps, I would suggest a command format like the one ag has, i.e. multiple commands that read strings in different encodings and print them in different formats, for example:

psr[format]    print raw string (just interpret bytes as unicode indices)
psu[format]    print utf-8 string
psw[format]    print utf-16 string
psW[format]    print utf-32 string
...

Output formats:
<blank>    utf-8
j          json
...

So this means for example psw will READ a zero-terminated utf-16 string from the current seek and PRINT it as utf-8. For non-zero terminated strings, one could for example pass the size as an arg.
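As a rough sketch of what the proposed psw would have to do internally, the following converts a zero-terminated UTF-16 string to UTF-8. The function name `utf16le_to_utf8()` and its signature are hypothetical, not an existing radare2 API, and little-endian input is assumed. The `len` parameter doubles as the explicit size for strings that are not zero-terminated.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a radare2 function). Reads UTF-16LE code units
 * from buf, stopping at a 0x0000 unit or after len bytes, and writes
 * NUL-terminated UTF-8 into out. Returns the number of UTF-8 bytes written,
 * not counting the terminator. Output is truncated if out is too small. */
static size_t utf16le_to_utf8(const uint8_t *buf, size_t len,
                              char *out, size_t outsz) {
    size_t i = 0, o = 0;
    if (outsz == 0) {
        return 0;
    }
    while (i + 1 < len) {
        uint32_t cp = (uint32_t)buf[i] | ((uint32_t)buf[i + 1] << 8);
        i += 2;
        if (cp == 0) {
            break; /* zero terminator */
        }
        /* Combine a high surrogate with a following low surrogate. */
        if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < len) {
            uint32_t lo = (uint32_t)buf[i] | ((uint32_t)buf[i + 1] << 8);
            if (lo >= 0xdc00 && lo <= 0xdfff) {
                cp = 0x10000 + ((cp - 0xd800) << 10) + (lo - 0xdc00);
                i += 2;
            }
        }
        if (cp >= 0xd800 && cp <= 0xdfff) {
            cp = 0xfffd; /* unpaired surrogate: emit replacement character */
        }
        if (o + 5 > outsz) {
            break; /* need room for up to 4 bytes plus the terminator */
        }
        if (cp < 0x80) {
            out[o++] = (char)cp;
        } else if (cp < 0x800) {
            out[o++] = (char)(0xc0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3f));
        } else if (cp < 0x10000) {
            out[o++] = (char)(0xe0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3f));
            out[o++] = (char)(0x80 | (cp & 0x3f));
        } else {
            out[o++] = (char)(0xf0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3f));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3f));
            out[o++] = (char)(0x80 | (cp & 0x3f));
        }
    }
    out[o] = '\0';
    return o;
}
```

Unlike skipping every second byte, this keeps characters above U+00FF and handles surrogate pairs, so the UTF-8 output round-trips the original text.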

XVilka commented 5 years ago

Note that UTF-16 and UTF-32 are endian-dependent and may also need a BOM.
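A minimal sketch of how a reader could deal with both points, assuming it checks for a BOM first and otherwise falls back to a configured byte order. The names `bom_enc`, `detect_bom()` and `read_u16()` are hypothetical and only serve this example.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    ENC_UNKNOWN, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE, ENC_UTF32LE, ENC_UTF32BE
} bom_enc;

/* Hypothetical helper: inspect the first bytes for a byte order mark.
 * Stores the BOM length in *bom_len (0 if none was found).
 * UTF-32LE is tested before UTF-16LE because FF FE 00 00 starts with the
 * UTF-16LE BOM; that sequence is inherently ambiguous with a UTF-16LE BOM
 * followed by a zero code unit, and this sketch prefers UTF-32LE. */
static bom_enc detect_bom(const uint8_t *b, size_t len, size_t *bom_len) {
    *bom_len = 0;
    if (len >= 4 && b[0] == 0xff && b[1] == 0xfe && b[2] == 0x00 && b[3] == 0x00) {
        *bom_len = 4;
        return ENC_UTF32LE;
    }
    if (len >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xfe && b[3] == 0xff) {
        *bom_len = 4;
        return ENC_UTF32BE;
    }
    if (len >= 3 && b[0] == 0xef && b[1] == 0xbb && b[2] == 0xbf) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && b[0] == 0xff && b[1] == 0xfe) {
        *bom_len = 2;
        return ENC_UTF16LE;
    }
    if (len >= 2 && b[0] == 0xfe && b[1] == 0xff) {
        *bom_len = 2;
        return ENC_UTF16BE;
    }
    return ENC_UNKNOWN;
}

/* Read one 16-bit code unit with an explicit byte order. */
static uint16_t read_u16(const uint8_t *b, int big_endian) {
    return big_endian
        ? (uint16_t)(((uint16_t)b[0] << 8) | b[1])
        : (uint16_t)(((uint16_t)b[1] << 8) | b[0]);
}
```

Strings found inside binaries usually carry no BOM, so in practice the byte order would probably have to come from a command argument or configuration setting, with BOM detection as an optional override.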

Maijin commented 5 years ago

Maybe something @kazarmy wants to take a look at?

kazarmy commented 5 years ago

What's the difference between psr and psu?