radareorg / ideas

Support for custom text encoding #209

Open gingerbeardman opened 6 years ago

gingerbeardman commented 6 years ago

See: https://github.com/radareorg/cutter/issues/171

Related: https://github.com/radare/radare2/issues/414 and https://github.com/radare/radare2/issues/2032

Here's a custom encoding definition (.ucm file) and the resulting conversion table (.cnv file):

custom-encoding.zip

The mapping follows the ICU specification for the UCM format: http://userguide.icu-project.org/conversion/data#TOC-.ucm-File-Format

Here's how I created the .ucm mapping and generated the .cnv file: https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/ac30170_.htm

I use this with Synalyze It! Pro.

The strings I am working with are encoded using a custom alphabet (unused characters are omitted) and custom ASCII offsets.

Here's the UCM file:

<code_set_name>               "Custom"
<char_name_mask>              "AXXXX"
<mb_cur_min>                  1
<mb_cur_max>                  1
<uconv_class>                 "SBCS"
#
CHARMAP
#
#
#ISO 10646  Custom
#_________  _________
<U0020>     \x20 |0
<U0021>     \x21 |0
<U0022>     \x22 |0
<U0023>     \x23 |0
<U0024>     \x24 |0
<U0025>     \x25 |0
<U0026>     \x26 |0
<U0027>     \x27 |0
<U0028>     \x28 |0
<U0029>     \x29 |0
<U002A>     \x2A |0
<U002B>     \x2B |0
<U002C>     \x2C |0
<U002D>     \x2D |0
<U002E>     \x2E |0
<U002F>     \x2F |0
<U0030>     \x30 |0
<U0031>     \x31 |0
<U0032>     \x32 |0
<U0033>     \x33 |0
<U0034>     \x34 |0
<U0035>     \x35 |0
<U0036>     \x36 |0
<U0037>     \x37 |0
<U0038>     \x38 |0
<U0039>     \x39 |0
<U003A>     \x3A |0
<U003B>     \x3B |0
<U003C>     \x3C |0
<U003D>     \x3D |0
<U003E>     \x3E |0
<U003F>     \x3F |0
<U0040>     \x40 |0
<U0041>     \x41 |0
<U0042>     \x42 |0
<U0043>     \x43 |0
<U0044>     \x44 |0
<U0045>     \x45 |0
<U0046>     \x46 |0
<U0047>     \x47 |0
<U0048>     \x48 |0
<U0049>     \x49 |0
<U004A>     \x4A |0
<U004B>     \x4B |0
<U004C>     \x4C |0
<U004D>     \x4D |0
<U004E>     \x4E |0
<U004F>     \x4F |0
<U0050>     \x50 |0
<U0051>     \x51 |0
<U0052>     \x52 |0
<U0053>     \x53 |0
<U0054>     \x54 |0
<U0055>     \x55 |0
<U0056>     \x56 |0
<U0057>     \x57 |0
<U0058>     \x58 |0
<U0059>     \x59 |0
<U005A>     \x5A |0
<U005B>     \x5B |0
<U005C>     \x5C |0
<U005D>     \x5D |0
<U005E>     \x5E |0
<U005F>     \x5F |0
<U0060>     \x60 |0
<U0061>     \x61 |0
<U0062>     \x62 |0
<U0063>     \x63 |0
<U0064>     \x64 |0
<U0065>     \x65 |0
<U0066>     \x66 |0
<U0067>     \x67 |0
<U0068>     \x68 |0
<U0069>     \x69 |0
<U006A>     \x6A |0
<U006B>     \x6B |0
<U006C>     \x6C |0
<U006D>     \x6D |0
<U006E>     \x6E |0
<U006F>     \x6F |0
<U0070>     \x70 |0
<U0071>     \x71 |0
<U0072>     \x72 |0
<U0073>     \x73 |0
<U0074>     \x74 |0
<U0075>     \x75 |0
<U0076>     \x76 |0
<U0077>     \x77 |0
<U0078>     \x78 |0
<U0079>     \x79 |0
<U007A>     \x7A |0
<U007B>     \x7B |0
<U007C>     \x7C |0
<U007D>     \x7D |0
<U007E>     \x7E |0
<U007F>     \x7F |0
<U30E8>     \xA0 |0
<U30E1>     \xA1 |0
<U30DE>     \xA2 |0
<U30A7>     \xA3 |0
<U30E3>     \xA4 |0
<U56DE>     \xA5 |0
<U3092>     \xA6 |0
<U203C>     \xA7 |0
<U3058>     \xA8 |0
<U30C9>     \xA9 |0
<U3002>     \xAA |0
<U2026>     \xAB |0
<U3083>     \xAC |0
<U3085>     \xAD |0
<U3087>     \xAE |0
<U3063>     \xAF |0
<U26D4>     \xB0 |0
<U3042>     \xB1 |0
<U3044>     \xB2 |0
<U3046>     \xB3 |0
<U3048>     \xB4 |0
<U304A>     \xB5 |0
<U304B>     \xB6 |0
<U304D>     \xB7 |0
<U304F>     \xB8 |0
<U3051>     \xB9 |0
<U3053>     \xBA |0
<U3055>     \xBB |0
<U3057>     \xBC |0
<U3059>     \xBD |0
<U305B>     \xBE |0
<U305D>     \xBF |0
<U305F>     \xC0 |0
<U3061>     \xC1 |0
<U3064>     \xC2 |0
<U3066>     \xC3 |0
<U3068>     \xC4 |0
<U306A>     \xC5 |0
<U306B>     \xC6 |0
<U306C>     \xC7 |0
<U306D>     \xC8 |0
<U306E>     \xC9 |0
<U306F>     \xCA |0
<U3072>     \xCB |0
<U3075>     \xCC |0
<U3078>     \xCD |0
<U307B>     \xCE |0
<U307E>     \xCF |0
<U307F>     \xD0 |0
<U3080>     \xD1 |0
<U3081>     \xD2 |0
<U3082>     \xD3 |0
<U3084>     \xD4 |0
<U3086>     \xD5 |0
<U3088>     \xD6 |0
<U3089>     \xD7 |0
<U308A>     \xD8 |0
<U308B>     \xD9 |0
<U308C>     \xDA |0
<U308D>     \xDB |0
<U308F>     \xDC |0
<U3093>     \xDD |0
<U3099>     \xDE |0
<U309A>     \xDF |0
<U30A2>     \xE0 |0
<U30A4>     \xE1 |0
<U30A6>     \xE2 |0
<U30AA>     \xE3 |0
<U30AD>     \xE4 |0
<U30AF>     \xE5 |0
<U30B3>     \xE6 |0
<U30B7>     \xE7 |0
<U30B9>     \xE8 |0
<U30BB>     \xE9 |0
<U30BF>     \xEA |0
<U30C1>     \xEB |0
<U30C8>     \xEC |0
<U30CA>     \xED |0
<U30CF>     \xEE |0
<U30D2>     \xEF |0
<U30D5>     \xF0 |0
<U30DB>     \xF1 |0
<U30DF>     \xF2 |0
<U30E0>     \xF3 |0
<U30E4>     \xF4 |0
<U30E9>     \xF5 |0
<U30EA>     \xF6 |0
<U30EB>     \xF7 |0
<U30EC>     \xF8 |0
<U30ED>     \xF9 |0
<U30EF>     \xFA |0
<U30F3>     \xFB |0
<U30C3>     \xFC |0
<U30E7>     \xFD |0
<U30C6>     \xFE |0
<U30FC>     \xFF |0
END CHARMAP
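
To illustrate how a single-byte mapping like the one above could be applied, here is a minimal Python sketch. It is not how radare2, Cutter, or ICU implement this; `parse_ucm` and `decode` are hypothetical helper names, and the sketch assumes an SBCS file whose rows look like `<UXXXX> \xNN |0`. Only `|0` (round-trip) rows are used, and unmapped bytes are replaced with U+FFFD.

```python
import re

# Matches CHARMAP rows such as "<U30E8>     \xA0 |0".
ROW = re.compile(r"^<U([0-9A-Fa-f]{4,6})>\s+\\x([0-9A-Fa-f]{2})\s+\|(\d)")


def parse_ucm(text):
    """Build a {byte value: character} table from single-byte UCM text."""
    table = {}
    in_charmap = False
    for line in text.splitlines():
        line = line.strip()
        if line == "CHARMAP":
            in_charmap = True
        elif line == "END CHARMAP":
            break
        elif in_charmap:
            m = ROW.match(line)  # comment lines ("#") simply do not match
            if m and m.group(3) == "0":  # |0 marks a round-trip mapping
                table[int(m.group(2), 16)] = chr(int(m.group(1), 16))
    return table


def decode(data, table, fallback="\ufffd"):
    """Decode bytes one at a time; unmapped bytes become the fallback."""
    return "".join(table.get(b, fallback) for b in data)


# Three rows copied from the mapping above, enough to exercise the parser.
SAMPLE = r"""
<code_set_name>               "Custom"
<uconv_class>                 "SBCS"
CHARMAP
<U0041>     \x41 |0
<U3042>     \xB1 |0
<U3044>     \xB2 |0
END CHARMAP
"""

table = parse_ucm(SAMPLE)
print(decode(b"A\xb1\xb2", table))
```

Feeding the full .ucm text to `parse_ucm` instead of `SAMPLE` would give a 192-entry table covering 0x20-0x7F and 0xA0-0xFF; bytes outside those ranges have no mapping in this file.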

ret2libc commented 4 years ago

This issue has been moved from radareorg/radare2 to radareorg/ideas as we are trying to clean up our backlog, and this issue was probably created a long while ago. This is an effort to help contributors understand which actionable items they can work on, prioritize issues better, and help users find active or duplicated issues more easily. If this is a bug rather than an enhancement, improvement, or general idea, feel free to ask for it to be transferred back to the main repo. Thanks for your understanding and for your contribution to this issue.

gingerbeardman commented 4 years ago

That's a shame, but I understand.

trufae commented 4 years ago

It’s probably a mistake. This issue has been in r2 almost from the beginning, but it hasn’t been addressed until now; there’s an ongoing PR addressing it. IMHO this issue is important and must be moved back to r2.