takeshixx / redstar-tools

Tools for Red Star OS (붉은별)

Encoding for AnGae.dat #3

Open seamustuohy opened 8 years ago

seamustuohy commented 8 years ago

I have been fooling around with stripping out the AnGae.dat text strings and have been having difficulty getting them to show up properly. What encoding did you use for the final text? I have been using UTF-16 LE, but that has mostly resulted in gibberish strings.

chrisdoman commented 6 years ago

Hi! Did you ever get anywhere with this?

I'm trying to do something similar: I'm attempting to convert AnGae.dat to YARA rules (basically just dumping out the hex characters), but it looks like I'm doing something wrong. Attempting to convert to UTF-16 didn't lead to any Korean strings for me.

If you're still looking at this, or anyone else is, the code I used to try to create YARA rules is below:


#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# Dump each pattern record in AnGae.dat as a YARA hex-string rule.
import sys
import struct
import binascii

# Open in binary mode so no byte is altered while reading.
f = open(sys.argv[1], 'rb')

timestamp, = struct.unpack('<I', f.read(4))

unknown1 = f.read(1000)

package_id, = struct.unpack('<I', f.read(4))
unknown2, = struct.unpack('<I', f.read(4))
pattern_date, = struct.unpack('<I', f.read(4))
file_count, = struct.unpack('<I', f.read(4))
head_pos, = struct.unpack('<I', f.read(4))
real_size, = struct.unpack('<I', f.read(4))

pattern_count, = struct.unpack('<Q', f.read(8))

print('timestamp: {}'.format(timestamp))
print('pattern count: {}'.format(pattern_count))

patterns = []

# Each record: 4-byte length, 4-byte package id, 200-byte content field.
for i in range(pattern_count):
    pattern = dict()
    pattern['reclen'], = struct.unpack('<I', f.read(4))
    pattern['package_id'], = struct.unpack('<I', f.read(4))
    pattern['content'], = struct.unpack('<200s', f.read(200))
    patterns.append(pattern)

pattern_checksum, = struct.unpack('<20s', f.read(20))
f.close()

# Emit one rule per record; the counter keeps rule names unique.
for count, pattern in enumerate(patterns, 1):
    hex_bytes = binascii.hexlify(pattern['content']).decode('ascii')
    print('rule kr_yara' + str(count) + ' {')
    print(' strings:')
    print('  $a_' + str(count) + ' = { ' + hex_bytes + ' }')
    print(' condition:')
    print('  all of them')
    print('}')
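
One thing that may be worth trying with the script above: each rule it emits contains the whole 200-byte field, padding included. The sketch below trims trailing zero bytes before building the hex string. It assumes (this is not confirmed anywhere in the thread) that trailing zeros in the field are padding rather than part of the signature, and `yara_hex_rule` is a hypothetical helper name.

```python
import binascii


def yara_hex_rule(name, content):
    """Build a YARA rule with a single hex string from raw bytes."""
    # Assumption: trailing zero bytes are padding, not signature data.
    # Note this also drops a zero high byte of the last UTF-16 unit.
    trimmed = content.rstrip(b"\x00")
    if not trimmed:
        return None  # nothing left to match on
    digits = binascii.hexlify(trimmed).decode("ascii").upper()
    # Split into space-separated byte pairs for readability.
    pairs = " ".join(digits[i:i + 2] for i in range(0, len(digits), 2))
    return (
        "rule {name} {{\n"
        "    strings:\n"
        "        $a = {{ {pairs} }}\n"
        "    condition:\n"
        "        $a\n"
        "}}"
    ).format(name=name, pairs=pairs)


# Illustrative 200-byte field, not real data from AnGae.dat.
rule = yara_hex_rule("kr_yara1", b"\x41\x00\x42\x00" + b"\x00" * 196)
print(rule)
```

Returning `None` for an all-zero field lets the caller skip empty records instead of emitting a rule with an empty hex string.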

SpiraMirabilis commented 5 years ago

> I have been fooling around with stripping out the AnGae.dat text strings and have been having difficulty getting them to show up properly. What encoding did you use for the final text? I have been using UTF-16 LE, but that has mostly resulted in gibberish strings.

It looks like they are little endian, so switch to bytes and they appear to be UTF-16 strings.
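
A quick way to check the byte-order question is to decode the same raw bytes both ways and compare. This is only a sketch: `decode_candidates` is a hypothetical helper, and the sample is an illustrative string encoded here, not data taken from AnGae.dat.

```python
def decode_candidates(raw):
    """Decode raw bytes as UTF-16 in both byte orders."""
    results = {}
    for codec in ("utf-16-le", "utf-16-be"):
        # errors="replace" substitutes U+FFFD instead of raising.
        results[codec] = raw.decode(codec, errors="replace")
    return results


# Illustrative sample only: a known Korean string encoded little-endian.
sample = "붉은별".encode("utf-16-le")
out = decode_candidates(sample)
print(out["utf-16-le"])
print(out["utf-16-be"])
```

Only the decoding that matches the byte order used for encoding gives back the original text; the other yields unrelated characters.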

willscott commented 5 years ago

@SpiraMirabilis do you have a script you can share for extracting the strings?

Karmakstylez commented 4 years ago

Please share a script that works. I cannot get the script to work by setting encoding=utf_16_le.

Getting error:

UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 1080-1081: illegal encoding

Any help will be appreciated!

@SpiraMirabilis @takeshixx
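
A `UnicodeDecodeError` like the one above is what strict decoding raises when the input contains byte sequences that are not valid UTF-16, which is likely if a whole binary file is decoded rather than individual text fields. A minimal sketch of a more tolerant approach follows. It assumes each text field is NUL-padded UTF-16 LE, which the thread does not confirm, and `decode_field` is a hypothetical helper.

```python
def decode_field(raw, codec="utf-16-le"):
    """Decode a fixed-width, NUL-padded field without raising on bad bytes."""
    # Only whole 16-bit units are considered.
    end = len(raw) - (len(raw) % 2)
    # Stop at the first all-zero unit (assumed to be the terminator).
    for i in range(0, end, 2):
        if raw[i:i + 2] == b"\x00\x00":
            end = i
            break
    return raw[:end].decode(codec, errors="replace")


# Hypothetical 200-byte field: text followed by NUL padding.
field = "test".encode("utf-16-le").ljust(200, b"\x00")
print(decode_field(field))

# A high surrogate not followed by a low surrogate is one kind of input
# that strict UTF-16 decoding rejects; here it becomes U+FFFD instead.
bad = b"\x00\xd8A\x00"
print(repr(decode_field(bad)))
```

Scanning in two-byte steps avoids mistaking the zero high byte of an ASCII-range character for the end of the string.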