lmco / laikaboss

Laika BOSS: Object Scanning System
Apache License 2.0

jshlbrd/explode_rtf #36

Closed jshlbrd closed 1 year ago

jshlbrd commented 8 years ago

This PR adds a basic RTF exploder module. I have a feeling there is more work to be done here in the future, but this is a good first step to get the functionality into the project.

pcexhaust commented 8 years ago

jshlbrd,

I tried to run your module today and received errors. I'm no programmer so take the below with a grain of salt...

It looks like rtf_iter_objects in decalage's rtfobj.py expects a filename, but your code feeds it the buffer (the contents of the RTF). The function also yields only two values, yet your code unpacks three. Maybe you're using a different version of decalage's code? Also note that decalage's code describes rtf_iter_objects as a deprecated, backward-compatible API. It looks like it has been replaced by something new called RtfObject.

Anyhow, I changed it up and the below works for me to extract three objects that I put inside of an RTF.

...
import tempfile

class EXPLODE_RTF(SI_MODULE):
    def __init__(self):
        self.module_name = "EXPLODE_RTF"
        # Scratch directory for the temporary copies; it must already exist.
        self.TEMP_DIR = '/tmp/laikaboss_tmp'

    def _run(self, scanObject, result, depth, args):
        moduleResult = []
        # rtf_iter_objects expects a file name, so write the buffer to disk first.
        with tempfile.NamedTemporaryFile(dir=self.TEMP_DIR) as temp_file:
            temp_file_name = temp_file.name
            temp_file.write(scanObject.buffer)
            # Flush so the parser sees the full contents when it reopens the file.
            temp_file.flush()
            for index, obj_data in rtfobj.rtf_iter_objects(temp_file_name):
                # index location of the RTF object becomes the file name
                name = 'index_' + str(index)
                moduleResult.append(ModuleObject(buffer=obj_data, externalVars=ExternalVars(filename='e_rtf_%s' % name)))

        return moduleResult

python laika.py ~/rtf_example.rtf  | jq '.scan_result[]|.filename'
"/user/rtf_example.rtf"
"e_rtf_index_74356"
"e_rtf_index_1178970"
"e_rtf_index_1285880"

decalage2 commented 8 years ago

The new RtfObject API is not released in a stable version of oletools yet; it is still under development.

But my goal is to keep rtfobj backwards compatible with the old API. I will fix rtf_iter_objects to return three objects instead of two, to avoid breaking code that uses it.
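
Until the library side settles, a caller could also tolerate both tuple shapes. The following is only a sketch: `unpack_rtf_item` and the two stand-in generators are hypothetical names invented for illustration, not part of oletools or laikaboss, and it assumes the extra middle value in the three-value form is the object's original length.

```python
def unpack_rtf_item(item):
    """Normalize a yielded tuple to (index, orig_len, data).

    Accepts either the two-value form (index, data) or the three-value
    form (index, orig_len, data). The meaning of the middle value is an
    assumption made for this sketch.
    """
    if len(item) == 3:
        index, orig_len, data = item
    elif len(item) == 2:
        index, data = item
        orig_len = None  # not reported by the two-value form
    else:
        raise ValueError("unexpected tuple length: %d" % len(item))
    return index, orig_len, data


def fake_iter_two_values():
    """Stand-in for an iterator that yields (index, data)."""
    yield (74356, b"object-a")


def fake_iter_three_values():
    """Stand-in for an iterator that yields (index, orig_len, data)."""
    yield (74356, 8, b"object-a")


for source in (fake_iter_two_values, fake_iter_three_values):
    for item in source():
        index, orig_len, data = unpack_rtf_item(item)
        print(source.__name__, index, orig_len, data)
```

In a module like the one above, the loop body would call the helper on each yielded item instead of unpacking a fixed number of values directly.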

decalage2 commented 8 years ago

I fixed rtfobj (latest dev version 0.50) so that it provides a backward-compatible API. It should now work fine with this PR.

marnao commented 7 years ago

@decalage2 One of the problems we've encountered with oletools inside of laikaboss is the project's extensive use of bundled thirdparty libraries. These can conflict with the globally installed versions: if the thirdparty copy has already been imported elsewhere, then when another laikaboss module imports the same library expecting the globally installed version, Python hands it the thirdparty one instead. Since that module expects a different version of the library with a different API, it errors. I forget which thirdparty library had the conflict, but if you'd like I could try to come up with something that's repeatable.

I'm not sure if there are any clean ways to isolate your thirdparty imports. Have you seen this happen before? Any ideas?
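
The general mechanism can be reproduced without oletools. This is a minimal Python 3 sketch, not the actual conflict: `demo_shared_lib` is a hypothetical module name, and the two temporary directories stand in for a bundled thirdparty folder and the global site-packages. It shows that once a module name is cached in `sys.modules`, a later import of the same name gets the cached copy, even if `sys.path` now points somewhere else first.

```python
import importlib
import os
import shutil
import sys
import tempfile

MODULE_NAME = "demo_shared_lib"  # hypothetical library name, for illustration only


def write_module(directory, version):
    """Create a one-line module that only records which copy it is."""
    with open(os.path.join(directory, MODULE_NAME + ".py"), "w") as handle:
        handle.write("VERSION = %r\n" % version)


def import_twice():
    """Import the same module name from two locations, bundled copy first."""
    bundled_dir = tempfile.mkdtemp()
    global_dir = tempfile.mkdtemp()
    write_module(bundled_dir, "bundled-0.1")
    write_module(global_dir, "global-2.0")
    sys.modules.pop(MODULE_NAME, None)
    importlib.invalidate_caches()
    try:
        # First importer: the bundled directory is at the front of sys.path.
        sys.path.insert(0, bundled_dir)
        first = importlib.import_module(MODULE_NAME)
        # Second importer: expects the other copy and even puts its directory
        # first, but the name is already cached in sys.modules.
        sys.path.insert(0, global_dir)
        second = importlib.import_module(MODULE_NAME)
        return first.VERSION, second.VERSION, first is second
    finally:
        # Undo the changes to interpreter state and remove the scratch files.
        for directory in (bundled_dir, global_dir):
            if directory in sys.path:
                sys.path.remove(directory)
            shutil.rmtree(directory, ignore_errors=True)
        sys.modules.pop(MODULE_NAME, None)


print(import_twice())
```

Both imports report the bundled version and return the same module object, which matches the behavior described above: the second importer never sees the copy it expected.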

decalage2 commented 7 years ago

I am using the thirdparty subfolder to avoid having many dependencies, so that oletools runs "out of the box" and can be installed without an Internet connection. But since pip is becoming more widespread, I might remove thirdparty and use a pip requirements.txt instead.

If you find out which dependency causes the conflict, please tell me.