kevoreilly / CAPEv2

Malware Configuration And Payload Extraction
https://capesandbox.com/analysis/

Question regarding processing modules/parsers? #1893

Closed WiltedDeath closed 10 months ago

WiltedDeath commented 10 months ago

Hi, I have a new goal: to implement a processing module/parser that can parse JSON data from samples uploaded to CAPE. I have added PE-sieve as an auxiliary module and my code works. Now the only thing left to do is parse the 2 JSON reports it generates, and I want to render the parsed data as an HTML page.

That is okay, but my question is: do the processing modules get run during the scan? If so, can I make the JSON parser script interact with the current working directory via Path.cwd(), look for the 2 JSON reports, parse them, and then upload the parsing report to the analysis folder using upload_to_host?

PE-sieve auxiliary code:

import time
import logging
import subprocess
from threading import Thread
from pathlib import Path
from lib.common.abstracts import Auxiliary
from lib.common.results import upload_to_host

log = logging.getLogger(__name__)

class PESieve(Auxiliary, Thread):
    def __init__(self, options, config):
        Auxiliary.__init__(self, options, config)
        Thread.__init__(self)
        self.pesieve_path = "C:\\Users\\CapeUser\\Desktop\\pesieve\\pe-sieve64.exe"
        self.pids = []  # List to track PIDs
        self.processed_pids = set()

    def add_pid(self, pid):
        """Add a PID to the tracking list."""
        if pid not in self.pids:
            self.pids.append(pid)
            log.info("Added PID: %s to PESieve", pid)

    def del_pid(self, pid):
        """Remove a PID from the tracking list."""
        if pid in self.pids:
            self.pids.remove(pid)
            log.info("Removed PID from PESieve")

    def run(self):
        log.info("Running PE-sieve on PIDs")

        while True:
            # Iterate over a copy so add_pid()/del_pid() can be called
            # from other threads while we scan
            for pid in list(self.pids):
                if pid in self.processed_pids:
                    continue  # Skip this PID if it has already been processed

                process = subprocess.Popen([self.pesieve_path, "/pid", str(pid)], shell=False)
                process.wait()
                log.info("PE-sieve run on PID: %s", pid)
                self.processed_pids.add(pid)  # Mark this PID as processed

                time.sleep(15)  # Pause between processing each PID

            # Stop once every tracked PID has been scanned at least once
            if self.pids and self.processed_pids == set(self.pids):
                self.stop()
                break  # Leave the loop so the thread can exit

            time.sleep(1)  # Avoid busy-waiting while no PIDs are queued

    def stop(self):
        log.info("Stopping PE-sieve.")
        # Every PE-sieve process has already been waited on in run(), so
        # the output files are complete and safe to upload

        for pid in self.pids:
            # Find and upload the output files for each PID
            self.upload_output_files(pid)
        log.info("Uploaded files for all PIDs and stopped PE-sieve.")

    def upload_output_files(self, pid):
        # PE-sieve writes its output into a process_<pid> directory inside
        # the current working directory
        process_dir_path = Path.cwd() / f"process_{pid}"

        if process_dir_path.is_dir():
            log.info("Listing the files inside %s", process_dir_path)

            for file_path in process_dir_path.iterdir():
                if file_path.is_file():
                    # upload_to_host takes the local file path and the
                    # destination path relative to the analysis folder on
                    # the host; use forward slashes for the destination
                    upload_to_host(str(file_path), f"pesieve/{file_path.name}")
                    log.info("Uploaded %s from %s to CAPE", file_path.name, process_dir_path)
        else:
            log.error("No output directory found for PID %s", pid)

I will give you screenshots and explain. Here is the uploading of results: before my PE-sieve code sends them back to CAPE, they get stored here: [Screenshot 2023-12-12 183736]

Then in my PE-sieve code I look for the cwd and upload every file into "pesieve" in CAPE: [Screenshot 2023-12-13 125414]

kevoreilly commented 10 months ago

To answer your question specifically: no, processing is not done during the 'scan', if by 'scan' you mean performed inside the Windows VM along with the analyzer code.

Processing is done on the server side, within processing modules; it is performed by a separate service, 'cape-processor'.

https://capev2.readthedocs.io/en/latest/customization/processing.html
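
In outline, a processing module is a class like the sketch below; the module/class name and result key here are placeholders, not anything that exists in CAPE:

from lib.cuckoo.common.abstracts import Processing

class MyModule(Processing):
    """Runs on the host after the guest analysis has completed."""

    def run(self):
        # Results are stored in the final report under this key
        self.key = "mymodule"
        results = {}
        # self.analysis_path points at storage/analyses/<task_id>/, so
        # anything the guest uploaded with upload_to_host is available here
        return results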

WiltedDeath commented 10 months ago

Alright, thank you so much!

WiltedDeath commented 10 months ago

Hey, so I added my custom processing module to CAPE, checking every requirement there is:

Added it to processing.conf as pesieve_parser_json: [screenshot]
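
For reference, a minimal sketch of what that processing.conf entry would look like (exact options may vary):

[pesieve_parser_json]
enabled = yes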

Now that you clarified that the processing modules do not get run during the scan, my task will be a bit simplified. The goal for the JSON parser is to parse the 2 files that my tool generates after a scan; the auxiliary stores them in storage/analyses/1/pesieve (the folder is created by the resultserver): [Screenshot 2023-12-19 153412]

I even already have code for the JSON parsing, which shouldn't be too difficult. Now I would like to parse the data, make it human-readable, and display it in an HTML page. I want to store the result of the parsing in storage/analyses/1/.

import os
import json
import html
from lib.cuckoo.common.abstracts import Processing
from lib.cuckoo.common.path_utils import path_exists

class PeSieveJSONParser(Processing):
    """Parse Pe-sieve JSON reports and save results in HTML format."""

    def run(self):
        self.key = "pesieve_reports"
        reports = {}

        # Define paths to the JSON files
        scan_report_path = os.path.join(self.analysis_path, "pesieve", "scan_report.json")
        dump_report_path = os.path.join(self.analysis_path, "pesieve", "dump_report.json")

        # Check and parse the scan report
        if path_exists(scan_report_path):
            with open(scan_report_path, "r", encoding="utf-8") as file:
                reports["scan_report"] = json.load(file)

        # Check and parse the dump report
        if path_exists(dump_report_path):
            with open(dump_report_path, "r", encoding="utf-8") as file:
                reports["dump_report"] = json.load(file)

        # Convert parsed data to HTML
        html_report = self.data_to_html(reports)

        # Manually save the HTML report next to the JSON reports
        html_report_filename = "pesieve_html_report.html"
        html_report_path = os.path.join(self.analysis_path, "pesieve", html_report_filename)
        with open(html_report_path, "w", encoding="utf-8") as file:
            file.write(html_report)

        # Return the path of the saved HTML report
        return {"html_report": html_report_path}

    def data_to_html(self, parsed_data):
        """Convert parsed data to HTML format."""
        html_content = "<html><head><title>Pe-sieve Report</title></head><body>"

        # Iterate over each report in the parsed data
        for report_type, data in parsed_data.items():
            # Add a section for each report type
            html_content += "<h2>" + report_type + "</h2>"

            # Format the data as JSON, escape it so it renders literally in
            # the page, and add it to the HTML content
            formatted_json = html.escape(json.dumps(data, indent=4))
            html_content += "<pre>" + formatted_json + "</pre>"

        html_content += "</body></html>"
        return html_content
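
As an alternative, I was also thinking the HTML rendering could live in a reporting module instead, so the processing module only returns the parsed data. A rough, untested sketch (it assumes run() above returns the reports dict so the data lands under the "pesieve_reports" key):

import os
import json
import html
from lib.cuckoo.common.abstracts import Report

class PeSieveHTML(Report):
    """Render the data collected by the PeSieveJSONParser processing module."""

    def run(self, results):
        # results holds the output of all processing modules, keyed by
        # each module's self.key
        data = results.get("pesieve_reports", {})
        if not data:
            return
        # self.reports_path points at storage/analyses/<task_id>/reports/
        report_path = os.path.join(self.reports_path, "pesieve_html_report.html")
        with open(report_path, "w", encoding="utf-8") as f:
            f.write("<pre>" + html.escape(json.dumps(data, indent=4)) + "</pre>")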

Am I doing the uploading correctly, or should I use the upload_to_host method? What can you suggest I do?