kevoreilly / CAPEv2

Malware Configuration And Payload Extraction
https://capesandbox.com/analysis/

lib/cuckoo/common/integrations not pulling from community repo #1337

Closed vitalgingerbread closed 1 year ago

vitalgingerbread commented 1 year ago

Hello

I encountered an issue where PDF parsing broke after a recent pull. My investigation showed that PDFID/pdftools was moved in a recent commit. When community.py runs, it does not use the lib/cuckoo/common/* folder at all, so the files under lib/cuckoo/common/integrations (including pdftools) are not copied down.

If this bug can be confirmed and you're happy with this fix (assuming I do it more robustly), I will submit a PR.

My Fix

I fixed this for myself by adding a 'folder' for integrations in community.py as follows:

folders = {
    "feeds": "modules/feeds",
    "signatures": "modules/signatures",
    "processing": "modules/processing",
    "reporting": "modules/reporting",
    "machinery": "modules/machinery",
    "analyzer": "analyzer",
    "data": "data",
    "integrations": "lib/cuckoo/common/integrations",
}

I then added this to the "all" invocation by appending it to the end of the list:

    if args.all:
        enabled = ["feeds", "processing", "signatures", "reporting", "machinery", "analyzer", "data", "integrations"]

To make this fix more comprehensive, an option for pulling just the integrations, or even the whole lib/cuckoo/common folder, would need to be added in addition to including it in "all". I am happy to do this if you agree.
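To illustrate why the missing mapping matters, here is a minimal sketch of how archive members could be matched against the folders mapping. It is not the actual community.py logic: the function name `plan_copies` and the `community-master` top-level prefix are assumptions made for illustration only.

```python
from pathlib import PurePosixPath

# Category -> path mapping, copied from the snippet in this issue.
FOLDERS = {
    "feeds": "modules/feeds",
    "signatures": "modules/signatures",
    "processing": "modules/processing",
    "reporting": "modules/reporting",
    "machinery": "modules/machinery",
    "analyzer": "analyzer",
    "data": "data",
    "integrations": "lib/cuckoo/common/integrations",
}


def plan_copies(member_names, folders, enabled, prefix="community-master"):
    """Return (archive_member, destination) pairs for the enabled categories.

    `prefix` is an ASSUMED top-level directory inside the archive.
    """
    copies = []
    for name in member_names:
        try:
            relative = PurePosixPath(name).relative_to(prefix)
        except ValueError:
            continue  # member is not under the assumed top-level directory
        for category in enabled:
            base = PurePosixPath(folders[category])
            # Keep only members that live somewhere below the mapped folder.
            if base in relative.parents:
                copies.append((name, str(relative)))
                break
    return copies
```

With a category list that lacks "integrations", anything under lib/cuckoo/common/integrations is simply never selected, which matches the behaviour described above.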

Expected Behavior

On a fresh install, the required pdftools folder is copied down from the community repo when community.py runs.

Current Behavior

The pdftools folder is not copied down.

Failure Information (for bugs)

This fails whenever a PDF is submitted or when the processor starts up, but it fails silently. By default this try/except produces no output, so I added a print for my own debugging:

try:
    HAVE_PDF = True
    from lib.cuckoo.common.integrations.pdftools.pdfid import PDFiD, PDFiD2JSON
except ImportError:
    print("help! I lost pdftools")
    HAVE_PDF = False
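A less ad hoc way to surface this kind of failure is to log a warning when the optional import is unavailable. The sketch below is only a suggestion, not code from CAPEv2; the helper name `optional_import` is hypothetical.

```python
import importlib
import logging

log = logging.getLogger(__name__)


def optional_import(module_name):
    """Import a module by name; log a warning and return None if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        log.warning("Optional module %s is unavailable: %s", module_name, exc)
        return None


# Module path taken from the import statement in this issue.
pdfid = optional_import("lib.cuckoo.common.integrations.pdftools.pdfid")
HAVE_PDF = pdfid is not None
```

The flag behaves the same as before, but a missing pdftools folder now leaves a trace in the logs instead of disappearing quietly.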

Steps to Reproduce

An alternative way to reproduce is as follows:

poetry run python community.py -a | grep -i integrations
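The result can also be checked directly on disk after community.py has run. The following is a small sketch; the helper name `missing_files` is hypothetical, and the file path is derived from the import statement quoted earlier in this issue.

```python
from pathlib import Path


def missing_files(root, required):
    """Return the entries of `required` that are not files under `root`."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_file()]


# Path derived from: lib.cuckoo.common.integrations.pdftools.pdfid
REQUIRED = ["lib/cuckoo/common/integrations/pdftools/pdfid.py"]

if __name__ == "__main__":
    # Run from the root of the CAPEv2 checkout.
    for rel in missing_files(".", REQUIRED):
        print(f"missing: {rel}")
```

An empty result means the file is present; any printed path points at a file that was not copied down.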

Context

I'm not sure context is relevant here; I am using the base install, not the all install. I think the issue is with the way community.py parses the master.tar.gz archive file. Happy to provide context if required.

Failure Logs

The easiest way to tell whether you're impacted is to check whether parsing PDFs results in a pdf object in report.json. You can confirm the same in the UI by checking whether the PDF button is clickable:

[Image: pdf-button]

github-actions[bot] commented 1 year ago

@vitalgingerbread: hello! :wave:

This issue is being automatically closed because it does not follow the issue template.

This is an open source project! So please appreciate the time we sacrifice from other things we could enjoy, instead of asking boring things over and over.

doomedraven commented 1 year ago

A pull request with the solution would be useful. I will add that; thanks for the heads-up.

doomedraven commented 1 year ago

fixed

vitalgingerbread commented 1 year ago

Ah, legend, thanks a bunch @doomedraven!