Hello from CCCS! 🍁

cccs-rs commented 6 months ago

Hi there,

I was wondering if you're interested in including your extractor in Assemblyline, our open-source malware analysis platform.

I believe adding the work that you've done would be a boon to the cybersecurity community!

If you're interested or have any questions, feel free to reach out! 😀

jeFF0Falltrades commented 6 months ago

So good to hear from you CCCS!

This looks like a great tool, and I’ll make some time to try it out and get familiar.

I’m definitely open to having this parser integrated, so long as we can find a way for parser updates to make it downstream to AssemblyLine. If anyone there can point me to the best place in the repo to start looking at integration, I can take a look!

cccs-rs commented 6 months ago

Thanks for your response!

I can see this being integrated with the ConfigExtractor service in Assemblyline which allows users to configure sources to pull extractors from periodically and use them for analysis.

The underlying library that "collects" and runs the extractors is configextractor-py, which is written to support multiple extractor frameworks in a common tool. It outputs to a common standard called MACO, which undergoes data validation via Pydantic. The library officially supports MWCP and MACO extractors; malduck support is still pending 🤞🤞.

So I would definitely explore the two Python packages mentioned above to make your extractor compatible with Assemblyline via local testing on your host. Then you can try a test with Assemblyline to make sure everything works from there as well.

Also, if you find there's something that the MACO output standard doesn't capture, feel free to let us know!

If you have any other questions about this or Assemblyline in general, feel free to join our Discord for a more immediate response or respond to this issue. 🤓

doomedraven commented 5 months ago

Oh hello guys, nice to see CCCS here too. I just added these extractors to CAPE (https://github.com/kevoreilly/CAPEv2/pull/2135), adding a README that credits the owner's work. Is that fine with you, @jeFF0Falltrades?

jeFF0Falltrades commented 5 months ago

> oh hello guys, nice to see CCCS here too, i just added these extractors to CAPE kevoreilly/CAPEv2#2135 adding readme mentioning the owner work, is that fine for you @jeFF0Falltrades ?

More than fine, especially given you’ve done the hard work already, and also provided the license - well done and thank you very much!

jeFF0Falltrades commented 2 months ago

Hello again @cccs-rs; I'm finally getting around to looking at this Issue in more detail. I've looked at MWCP and MACO, and at first glance these frameworks seem to require fairly specific designation of config value types. In other words, whereas the RAT King Parser purposefully does not try to determine the exact meaning of each field of a configuration in every case (because the order and type of these configuration values can be changed by malware authors and is not guaranteed), it appears these frameworks want to assign each config value to a category such as "URL" or "username" or "Mutex" or "DNS request", etc.

Do you happen to know if MACO's or MWCP's models allow for generic values?

You'll notice in the README for this project that some of the samples' parsed configurations have very clear labels like "Mutex" and "Ports", but others (and these are the more common ones in the wild) obfuscate the config values so that it is not clear which value corresponds to a mutex or port or whatever.

Is there a way to integrate RKP into MACO (or MWCP, though I figure you all are most familiar with MACO 😉) in such a way that it could extract a config with generic values not mapped to specific definitions of these values (this is how CAPE works and makes it fairly easy to integrate with their framework for parsers)?

Hopefully this question makes sense, but if not, please let me know if I can clarify anything.

Thank you again for your interest!

cccs-rs commented 2 months ago

Hey @jeFF0Falltrades!

Both MACO and MWCP support assigning generic values under a catch-all key called other. Speaking on MACO's behalf, we'd rather categorize the data in the model and use the other field as a last resort. That eases parsing of the data for systems like Assemblyline, which will want to tag important features to propagate forward to other analytical services or to the end user for further triage/investigation.
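To make that trade-off concrete, here is a minimal sketch in plain Python that deliberately does not use the maco package. It sorts a free-form config into named fields where the label is recognized and keeps everything else under other. The to_structured helper and the KNOWN_KEYS mapping are hypothetical names for illustration; only the field names family, mutex, and other come from this thread.

```python
from typing import Any, Dict

# Hypothetical mapping from free-form labels to structured field names.
KNOWN_KEYS = {"Mutex": "mutex"}


def to_structured(family: str, raw_config: Dict[str, Any]) -> Dict[str, Any]:
    """Place recognized values in named fields; keep the rest under 'other'."""
    result: Dict[str, Any] = {"family": family, "other": {}}
    for key, value in raw_config.items():
        field = KNOWN_KEYS.get(key)
        if field is not None:
            # Recognized values go into a list-valued field.
            result.setdefault(field, []).append(value)
        else:
            # Unlabeled or obfuscated values are preserved untouched as a last resort.
            result["other"][key] = value
    return result


example = to_structured(
    "quasarrat",
    {"Mutex": "e4d6a6ec-320d-48ee-b6b2-fa24f03760d4", "obfuscated_key_1": "abc"},
)
```

Anything that lands in other is still available to a researcher, but an automated system would have nothing specific to tag it with.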

On the subject of CAPE, we're also looking to add support for their extractors but, as you've mentioned, the output is freeform and the intention is to keep the format of the data as close as possible to how you would see it in the wild, so to speak. On that front, I've spoken with the maintainers about introducing a conversion function within the extractor implementation which will try to transform the data to conform to the MACO format. In this way, we're still maintaining the original format of the extraction for CAPE and researchers, as well as providing a way to translate the data into something that is more organized for automation to leverage.

At the moment, I've been too busy to complete the rest of the conversion function implementations, but I think that would be the best way to integrate RKP with CAPE and maintain compatibility with MACO for our use-case of including RKP as part of the Assemblyline project.

Let me know what your thoughts are on this (and if you're willing, you can pick up where I left off with the CAPE feature? 😅). If there's anything you'd like me to clarify then let me know.

Thanks again for still engaging with us on this!

cccs-rs commented 2 months ago

This might provide a clear example of how an automated system like Assemblyline might favor an organized output format like MACO/MWCP as opposed to freeform data.

Sample: https://www.virustotal.com/gui/file/1197176beff05021bab950260d4fd899f11ed1ca9142e5cd46fcbd412452131f

Context: This is one of the CAPE extractors that I ported to MACO last year but my branch is probably significantly outdated. You'll notice that AL will tag the URIs that were extracted as well as score the URI based on how it's being used (in this case, if it's a C2 then it's immediately marked as malicious)

jeFF0Falltrades commented 2 months ago

Thank you @cccs-rs for the thorough response and examples!

Really nice work so far with the translation functions :-)

As an initial step, I could probably at the very least use some regex or fuzzy matching to convert the C2 domains/IPs as you have with the translation functions above.

The other fields parsed by RKP are a lot harder to classify, but this would at least allow for pivoting/classification on C2 infrastructure.

If you think that's a worthwhile endeavor, I'll look into the work you've done and try to adapt the CAPE implementation over in the same way you've done above, extracting the domain/IP from the configs.
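As a rough sketch of what that first step might look like (not RKP's actual implementation), the snippet below uses the standard ipaddress module plus a simplified hostname regex to pick out values that look like C2 infrastructure. The classify and find_c2_candidates helpers and the HOSTNAME_RE pattern are hypothetical, and the sample IP is a documentation address.

```python
import ipaddress
import re
from typing import List, Tuple

# Simplified hostname pattern: dot-separated labels ending in an alphabetic TLD.
HOSTNAME_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)


def classify(value: str) -> str:
    """Return 'ip', 'domain', or 'unknown' for a single config value."""
    candidate = value.strip()
    try:
        ipaddress.ip_address(candidate)
        return "ip"
    except ValueError:
        pass
    if HOSTNAME_RE.match(candidate):
        return "domain"
    return "unknown"


def find_c2_candidates(values: List[str]) -> List[Tuple[str, str]]:
    """Keep only the values that look like IPs or domains, with their kind."""
    kinds = [(value, classify(value)) for value in values]
    return [(value, kind) for value, kind in kinds if kind != "unknown"]


candidates = find_c2_candidates(
    ["driftcar.giize.com", "192.0.2.10", "e4d6a6ec-320d-48ee-b6b2-fa24f03760d4", "true"]
)
```

A pattern this loose would also accept file names such as Client.exe, so a real translation function would likely need extra context or a TLD allow-list before marking anything as C2.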

cccs-rs commented 2 months ago

I think this is an endeavor worth pursuing.

Because of the format of your extractor as compared to what our library supports, I think the best way to integrate RKP is by having an entrypoint that we can automatically detect.

In my mind, this could be an rkp_maco.py file that implements a subclass of the Extractor class but invokes the RATConfigParser class on run(), similar to what CAPE does. After getting the report, we could do the translation to MACO where possible.

If you'd like me to start on a PR to get things going or if you have any questions, let me know!

Example of rkp_maco.py:

from tempfile import NamedTemporaryFile
from typing import BinaryIO, List, Optional

import yara
from config_parser import RATConfigParser
from maco import model
from maco.extractor import Extractor


class RKP(Extractor):
    family = "RAT"
    author = "@jeFF0Falltrades"
    last_modified = "2024-09-04"
    sharing = "TLP:WHITE"
    # Must be an assignment so the rules are stored as a class attribute
    yara_rule = open('yara_rules/rules.yar').read()

    def run(self, stream: BinaryIO, matches: List[yara.Match]) -> Optional[model.ExtractorModel]:
        # RATConfigParser takes a file path, so write the stream to a temporary file first
        with NamedTemporaryFile('w+b') as fh:
            fh.write(stream.read())
            fh.flush()
            report = RATConfigParser(fh.name).report
        if report['config'].startswith("Exception"):
            # Error occurred during extraction
            return None
        # Parse report and transform output to MACO...

jeFF0Falltrades commented 1 month ago

@cccs-rs Thank you for your continued engagement and explanation on this!

What you posted above is a perfect example and cleared things up considerably.

I just refactored and updated RAT King Parser to v3.0.0, and it is now able to be installed via pip as both a standalone utility and a module to be used by other libraries, and with that work done, I feel ready to start focusing efforts on this.

To that end, I have one last question:

In which project would the above example file you posted (rkp_maco.py) live?

Do I create a PR in the configextractor-py repo and build out a subdirectory there for RKP to live in, or the assemblyline-service-configextractor repo, or somewhere else?

If it's easier to just create a PR on my behalf and put the scaffolding in, that's fine: I just want to be sure I'm putting the right code in the right place 😄

cccs-rs commented 1 month ago

Hey @jeFF0Falltrades,

That's a good question! Assemblyline actually has the ability to pull in remote data from external sources, so since your project is public, I would just amend the service_manifest.yml to include the following lines:

...
update_config:
  sources:
    ...
    - name: RKP
      pattern: .*/RKP/$
      uri: https://github.com/jeFF0Falltrades/rat_king_parser.git
      default_classification: TLP:C

And as well provide the appropriate contribution/credit in our project's README.

One thing the configextractor-py tool depended on for resolving missing package conflicts (especially in a containerized environment) was the presence of a requirements.txt and then it would create a venv for the particular extractor it detected (this kind of allows us to support using extractors that may depend on different versions of a dependency without raising conflicts).

Considering the pyproject.toml looks like it contains that information under dependencies, I may make an update to configextractor-py to also look for the presence of those files (since I believe this is common practice now) and parse them to retrieve the package dependencies required for creating a venv.

To circle back to your original question of where rkp_maco.py would live: it can actually live in your project, and we can just pull it remotely and integrate it into the system automatically 🤓

If you'd like me to do some testing with you to make sure the integration is working as it should, you can create a branch on your repo with rkp_maco and I can try testing with our staging system.

Thanks again for your patience and cooperation on this! 😁

cccs-rs commented 3 weeks ago

With the latest patch I made to configextractor-py, it should be able to parse your pyproject.toml and create the venv for your extractor(s).

Now all that's left is to add the MACO variant (rkp_maco.py) in your project and we should be good to go!

jeFF0Falltrades commented 3 weeks ago

> With the latest patch I made to configextractor-py, it should be able to parse your pyproject.toml and create the venv for your extractor(s).

> Now all that's left is to add the MACO variant (rkp_maco.py) in your project and we should be good to go!

Are you all tracking me, or do we have some telepathy going on?!!

I literally just decided today to finally finish this out after I got some more adjustments for CAPE done (sorry for the delays), and made a commit to RKP today to facilitate it.

I even started looking at configextractor-py and at how to implement pyproject.toml support, but to save time, I just made a requirements.txt in the same dir as my rkp_maco.py.

My question is this: With configextractor-py, I was running into issues because my pyproject.toml was two directories up from rkp_maco.py, but if I set the parser dir to that root directory, it would not recurse down and find my rkp_maco.py file. Do you know if that will work now with your changes?

I.e. the way I have it laid out for now is:

rat_king_parser
|_pyproject.toml
|_src
|__extern
|___maco
|____rkp_maco.py
|____requirements.txt
|__rat_king_parser

As you can see - it's working with the current version (prior to your pyproject.toml changes) only because I moved the requirements.txt to the same dir as the extractor, and I set the parser dir to rat_king_parser/src/extern/maco.
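One way a tool could cope with this layout is to look for a dependency file next to the extractor and then in each parent directory up to the repo root. The sketch below shows that idea only; it is not how configextractor-py is known to behave, and find_dependency_file and DEPENDENCY_FILES are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Optional

DEPENDENCY_FILES = ("requirements.txt", "pyproject.toml")


def find_dependency_file(extractor_path: Path, stop_at: Path) -> Optional[Path]:
    """Search the extractor's directory, then each parent up to stop_at."""
    current = extractor_path.resolve().parent
    stop_at = stop_at.resolve()
    while True:
        for name in DEPENDENCY_FILES:
            candidate = current / name
            if candidate.is_file():
                return candidate
        # Stop at the repo root, or at the filesystem root as a safety net.
        if current == stop_at or current.parent == current:
            return None
        current = current.parent


# Recreate the layout described above in a throwaway directory.
root = Path(tempfile.mkdtemp()) / "rat_king_parser"
maco_dir = root / "src" / "extern" / "maco"
maco_dir.mkdir(parents=True)
(root / "pyproject.toml").write_text("[project]\nname = 'demo'\n")
(maco_dir / "rkp_maco.py").write_text("")

found = find_dependency_file(maco_dir / "rkp_maco.py", stop_at=root)
```

With only the root-level pyproject.toml present, the search climbs to the root and finds it; a requirements.txt placed beside the extractor would be picked up first.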

Does this directory structure work, or would you recommend some other place for rkp_maco.py to live in the RKP directory tree?

Also, I am putting a bow on rkp_maco.py as we speak, but it's looking good so far!

[
  {
    "MACO": [
      {
        "id": "tmpon0c7gbw.rkp_maco.RKP",
        "yara_hits": [
          "quasarrat"
        ],
        "author": "jeFF0Falltrades",
        "description": "A MACO derivative of jeFF0Falltrades' RAT King Parser",
        "config": {
          "family": "quasarrat",
          "version": "1.0.00.r6",
          "category": [
            "rat"
          ],
          "mutex": [
            "e4d6a6ec-320d-48ee-b6b2-fa24f03760d4"
          ],
          "http": [
            {
              "uri": "https://argentina-e4162-default-rtdb.firebaseio.com/user.json",
              "protocol": "https",
              "usage": "c2",
              "hostname": "argentina-e4162-default-rtdb.firebaseio.com",
              "path": "/user.json"
            },
            {
              "protocol": "https",
              "hostname": "driftcar.giize.com",
              "port": 443,
              "usage": "c2"
            },
            {
              "protocol": "https",
              "hostname": "adreniz.kozow.com",
              "port": 443,
              "usage": "c2"
            }
          ]
        }
      }
    ]
  }
]
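The http entries in that output can be derived from a raw URL with the standard urllib.parse module. The following is a sketch under that assumption, not the code in rkp_maco.py; http_entry is a hypothetical helper, and unlike the output above it always includes the uri field.

```python
from typing import Any, Dict
from urllib.parse import urlsplit


def http_entry(url: str, usage: str = "c2") -> Dict[str, Any]:
    """Split a URL into the fields seen in the output above."""
    parts = urlsplit(url)
    entry: Dict[str, Any] = {
        "uri": url,
        "protocol": parts.scheme,
        "hostname": parts.hostname,
        "usage": usage,
    }
    # Only include optional fields when the URL actually carries them.
    if parts.port is not None:
        entry["port"] = parts.port
    if parts.path:
        entry["path"] = parts.path
    return entry


entry = http_entry("https://argentina-e4162-default-rtdb.firebaseio.com/user.json")
with_port = http_entry("https://driftcar.giize.com:443")
```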

cccs-rs commented 3 weeks ago

> Are you all tracking me, or do we have some telepathy going on?!!

Smart bet is on telepathy 🧠😅

> My question is this: With configextractor-py, I was running into issues because my pyproject.toml was two directories up from rkp_maco.py, but if I set the parser dir to that root directory, it would not recurse down and find my rkp_maco.py file - Do you know if that will work now with your changes?

I noticed this as well, and since your extractors are located in the subdirectory src, which I think is a common enough practice in most projects, I've added a condition to add that to the PATH when the library goes extractor hunting via module traversal 😁

So now wherever you decide to put rkp_maco.py in src, I should be able to find it (or at least I should be able to!) 🤓
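A rough sketch of that kind of module traversal, using only the standard library: prefer a src directory when one exists, put it on sys.path, and walk the packages beneath it. This is an illustration rather than the library's actual code; find_modules and the demo_rkp_pkg layout are hypothetical, and the demo assumes each package directory has an __init__.py.

```python
import pkgutil
import sys
import tempfile
from pathlib import Path
from typing import List


def find_modules(repo_root: Path) -> List[str]:
    """List importable module names, preferring a 'src' layout if present."""
    src = repo_root / "src"
    search_root = src if src.is_dir() else repo_root
    # Packages must be importable for walk_packages to descend into them.
    if str(search_root) not in sys.path:
        sys.path.insert(0, str(search_root))
    return [info.name for info in pkgutil.walk_packages([str(search_root)])]


# Build a small throwaway repo: RKP/src/demo_rkp_pkg/extern/rkp_maco.py
repo = Path(tempfile.mkdtemp()) / "RKP"
extern = repo / "src" / "demo_rkp_pkg" / "extern"
extern.mkdir(parents=True)
(repo / "src" / "demo_rkp_pkg" / "__init__.py").write_text("")
(extern / "__init__.py").write_text("")
(extern / "rkp_maco.py").write_text("FOUND = True\n")

modules = find_modules(repo)
```

Each discovered module name could then be imported and inspected for Extractor subclasses.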

jeFF0Falltrades commented 3 weeks ago

Outstanding! Whatever kind of wavelength we're on today, I love it - I'll pull down your latest changes, test with those, then test with AssemblyLine, and if all looks well, I'll make a PR this weekend.

Thanks so much for your effort and patience!

jeFF0Falltrades commented 3 weeks ago

I have pushed the changes for the MACO extractor, but reopening this Issue as I have not been able to test locally with AssemblyLine - I keep running into out-of-memory issues on my local box no matter how I adjust the config.

Does this look like the correct adjustment to be made to service_manifest.yml?

...
update_config:
  sources:
    ...
    - name: RAT King Parser
      pattern: .*/$
      uri: https://github.com/jeFF0Falltrades/rat_king_parser.git
      default_classification: TLP:C

If you are able to test that @cccs-rs, I'm happy to put that into a PR on AssemblyLine (or feel free to do the same).

Thank you!

cccs-rs commented 3 weeks ago

Yeah I noticed something odd when running inside the container environment that I couldn't replicate on my dev host.

I'll see if I can narrow down what's causing the issue and report back! 🫡

jeFF0Falltrades commented 3 weeks ago

Thanks so much for your effort, as always!

cccs-rs commented 3 weeks ago

I believe I figured it out: it comes down to the library getting confused and loading the wrong directory as a module. Assemblyline (and the library) takes the directory you target and tries to load it as a Python module when it goes extractor hunting.

Since the root of your repo isn't set up as a module, it would've failed at loading anything because the module code is technically under src, and the module's name would be different from the directory name of the git clone (i.e. RAT King Parser vs rat_king_parser).

The patch I'm working on should work for using the library on the host and within Assemblyline (but I'm going to test this first!)

So in this case, let's say if the clone is RKP (because the directory name of the clone will match the source name in AL), it should still be able to detect the extractors within the rat_king_parser module.

Will provide an update once I get this tested in AL! 🤓

jeFF0Falltrades commented 3 weeks ago

@cccs-rs (I believe primarily Ryan, correct me if I'm wrong 😁) - thanks so much for inviting me to contribute to this, and for all the time and effort you spent getting it across the finish line.

I apologize again for it taking so long to focus in on this, but I'm ultimately happy it happened after we got RKP to v3.0.0.

Thanks for AssemblyLine and all you contribute back to the community!

cccs-rs commented 3 weeks ago

Think I got everything all worked out now, just going to test on our staging system for good measure 😅

In regards to the contribution/accreditation, here's the PR I have staged: https://github.com/CybercentreCanada/assemblyline-service-configextractor/pull/237

Don't know what's the best practice in this case, so if you have any pointers, I'd love to hear them 😁

jeFF0Falltrades commented 3 weeks ago

> Think I got everything all worked out now, just going to test on our staging system for good measure 😅

> In regards to the contribution/accreditation, here's the PR I have staged: https://github.com/CybercentreCanada/assemblyline-service-configextractor/pull/237

> Don't know what's the best practice in this case, so if you have any pointers, I'd love to hear them 😁

No pointers here - that's one of the best formatted attribution sections I've seen 😄.

Thanks so much, again!

cccs-rs commented 3 weeks ago

> No pointers here - that's one of the best formatted attribution sections I've seen 😄.

Great to hear! 😋

Thanks again for all the work you've done to help contribute to our project! Glad we finally made it to the finish line 😂

If you happen to be using Assemblyline and run into any problems, feel free to reach out 😁

jeFF0Falltrades / rat_king_parser

Hello from CCCS! 🍁 #7