malicialab / avclass

AVClass malware labeling tool
MIT License

MISP taxonomies #29

Closed imidoriya closed 1 year ago

imidoriya commented 3 years ago

I'm looking to use avclass2 to classify malware in my MISP instance. After getting a result from avclass, I'd like to give it an appropriate tag. Just wondering if you had a recommendation for tag taxonomies that would best align with the return of avclass.

Here are the MISP taxonomies

The ones that look like they might fit.

ms-caro-malware-full

Malware Type and Platform classification based on Microsoft's implementation of the Computer Antivirus Research Organization (CARO) Naming Scheme and Malware Terminology. Based on https://www.microsoft.com/en-us/security/portal/mmpc/shared/malwarenaming.aspx, https://www.microsoft.com/security/portal/mmpc/shared/glossary.aspx, https://www.microsoft.com/security/portal/mmpc/shared/objectivecriteria.aspx, and http://www.caro.org/definitions/index.html. Malware families are extracted from Microsoft SIRs since 2008 based on https://www.microsoft.com/security/sir/archive/default.aspx and https://www.microsoft.com/en-us/security/portal/threat/threats.aspx. Note that SIRs do NOT include all Microsoft malware families.

mwdb

Malware Database (mwdb) Taxonomy - Tags used across the platform

malware_classification

Classification based on different categories. Based on https://www.sans.org/reading-room/whitepapers/incident/malware-101-viruses-32848

malicialab commented 3 years ago

Hi there, We are planning to release a MISP taxonomy and/or galaxy soon. In the meantime these are what we believe are the most related existing taxonomies:

CLASS --> ms-caro-malware:malware-type, malware-category:malware-category
FILE --> ms-caro-malware:malware-platform
FAM --> ms-caro-malware:malware-family, mwdb:family
BEH --> maec-malware-behavior:maec-malware-behavior
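
As a rough illustration, the mapping above could be kept as a small lookup table when tagging in MISP. This is only a sketch: the table and the helper name `candidate_misp_tags` are hypothetical, and a generated tag is just a candidate, since the value may not exist as an entry in the target taxonomy.

```python
# Hypothetical lookup table built from the mapping above: AVClass2 categories
# on the left, candidate MISP taxonomy "namespace:predicate" prefixes on the right.
AVCLASS_TO_MISP = {
    "CLASS": ["ms-caro-malware:malware-type", "malware-category:malware-category"],
    "FILE": ["ms-caro-malware:malware-platform"],
    "FAM": ["ms-caro-malware:malware-family", "mwdb:family"],
    "BEH": ["maec-malware-behavior:maec-malware-behavior"],
}


def candidate_misp_tags(category, value):
    """Return candidate MISP tags for one AVClass2 tag (illustrative helper).

    The result is not validated against the taxonomies themselves.
    """
    prefixes = AVCLASS_TO_MISP.get(category, [])
    return ['{}="{}"'.format(prefix, value) for prefix in prefixes]
```

For example, `candidate_misp_tags("FAM", "emotet")` yields one candidate tag per family taxonomy, and an unmapped category yields an empty list.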

If this addresses your question, please close the issue

imidoriya commented 3 years ago

Excellent, thanks. What do you expect the timeframe to be on the MISP taxonomy release? It might make sense for me to just wait, rather than trying to convert them later.

imidoriya commented 3 years ago

Wanted to follow up on my question above about the timeframe. Thanks

malicialab commented 3 years ago

Hard to say as usual, but we hope to have a first version in three weeks

malicialab commented 3 years ago

I am reopening this issue as we have just committed a first version of the AVClass2 MISP galaxy in the repo: avclass2/data/misp. Pretty close to the three weeks timeframe :) It is work in progress. Please take a look and provide feedback / suggestions on what would make it more useful. Once we have a stable version we could commit it to the misp-project repository.

imidoriya commented 3 years ago

Thanks, I'll take a look.

imidoriya commented 3 years ago

Didn't have any issues. I'm still learning AV tagging, so I don't have feedback yet except to say that it looks good.

imidoriya commented 3 years ago

I'm seeing some that I may need to add to the list or synonyms. Not sure how to do that.. but I see tags like

imidoriya commented 3 years ago

I also have tags, misp-galaxy:avclass="bladabi" for example, that are not in the galaxy. In that case, it's a synonym, but others may be new. I'll need to understand how I can improve your galaxy based on my data. Thx
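
One way to list such tags could look like the sketch below. It assumes the tags have already been exported from MISP as plain strings of the form `misp-galaxy:avclass="<value>"` and that the set of values defined in the galaxy cluster is available; the function name and its inputs are illustrative, not part of AVClass or MISP.

```python
import re
from collections import Counter

# Matches tags of the form misp-galaxy:avclass="<value>" and captures <value>.
TAG_RE = re.compile(r'^misp-galaxy:avclass="([^"]+)"$')


def unknown_avclass_tags(tags, known_values):
    """Count avclass galaxy tags whose value is not in known_values.

    tags: iterable of tag strings (assumed to be exported from MISP).
    known_values: set of values defined in the galaxy cluster.
    Returns (value, count) pairs, most frequent first.
    """
    counts = Counter()
    for tag in tags:
        match = TAG_RE.match(tag)
        if match and match.group(1) not in known_values:
            counts[match.group(1)] += 1
    return counts.most_common()
```

Tags from other namespaces are ignored, so the result only covers avclass values that are missing from the cluster.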

imidoriya commented 3 years ago

Here is an example of an update I tried. https://github.com/imidoriya/avclass/commit/18115f9db8ef33fd739e160aed348476ab8527b7 Is this all I need to do or is there more in avclass that I should be updating? One note for example is Crysis is ransomware, but wasn't sure if I am to specify that anywhere, such as doing related ... type=variant-of. Is it ok for a Family to be a variant-of a Behavior or Class? I'm still parsing through some of my stuff, but I currently have 346 tags that are not listed in the galaxy cluster.

Update: As I understand the structure more, I'm updating the taxonomy, alias, and the misp, along with the addition of related entities. https://github.com/malicialab/avclass/compare/master...imidoriya:master

malicialab commented 3 years ago

@jeffg2k that question likely belongs in a separate thread. Here is a quick response; if you want to discuss further, just open another issue. AVClass2 internally uses a threshold so that it only shows tags that appear in at least two AV engines, with the caveat that some groups of AV engines are known to collaborate/copy their labels. In addition, each tag has an associated counter that captures the number of AV labels (after removing duplicate engines) that include the tag's concept. Thus, applying a threshold as you mention should be trivial, although one could also store the tag together with its counter, so that a threshold can be applied later on. I guess the question here is what kind of false indicators you may be seeing. As said, feel free to open another issue if you want to follow up.
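
A minimal sketch of applying such a threshold after the fact, assuming the tags for a sample are available as a comma-separated field of `name|count` items (for example `grayware|10,adware|9`). That field layout and the function name are assumptions made for illustration.

```python
def filter_tags(tag_field, min_count):
    """Keep only tags whose counter is at least min_count.

    tag_field: string such as "grayware|10,adware|9" (assumed layout).
    Returns a list of (tag, count) pairs in their original order.
    """
    kept = []
    for item in tag_field.split(","):
        # Split on the last "|" so tag names containing ":" stay intact.
        name, _, count = item.rpartition("|")
        if name and int(count) >= min_count:
            kept.append((name, int(count)))
    return kept
```

Storing the tag together with its counter, as suggested above, is what makes this kind of later filtering possible.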

malicialab commented 3 years ago

@imidoriya, we typically differentiate between tags in the taxonomy and tokens (i.e., tags not in the taxonomy, marked with the UNK category if you use the -p option). Both can appear in the AVClass2 output. Tags are included in the MISP taxonomy, but tokens are not because they still have not been classified. If you want us to include some new alias you can open a new issue. For those that you mention I see "hpbladabi", which is likely an alias for "bladabindi" (0.83 rate and we use 0.94 as threshold by default), but I do not see "bladabi" in our data. For "crysis" and "crusis" I see a lower ratio of 0.63, which is why we have not added a tagging rule yet. Both get tagged as ransomware, so they could indeed be aliases, although I would wait a bit more to add that one.
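
To make the rate concrete, here is a rough sketch of a co-occurrence ratio. It assumes the rate for a pair (a, b) is the fraction of samples containing token a that also contain b; this is an interpretation for illustration, not necessarily how AVClass2 computes it, and the function name is hypothetical.

```python
from collections import Counter
from itertools import permutations


def alias_candidates(sample_tokens, threshold=0.94):
    """Return (a, b, ratio) where ratio = |samples with a and b| / |samples with a|.

    sample_tokens: iterable of token collections, one per sample.
    Only pairs whose ratio reaches the threshold are returned.
    """
    single = Counter()
    pair = Counter()
    for tokens in sample_tokens:
        unique = set(tokens)
        single.update(unique)
        # Ordered pairs, because the ratio is not symmetric.
        pair.update(permutations(sorted(unique), 2))
    result = []
    for (a, b), both in pair.items():
        ratio = both / single[a]
        if ratio >= threshold:
            result.append((a, b, ratio))
    return sorted(result)
```

With this definition a rare token that always appears next to a common one scores 1.0 in one direction only, which is the direction an alias rule would use.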

malicialab commented 3 years ago

@imidoriya, I checked your updates and the only ones that convince me (based on our data) are these aliases:
crusis -> crysis
dharma -> crysis
I just added them to the taxonomy and tagging. BTW, if you can share your .alias file (generated with the -aliasdetect option and which only contains cumulative counts of tokens), it makes it much easier for us to check for new aliases.

malicialab commented 3 years ago

I am not familiar with MISP object relationships such as variant-of. We'll take a look and if someone else has feedback on how they use them, please speak up.

imidoriya commented 3 years ago

Gotcha, well I'm currently going through the UNK where I have hundreds of items tagged and trying to put them into the taxonomy. I'll create a PR and I guess you can let me know if they're ok or not, hopefully so as it takes a bit to research them and make the entries. lol I'm trying to also include ref links. I probably should be using your updater, but I'm just not sure how to use it against my data, which is already in MISP.

imidoriya commented 3 years ago

Gives you things like this... In this case, I actually need to fix it, as it would probably be better stated as "subtechnique-of" for the remoteadmin. [Screenshot, 2021-02-24]

malicialab commented 3 years ago

Capturing parent-child relationships in the AVClass taxonomy in the MISP cluster file makes a lot of sense. Between "variant-of" and "subtechnique-of", I kind of like "variant-of" better though, at least for CLASS entries.
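
A sketch of how a parent-child link might be expressed in a cluster entry, using the `related` list with `dest-uuid` and `type` keys from the MISP galaxy cluster format. The helper name is hypothetical, the UUIDs are generated on the fly, and the grayware/adware pair is only an illustrative CLASS parent and child.

```python
import json
import uuid


def cluster_entry(value, synonyms=(), parent_uuid=None, relation="variant-of"):
    """Build one galaxy cluster value, optionally linked to a parent entry."""
    entry = {"value": value, "uuid": str(uuid.uuid4())}
    if synonyms:
        entry["meta"] = {"synonyms": sorted(synonyms)}
    if parent_uuid:
        # The child points at its parent through the parent's UUID.
        entry["related"] = [{"dest-uuid": parent_uuid, "type": relation}]
    return entry


# Illustrative CLASS parent and child.
grayware = cluster_entry("grayware")
adware = cluster_entry("adware", parent_uuid=grayware["uuid"])
print(json.dumps(adware, indent=2))
```

Switching to "subtechnique-of" would only change the `relation` argument, so the choice between the two types can be deferred.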

imidoriya commented 3 years ago

For the alias, in the cases below, I was just using the alias defined via https://malpedia.caad.fkie.fraunhofer.de/. In the other cases, I did see the alias in my data. With regard to alias file, what would be the best way to create that as I have a lot of files and each one is scanned independently?

phobos dharma
arena dharma
wadhrama dharma
ncov dharma

geodo emotet
heodo emotet

imidoriya commented 3 years ago

For these, I identified the alias tags in my data and looked them up on Malpedia, which led to the main classification. Also, when bladabi or njrat was tagged, most of the time bladabindi was also tagged, so it was easy to see they were related and alternative names for the same thing, confirmed by Malpedia.

bladabi bladabindi
njrat bladabindi

wacatac deathransom

fuerboos goodor

malicialab commented 3 years ago

If you have many files for individual samples, you can put them in the same directory and use the -vtdir option to read all files in the directory and also add the -aliasdetect option.

Regarding Malpedia, we are currently analyzing the entries and aliases that they have and AVClass does not, but it may still take us some time to finish the analysis.

imidoriya commented 3 years ago

I have several hundred thousand files and they're compressed and password protected. What I'm doing now is pulling an event from MISP, reading the AV results for uploads, and then feeding that to AvClass.

imidoriya commented 3 years ago

But since I have them tagged now, I can run avclass against any particular set of tags. So if we wanted to run all the files that were classified as deathransom or goodor, I can do that.

imidoriya commented 3 years ago

Not sure how best to clean this up, but this seems a bit excessive for MISP tagging: miner, mining, bitcoinminer, bitcoinmining. It might make sense to just list each once in the cluster with an alias. I would maybe list the miner as the BEH and the bitcoinminer as the CLASS, and have the class reference the behavior. [Screenshot, 2021-02-25]

imidoriya commented 3 years ago

These are some of the remaining UNK tags that are not in the cluster, which I wasn't sure about. For example, I think indiloadz is a type of adware, but I'm not sure if it's an alias for adware, a sub-technique of adware, or a family of adware. The number to the right is my count, in descending order.

Tag Count
misp-galaxy:avclass="ursu" 521
misp-galaxy:avclass="disfa" 427
misp-galaxy:avclass="crysan" 414
misp-galaxy:avclass="indiloadz" 412
misp-galaxy:avclass="atraps" 398
misp-galaxy:avclass="dodiw" 290
misp-galaxy:avclass="brmon" 273
misp-galaxy:avclass="gorgon" 238
misp-galaxy:avclass="tasker" 201
misp-galaxy:avclass="bulz" 200
misp-galaxy:avclass="generickdz" 188
misp-galaxy:avclass="veil" 175
misp-galaxy:avclass="msilperseus" 167
misp-galaxy:avclass="subti" 159
misp-galaxy:avclass="bplug" 141
misp-galaxy:avclass="llac" 139
misp-galaxy:avclass="rrat" 138
misp-galaxy:avclass="rescoms" 131
misp-galaxy:avclass="agensla" 123
misp-galaxy:avclass="nanobot" 122
misp-galaxy:avclass="hotkeychick" 121
misp-galaxy:avclass="zenpak" 107
misp-galaxy:avclass="gencbl" 105
misp-galaxy:avclass="noancooe" 104
misp-galaxy:avclass="avemaria" 88
misp-galaxy:avclass="agentb" 84
misp-galaxy:avclass="chapak" 78
misp-galaxy:avclass="injuke" 71
misp-galaxy:avclass="emeka" 70
misp-galaxy:avclass="fsysna" 70
misp-galaxy:avclass="cobalt" 69
misp-galaxy:avclass="vebzenpak" 68
misp-galaxy:avclass="johnnie" 62
misp-galaxy:avclass="alien" 59
misp-galaxy:avclass="redcap" 57
misp-galaxy:avclass="solmyr" 57
misp-galaxy:avclass="xaparo" 57
misp-galaxy:avclass="bsymem" 57
misp-galaxy:avclass="genericgba" 56
misp-galaxy:avclass="obfdldr" 54
misp-galaxy:avclass="leivion" 54
misp-galaxy:avclass="downeks" 53
misp-galaxy:avclass="midie" 50
misp-galaxy:avclass="coins" 48
misp-galaxy:avclass="ruco" 43
misp-galaxy:avclass="pavica" 43
misp-galaxy:avclass="liev" 42
misp-galaxy:avclass="noobyprotect" 42
misp-galaxy:avclass="vatet" 42
misp-galaxy:avclass="noon" 40
misp-galaxy:avclass="mansabo" 40
misp-galaxy:avclass="starter" 38
misp-galaxy:avclass="bobik" 37
misp-galaxy:avclass="scrop" 36
misp-galaxy:avclass="packer" 36
misp-galaxy:avclass="boxedapp" 36
misp-galaxy:avclass="zapchast" 36
misp-galaxy:avclass="hesv" 35
misp-galaxy:avclass="dothetuk" 35
misp-galaxy:avclass="cometer" 35
misp-galaxy:avclass="pyxie" 35
misp-galaxy:avclass="lampa" 34
misp-galaxy:avclass="injects" 32
misp-galaxy:avclass="coroxy" 31
misp-galaxy:avclass="predator" 31
misp-galaxy:avclass="jacksbot" 30

imidoriya commented 3 years ago

Just wanted to say that I've been running this galaxy PR https://github.com/malicialab/avclass/pull/34 for the past month and it's working out pretty well. I still have quite a few that could use categorization as seen above, even if I were to add them as UNK. It would be helpful for how they are displayed in MISP as anything that is not defined is presented as a normal tag, even if you have the galaxy prefix. Thoughts?

malicialab commented 1 year ago

We just committed a script misp.py to keep the MISP taxonomy updated. Only difference should be that it does not add the Malpedia URLs for now. The version has been bumped to match that of the tool, so that it is easy to know which version of the avclass taxonomy it comes from.

Since this issue has been around for quite a while, I am going to close it. Feel free to open another issue if you spot anything.