triw0lf / HEARTH

A community-driven repository for threat hunting ideas, methodologies, and research that serves as a central gathering place for hunters to share knowledge, collaborate on techniques, and advance the field of threat hunting.
https://threathuntingcommunity.com/

Similarity Analysis via Clustering and Text-vectorization #13

Closed fetterm4n closed 2 days ago

fetterm4n commented 1 week ago

Hunt Type 🔥

Alchemy (Model-Assisted): Hunts driven by models like anomaly detection or machine learning.

HEARTH Crafter

fetterm4n

Hunt Idea / Hypothesis

Compare text-based features of artifacts (user agent strings, malware/executables, browser extensions) by encoding them with a text vectorizer. Vectorization produces a numerical representation of each text feature, which can then be clustered or compared directly using a variety of similarity measures.
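A minimal sketch of the idea, using only the Python standard library. The hypothesis does not name a specific vectorizer or clustering algorithm, so the choices here (character trigram counts, cosine similarity, a greedy threshold-based grouping, and the 0.6 threshold) are assumptions made for illustration. The user agent strings are hypothetical sample data.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Vectorize a string as counts of its overlapping character n-grams."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_cluster(strings, threshold=0.6, n=3):
    """Put each string in the first cluster whose seed vector is similar enough.

    This is a simple stand-in for a real clustering algorithm; the threshold
    is an assumed value and would need tuning on real data.
    """
    clusters = []  # list of (seed_vector, [member strings])
    for s in strings:
        vec = char_ngrams(s, n)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((vec, [s]))
    return [members for _, members in clusters]


# Hypothetical user agent strings, for illustration only.
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/121.0.0.0 Safari/537.36",
    "curl/8.4.0",
    "python-requests/2.31.0",
]
clusters = greedy_cluster(user_agents)
for members in clusters:
    print(len(members), members[0])
```

In a hunt, small or singleton clusters are usually the interesting ones: they are the values that do not resemble the bulk of the environment's traffic.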

MITRE ATT&CK Tactic

Command and Control, Execution

Implementation Notes

Search Tags

#T1071.001 #T1203

Value and Impact

This is an important model-assisted methodology that can be applied to hunt for multiple types of threats. The hunt is grounded in two examples that showcase clustering of vectorized text fields and the application of similarity measures pre- and post-vectorization, such as Levenshtein, Hamming, and Euclidean distance.
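The three measures named above can be sketched in plain Python. This is an illustrative sketch, not the code from the referenced examples: Levenshtein and Hamming operate on the raw strings (pre-vectorization), while Euclidean distance operates on numeric vectors (post-vectorization). The `count_vector` helper and the extension names are hypothetical.

```python
from math import sqrt


def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def hamming(a, b):
    """Number of differing positions; defined only for equal-length inputs."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def euclidean(u, v):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("Euclidean distance requires equal-length vectors")
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def count_vector(text, vocabulary):
    """Hypothetical helper: per-character counts over a fixed vocabulary."""
    return [text.count(ch) for ch in vocabulary]


# Hypothetical browser extension names: a known one and a lookalike.
known = "adblock-plus"
suspect = "adblock-p1us"

vocabulary = sorted(set(known + suspect))
print("levenshtein:", levenshtein(known, suspect))
print("hamming:", hamming(known, suspect))
print("euclidean:", euclidean(count_vector(known, vocabulary),
                              count_vector(suspect, vocabulary)))
```

A small distance between an unknown value and a known-good one, without an exact match, is the kind of result worth a closer look, since lookalike names are a common masquerading pattern.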

Knowledge Base

DrTerdnugget commented 2 days ago

Submission approved! Nice work! Don't forget to make a pull request to add your name to the list to get your official Contributor status on the repo here:

https://github.com/triw0lf/HEARTH/blob/main/Keepers/Contributors.md

DrTerdnugget commented 1 day ago

@fetterm4n Reference M006.md & M007.md as your contribution in that pull request.