purseclab / ATLAS

ATLAS: A Sequence-based Learning Approach for Attack Investigation
Apache License 2.0

Why go through "all malicious labels + one possible subject"? #10

Closed lesleychou closed 2 years ago

lesleychou commented 2 years ago

Hi Alsaheel,

Sorry to bother you again. May I ask why you need to append every subject to the original malicious labels in this for loop?

This step takes a lot of time and computation. Would checking only the original malicious labels work here? Or is there something I missed in your "result_list" logic?

Thanks!

cssaheel commented 2 years ago

Hi Lesley,

In suggest_ground_truth() I aim to extract all unique abstracted (i.e., tokenized) sequences, malicious or not. So result_list ends up containing the unique malicious and non-malicious sequences, which is why it is not sufficient to extract sequences only from the malicious entities. As you can see in lines 394-399, before we label a sequence as malicious we check whether the tokenized sequence matches a malicious sequence: if it does not, we label it as non-malicious ("0"); otherwise, we label it as malicious ("1"). Thanks.
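
For readers following along, here is a minimal toy sketch of the labeling idea described above. It is not the ATLAS code: the `tokenize` and `label_sequences` helpers, the event triples, and the entity names are all hypothetical, and the abstraction step is reduced to "keep the actions of events whose endpoints are both selected". It only illustrates why each subject is combined with the malicious labels: combinations that tokenize to something other than a malicious sequence are what produce the "0" examples.

```python
def tokenize(entities, events):
    """Hypothetical abstraction step: keep the action of every event
    whose source and destination are both in `entities`."""
    chosen = set(entities)
    return tuple(action for src, action, dst in events
                 if src in chosen and dst in chosen)


def label_sequences(malicious_labels, subjects, events):
    """Return {tokenized_sequence: label} with label 1 (malicious) or 0."""
    # Assumption: the reference malicious sequence comes from the
    # malicious entities alone.
    malicious_sequences = {tokenize(malicious_labels, events)}
    labeled = {}
    for subject in subjects:
        # "all malicious labels + one possible subject"
        sequence = tokenize(list(malicious_labels) + [subject], events)
        if sequence in labeled:
            continue  # keep only unique tokenized sequences
        labeled[sequence] = 1 if sequence in malicious_sequences else 0
    return labeled


# Toy data, invented for illustration only.
events = [
    ("evil.exe", "write", "payload.dll"),
    ("evil.exe", "read", "notes.txt"),
    ("winword.exe", "read", "notes.txt"),
]
malicious_labels = ["evil.exe", "payload.dll"]
subjects = ["evil.exe", "notes.txt", "winword.exe"]

result = label_sequences(malicious_labels, subjects, events)
for sequence, label in result.items():
    print(label, sequence)
```

In this toy setup, iterating over the malicious labels alone would only ever yield sequences labeled "1"; the non-malicious sequence appears only because `notes.txt` was tried as an extra subject.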