lesleychou closed this issue 2 years ago
Hi Lesley,
In suggest_ground_truth() I aim to extract all unique abstracted (i.e., tokenized) sequences, malicious or not. So result_list contains both the unique malicious and the unique non-malicious sequences, which is why it is not sufficient to extract sequences only from the malicious entities. As you can see in lines 394-399, before we label a sequence as malicious, we check whether the tokenized sequence matches a malicious sequence: if it does not, we label it non-malicious ("0"); otherwise, we label it malicious ("1"). Thanks.
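To illustrate the labeling step described above, here is a minimal sketch, not the actual code at lines 394-399. The function name `label_unique_sequences`, its parameters, and the example tokens are all hypothetical; it only assumes that a tokenized sequence can be represented as a list of hashable tokens.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple


def label_unique_sequences(
    all_sequences: Iterable[Sequence[Hashable]],
    malicious_sequences: Iterable[Sequence[Hashable]],
) -> List[Tuple[tuple, int]]:
    """Return each unique tokenized sequence with a label: 1 = malicious, 0 = not.

    Hypothetical helper for illustration only; not the repository's implementation.
    """
    # Tuples are hashable, so membership checks against the set are cheap.
    malicious = {tuple(seq) for seq in malicious_sequences}

    seen = set()
    labeled = []
    for seq in all_sequences:
        key = tuple(seq)
        if key in seen:
            continue  # keep only the first occurrence of each sequence
        seen.add(key)
        # Label 1 only if the tokenized sequence matches a malicious one.
        labeled.append((key, 1 if key in malicious else 0))
    return labeled


# Made-up tokens, purely for demonstration.
example = label_unique_sequences(
    all_sequences=[["open", "read"], ["open", "write"], ["open", "read"], ["exec"]],
    malicious_sequences=[["open", "write"]],
)
print(example)
```

The point of the sketch is that the output covers every unique sequence, so the non-malicious ones receive an explicit "0" label rather than being left out.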
Hi Alsaheel,
Sorry to bother you again. May I ask why you need to append every subject to the original malicious labels in this for loop?
This step was taking a lot of time and computation. I wonder whether checking only the original malicious labels would work here, or is there something I missed in your "result_list" logic?
Thanks!