Closed by recsater 1 year ago
Hi, @recsater,
The script only deals with dumping the raw data from Google Cloud Storage into a CSV file. After completing the scanning step, you need to create your own labeling strategy or adapt dike's.
You can check dike's implementation in the update_malware_labels function from the dataset module. There, the votes and tags are processed to obtain the malice and the family membership of each sample.
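To make the idea concrete, here is a minimal sketch of what a custom labeling strategy of this kind could look like. It is not dike's update_malware_labels implementation: the function name label_sample, the way the votes ratio and the ignore threshold are applied, and the family names are all assumptions made for illustration.

```python
from collections import Counter


def label_sample(malicious_votes, benign_votes, tags, families,
                 votes_ratio=1.0, min_ignored_percent=0.0):
    """Return (malice, memberships) for one scanned sample.

    Hypothetical scheme (an assumption, not dike's actual logic):
    - malice is the share of malicious votes, with benign votes
      weighted by votes_ratio;
    - memberships is the share of matching tags per family, with
      shares below min_ignored_percent zeroed out.
    """
    weighted_total = malicious_votes + votes_ratio * benign_votes
    malice = malicious_votes / weighted_total if weighted_total else 0.0

    # Count how many antivirus tags mention each known family name.
    counts = Counter()
    for tag in tags:
        lowered = tag.lower()
        for family in families:
            if family in lowered:
                counts[family] += 1

    matched = sum(counts.values())
    memberships = {}
    for family in families:
        share = counts[family] / matched if matched else 0.0
        # Shares below the threshold are treated as noise and dropped.
        memberships[family] = share if share >= min_ignored_percent else 0.0
    return malice, memberships


# Example with made-up votes, tags, and family names.
malice, memberships = label_sample(
    malicious_votes=30,
    benign_votes=10,
    tags=["Trojan.Generic", "Win32.Trojan.Agent",
          "Ransom.WannaCry", "Trojan/Win32"],
    families=["trojan", "ransom", "worm"],
    min_ignored_percent=0.3,
)
print(malice, memberships)
```

If you want labels that match DikeDataset exactly, you would need to follow the real function and the configured values rather than a simplified scheme like this one.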
First of all, thank you for your reply.
As an additional question, I would like to get exactly the same constants that were used to make the DikeDataset labels.
This is because I'm working on a project to classify malicious code using the labels (malware.csv, benign.csv) from DikeDataset.
To do that, could you tell me the following values?
In the DataFolderScanner class: self._malware_families, self._malicious_benign_votes_ratio, self._min_ignored_percent
These are defined like
I am sorry for my bad English. Thank you.
dike uses a YAML configuration file that contains all the configurable aspects of its functioning. You can find the values you mentioned by checking the dataset section in the configuration.yaml file.
And I'm glad to hear that these repositories are useful! Please let me know if you have any other questions, I'm happy to help.
https://github.com/iosifache/dike/blob/main/codebase/scripts/continuous_vt_scan.py
I opened this link, but I couldn't follow the process from labeling step 3 onward.
What should I do from labeling step 3?