iosifache / DikeDataset

Dataset with labeled benign and malicious files 🗃️
MIT License
87 stars 14 forks

Malware classification issues #3

Open fjycomes opened 2 months ago

fjycomes commented 2 months ago

I looked at the malware.csv file in the labels directory and saw that you have classified the malware into 9 categories, each with an associated probability. We can assume that the category with the highest probability is the one a file belongs to. If we follow that idea, though, the remaining six categories are effectively never assigned, because their probabilities are too small. Is there something wrong with my understanding of the classification probabilities? ![Uploading pic1.png…]()

iosifache commented 2 months ago

Hi @fjycomes,

Could you please re-upload the image? I'll wait for it before responding to your observations.

fjycomes commented 1 month ago

> Hi @fjycomes,
>
> Could you please re-upload the image? I'll wait for it before responding to your observations.

pic1

May I ask whether these values represent the probability of each category? For example, does 0.4285714 in the second row, fifth column represent the probability that the file is a trojan? If so, I derived the labels of all the files from these probabilities and found that, for backdoor, worm, spyware, rootkit, encrypter, downloader, and so on, the files are almost useless. Is my understanding wrong?

iosifache commented 1 month ago

@fjycomes, yes, that is the meaning of the probabilities. Also consider that they are normalised for each entry in the dataset. Why do you think the files are irrelevant?
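
For readers following the thread, here is a minimal sketch of the two ideas discussed above: checking that each row's probabilities sum to 1, and counting how often each category wins when a file is labelled with its highest-probability category. The column names (including `hash`, `generic` and `ransomware`) and the three sample rows are assumptions made up for illustration; they are not copied from `labels/malware.csv`, so check them against the real header before reusing the code.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; verify against the header of labels/malware.csv.
CATEGORIES = ["generic", "trojan", "ransomware", "worm", "backdoor",
              "spyware", "rootkit", "encrypter", "downloader"]

# Made-up rows for illustration only (not taken from the dataset).
SAMPLE_CSV = """\
hash,generic,trojan,ransomware,worm,backdoor,spyware,rootkit,encrypter,downloader
a,0.2857143,0.4285714,0,0.2857143,0,0,0,0,0
b,0.6,0.4,0,0,0,0,0,0,0
c,0,0.5,0.3,0,0.2,0,0,0,0
"""


def is_normalised(row, tolerance=1e-6):
    """Return True if the category probabilities of one row sum to 1."""
    total = sum(float(row[name]) for name in CATEGORIES)
    return abs(total - 1.0) < tolerance


def top_category(row):
    """Return the category with the highest probability for one row."""
    return max(CATEGORIES, key=lambda name: float(row[name]))


def label_counts(csv_text):
    """Count how many rows are assigned to each category by argmax."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    assert all(is_normalised(row) for row in rows), "row does not sum to 1"
    return Counter(top_category(row) for row in rows)


counts = label_counts(SAMPLE_CSV)
for name in CATEGORIES:
    print(name, counts[name])
```

With highest-probability labelling, a category that is rarely the largest entry in its row ends up with few or no files even when its probability is non-zero in many rows. One possible alternative, not proposed anywhere in the thread, would be to treat the probabilities as multi-label information, for instance by keeping every category above some threshold.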