yanminglai / Malware-GAN

Implementation of the paper "Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN" (2017)
https://arxiv.org/abs/1702.05983
GNU General Public License v3.0
114 stars, 59 forks

Dataset #2

Open: s-sabareeswaran opened this issue 5 years ago

s-sabareeswaran commented 5 years ago

Please upload your dataset.

ZaydH commented 5 years ago

Looking more closely at the repository, it appears that data.npz contains the dataset. It has 441 benign samples and 1,368 malware samples. The difference between data.npz and data1.npz is not clear to me. README.md states "I have used 3000 malware samples and 1500 benign samples for training and testing (will expand further)." However, I could not find that set.
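One way to check what data.npz (or data1.npz) actually contains is to list every array in the archive with its shape and dtype. This is a minimal sketch that assumes only NumPy; it does not assume any particular array names, since the thread does not say what they are called. The `count_labels` helper and its 0 = benign / 1 = malware convention are hypothetical; check how the repository actually encodes its labels before relying on the counts.

```python
import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype_str)} for every array in an .npz archive."""
    # np.load on an .npz file returns a lazy NpzFile; the with-block closes it.
    # Arrays are only read and decompressed when accessed by name.
    with np.load(path) as archive:
        summary = {}
        for name in archive.files:
            arr = archive[name]
            summary[name] = (arr.shape, str(arr.dtype))
        return summary


def count_labels(labels):
    """Count each distinct label value (hypothetical convention: 0 = benign, 1 = malware)."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    # .item() converts NumPy scalars to plain Python values for readable dict keys.
    return {v.item(): int(c) for v, c in zip(values, counts)}
```

For example, running `describe_npz("data.npz")` and `describe_npz("data1.npz")` side by side would show whether the two files differ in array names, sample counts, or feature dimensions. If an archive holds object arrays, `np.load` refuses to open them unless `allow_pickle=True` is passed; only do that for files you trust.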

s-sabareeswaran commented 5 years ago

Hey @ZaydH, can I contact you through Skype or Zoom to understand this code? I am still not clear on it.

ZaydH commented 5 years ago

Skype will be difficult. If you have questions, I recommend opening issues (one for each question). I can try to answer them if I think the question is within my wheelhouse. In the end, what I know about the code in this repository is very limited. I have only tried running it and stepping through it in the debugger. @yanminglai is the expert here, not me.

rnehra01 commented 5 years ago

Can you tell me the names of the zip files you downloaded for the dataset? I'm trying to generate adversarial malware and test it against commercial software, but extracting the features with Cuckoo takes time. I was hoping you could provide the files so that I can use the features you extracted.

ZaydH commented 5 years ago

@rnehra01, I am not sure whether you are asking me or @yanminglai. If you are asking me, I implemented my own version of this network using PyTorch. Details of the dataset I used are described in my project's GitHub repository.

rnehra01 commented 5 years ago

Actually, I'm asking about the original malware files from which the API calls were extracted. I checked your repo, but it has the same type of data as is available here. BTW @ZaydH, do you happen to know of a publicly available dataset with more features (beyond just API calls), so that I don't have to use Cuckoo to extract them?

ZaydH commented 5 years ago

@rnehra01, I am unsure what you mean here. I only uploaded @yanminglai's NumPy arrays to my repo.

However, as I describe in the README.md, I did not use those files for my experiments. I used the SLEIPNIR dataset. The creators of that dataset requested that it not be publicly posted, which I respected. However, you can request access through this online form. Have you tried this, and did it not work for you? The SLEIPNIR dataset has about 22,000 features.

rnehra01 commented 5 years ago

Oh, my bad. I only looked in the data folder and didn't read carefully. I have filled out the Google form. Thanks for pointing that out.