Open s-sabareeswaran opened 5 years ago
Looking more closely at the repository, it appears that `data.npz` has the dataset. It has 441 benign samples and 1,368 malware files. The difference between `data.npz` and `data1.npz` is not clear to me. `README.md` states "I have used 3000 malware samples and 1500 benign samples for trainning and testing(will expand further)." I could not find that set, however.
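For anyone who wants to check these counts themselves, here is a minimal sketch, assuming NumPy is installed and the archive loads without pickling. I have not confirmed which array names are stored inside `data.npz` or `data1.npz`, so the helper (`summarize_npz` is a hypothetical name of my own) simply lists whatever arrays it finds along with their shapes:

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz archive.

    No array names are assumed; the function reports whatever the
    archive actually contains.
    """
    # np.load on an .npz file returns a lazy, dict-like NpzFile object.
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Hypothetical usage, run from the repository root:
#   print(summarize_npz("data.npz"))
#   print(summarize_npz("data1.npz"))
```

Comparing the two summaries side by side should at least show whether the files differ in sample count or in feature dimension.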
Hey @ZaydH, can I contact you through Skype or Zoom to understand this code? I am still not clear on it.
Skype will be difficult. If you have questions, I recommend opening issues (one for each question). I can try answering them if I think the question is within my wheelhouse. In the end, the extent that I know about the code in this repository is very limited. I have just tried running it and looking at the debugger. @yanminglai is the expert here -- not me.
Can you tell me the names of the zip files you downloaded for the dataset? I'm trying to create adversarial malware and test it against commercially used software, but getting the features from Cuckoo takes time, so I was hoping you could provide the files. I would then use the features you extracted.
@rnehra01 -- I am not sure if you are asking me or asking @yanminglai . If you are asking me, I implemented my own version of this network using PyTorch. Details of the dataset I used are described in my project's GitHub repository.
Actually, I'm asking about the original malware files from which the API calls were extracted. I checked your repo, but it has the same type of data as is available here. BTW @ZaydH, do you happen to know of a publicly available dataset with more features (other than just API calls), so that I don't have to use Cuckoo to extract them?
@rnehra01 I am unsure what you mean here. I only uploaded @yanminglai's NumPy arrays to my repo.
However, as I describe in the `README.md`, I did not use those files for my experiments. I used the SLEIPNIR dataset. The creators of that dataset requested it not be publicly posted, which I respected. However, you can request access through this online form. Have you checked this and it did not work for you? The SLEIPNIR dataset has about 22,000 features.
Oh, my bad. I only looked in the data folder and didn't read carefully. I have filled out the Google form. Thanks for pointing that out.
Please upload your data set