next step for malware gan

Hello Weiwei, I have read your paper and used your GAN approach with a real AV engine, and it does a pretty good job at improving detection rates.

I have a few questions about how this approach could be extended to a more realistic scenario:

The black-box detector (like any of today's ML-based AV engines) typically uses a mix of continuous and binary features.
For example, in addition to API calls (binary features), it also considers the total number of sections (a numeric ordinal feature) or file entropy (a numeric continuous feature).
As in your paper, only a subset of the features can be manipulated by the attacker without breaking the functionality of the sample.

With this in mind:

In the case of only binary features: of the m features, only a subset q can be added (ORed), whereas the other m-q cannot.
How would you change the Generator to take this into account? Would you just ignore the m-q fixed features and train it as before on the q subset, assuming that the two groups are somehow uncorrelated?
In the case of continuous features (like the count of an attribute), where one could add arbitrary (possibly unbounded) values, should one follow the approach used for adversarial images?
How would you then combine the binary and numerical generators? Would you keep them separate and train them independently of each other at each iteration?
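
To make the questions concrete, here is a minimal sketch of the kind of constraint I have in mind, applied to a generator's raw output. It is only an illustration under my own assumptions: the function names, the 0.5 threshold, the modifiable-feature masks and the `max_increase` bound are all hypothetical and are not taken from the paper or from this repository.

```python
import numpy as np

def apply_binary_constraint(x, g_out, modifiable_mask):
    """OR the original binary features with the generator's proposal,
    but only at positions the attacker is allowed to change (hypothetical helper)."""
    proposed = (g_out > 0.5).astype(x.dtype)       # binarize generator output (assumed threshold)
    return np.maximum(x, proposed * modifiable_mask)  # features can be added, never removed

def apply_continuous_constraint(x, delta, modifiable_mask, max_increase):
    """Add a non-negative, bounded perturbation to the continuous features,
    again only where modification is allowed (hypothetical helper)."""
    delta = np.clip(delta, 0.0, max_increase)      # values may only grow, up to an assumed bound
    return x + delta * modifiable_mask

# Toy binary example: 4 features, only the first two are modifiable.
x_bin = np.array([1, 0, 0, 1])
mask_bin = np.array([1, 1, 0, 0])
g_out = np.array([0.1, 0.9, 0.8, 0.2])
adv_bin = apply_binary_constraint(x_bin, g_out, mask_bin)

# Toy continuous example: e.g. section count and entropy, only the first is modifiable.
x_cont = np.array([5.0, 6.2])
mask_cont = np.array([1.0, 0.0])
delta = np.array([3.0, -1.0])
adv_cont = apply_continuous_constraint(x_cont, delta, mask_cont, max_increase=2.0)

print(adv_bin, adv_cont)
```

In this sketch the third binary feature stays 0 even though the generator proposes it, because it is masked out, and the continuous perturbation is clipped to the assumed bound. Whether the two parts should come from one generator with two output heads or from two separate generators is exactly what I am unsure about.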

Thanks and regards.
