marcoramilli / MalwareTrainingSets

Free Malware Training Datasets for Machine Learning
220 stars, 102 forks

Data set for testing data #1

Closed vinayakumarr closed 7 years ago

vinayakumarr commented 7 years ago

I would like to use your data as part of my work. Do you have any testing data?

vinayakumarr commented 7 years ago

Could you please share the extracted features of the training and test files?

marcoramilli commented 7 years ago

Hi vinayakumarr, you can use the whole dataset as you wish (but please cite it). For example, you could use "half" of the data to train your algorithm and the other "half" to test the trained algorithm.

Did I answer your question?
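
The half/half split suggested above could be sketched as follows. This is only an illustration, not part of the repository: the sample file names are hypothetical placeholders, and the fixed seed is an assumption made so that the split is reproducible.

```python
import random


def split_half(samples, seed=0):
    """Shuffle a copy of `samples` and return (train, test) halves."""
    shuffled = list(samples)
    # A seeded generator keeps the split identical across runs.
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Hypothetical sample identifiers; in practice these would be the dataset files.
samples = [f"sample_{i}.json" for i in range(10)]
train, test = split_half(samples)
print(len(train), len(test))
```

Shuffling before splitting avoids putting whole families into only one half when the files are listed directory by directory.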

vinayakumarr commented 7 years ago

OK. Do you have extracted features?

marcoramilli commented 7 years ago

Yes, the files in the repository are full of features. Specifically, these are the extracted features with their frequencies:

List of currently available features with occurrence counts:

   'file_access': 138759,
   'sig_infostealer_ftp': 13114,
   'sig_modifies_hostfile': 5,
   'sig_removes_zoneid_ads': 16,
   'sig_disables_uac': 33,
   'sig_static_versioninfo_anomaly': 0,
   'sig_stealth_webhistory': 417,
   'reg_write': 11942,
   'sig_network_cnc_http': 132,
   'api_resolv': 954690,
   'sig_stealth_network': 71,
   'sig_antivm_generic_bios': 6,
   'sig_polymorphic': 705,
   'sig_antivm_generic_disk': 7,
   'sig_antivm_vpc_keys': 0,
   'sig_antivm_xen_keys': 5,
   'sig_creates_largekey': 16,
   'sig_exec_crash': 6,
   'sig_antisandbox_sboxie_libs': 144,
   'sig_mimics_icon': 2,
   'sig_stealth_hidden_extension': 9,
   'sig_modify_proxy': 384,
   'sig_office_security': 20,
   'sig_bypass_firewall': 29,
   'sig_encrypted_ioc': 476,
   'sig_dropper': 671,
   'reg_delete': 2545,
   'sig_critical_process': 3,
   'service_start': 312,
   'net_dns': 486,
   'sig_ransomware_files': 5,
   'sig_virus': 781,
   'file_write': 20218,
   'sig_antisandbox_suspend': 2,
   'sig_sniffer_winpcap': 16,
   'sig_antisandbox_cuckoocrash': 11,
   'file_delete': 5405,
   'sig_antivm_vmware_devices': 1,
   'sig_ransomware_recyclebin': 0,
   'sig_infostealer_keylog': 44,
   'sig_clamav': 1350,
   'sig_packer_vmprotect': 1,
   'sig_antisandbox_productid': 18,
   'sig_persistence_service': 5,
   'sig_antivm_generic_diskreg': 162,
   'sig_recon_checkip': 4,
   'sig_ransomware_extensions': 4,
   'sig_network_bind': 190,
   'sig_antivirus_virustotal': 175975,
   'sig_recon_beacon': 23,
   'sig_deletes_shadow_copies': 24,
   'sig_browser_security': 216,
   'sig_modifies_desktop_wallpaper': 83,
   'sig_network_torgateway': 1,
   'sig_ransomware_file_modifications': 23,
   'sig_antivm_vbox_files': 7,
   'sig_static_pe_anomaly': 2194,
   'sig_copies_self': 591,
   'sig_antianalysis_detectfile': 51,
   'sig_antidbg_devices': 6,
   'file_drop': 6627,
   'sig_driver_load': 72,
   'sig_antimalware_metascan': 1045,
   'sig_modifies_certs': 46,
   'sig_antivm_vpc_files': 0,
   'sig_stealth_file': 1566,
   'sig_mimics_agent': 131,
   'sig_disables_windows_defender': 3,
   'sig_ransomware_message': 10,
   'sig_network_http': 216,
   'sig_injection_runpe': 474,
   'sig_antidbg_windows': 455,
   'sig_antisandbox_sleep': 271,
   'sig_stealth_hiddenreg': 13,
   'sig_disables_browser_warn': 20,
   'sig_antivm_vmware_files': 6,
   'sig_infostealer_mail': 617,
   'sig_ipc_namedpipe': 13,
   'sig_persistence_autorun': 2355,
   'sig_stealth_hide_notifications': 19,
   'service_create': 62,
   'sig_reads_self': 14460,
   'mutex_access': 15017,
   'sig_antiav_detectreg': 4,
   'sig_antivm_vbox_libs': 0,
   'sig_antisandbox_sunbelt_libs': 2,
   'sig_antiav_detectfile': 2,
   'reg_access': 774910,
   'sig_stealth_timeout': 1024,
   'sig_antivm_vbox_keys': 0,
   'sig_persistence_ads': 3,
   'sig_mimics_filetime': 3459,
   'sig_banker_zeus_url': 1,
   'sig_origin_langid': 71,
   'sig_antiemu_wine_reg': 1,
   'sig_process_needed': 137,
   'sig_antisandbox_restart': 24,
   'sig_recon_programs': 5318,
   'str': 1443775,
   'sig_antisandbox_unhook': 1364,
   'sig_antiav_servicestop': 78,
   'sig_injection_createremotethread': 311,
   'pe_imports': 301256,
   'sig_process_interest': 295,
   'sig_bootkit': 25,
   'reg_read': 458477,
   'sig_stealth_window': 1267,
   'sig_downloader_cabby': 50,
   'sig_multiple_useragents': 101,
   'pe_sec_character': 22180,
   'sig_disables_windowsupdate': 0,
   'sig_antivm_generic_system': 6,
   'cmd_exec': 2842,
   'net_con': 406,
   'sig_bcdedit_command': 14,
   'pe_sec_entropy': 22180,
   'pe_sec_name': 22180,
   'sig_creates_nullvalue': 1,
   'sig_packer_entropy': 3603,
   'sig_packer_upx': 1210,
   'sig_disables_system_restore': 6,
   'sig_ransomware_radamant': 0,
   'sig_infostealer_browser': 7,
   'sig_injection_rwx': 3613,
   'sig_deletes_self': 600,
   'file_read': 50632,
   'sig_fraudguard_threat_intel_api': 226,
   'sig_deepfreeze_mutex': 1,
   'sig_modify_uac_prompt': 1,
   'sig_api_spamming': 251,
   'sig_modify_security_center_warnings': 18,
   'sig_antivm_generic_disk_setupapi': 25,
   'sig_pony_behavior': 159,
   'sig_banker_zeus_mutex': 442,
   'net_http': 223,
   'sig_dridex_behavior': 0,
   'sig_internet_dropper': 3,
   'sig_cryptAM': 0,
   'sig_recon_fingerprint': 305,
   'sig_antivm_vmware_keys': 0,
   'sig_infostealer_bitcoin': 207,
   'sig_antiemu_wine_func': 0,
   'sig_rat_spynet': 3,
   'sig_origin_resource_langid': 2255

For more information, I suggest having a look at: http://marcoramilli.blogspot.it/2016/12/malware-training-sets-machine-learning.html

It explains the dataset a little.
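
As a quick way to read the occurrence counts above, the sketch below totals them by name prefix (`file`, `reg`, `net`, `sig`) and lists the features that never occur. It uses only a small subset of the counts, copied from the list; the helper names `totals_by_prefix` and `unused_features` are hypothetical and not part of the repository.

```python
from collections import defaultdict

# A subset of the occurrence counts listed above.
counts = {
    "file_access": 138759, "file_write": 20218, "file_delete": 5405,
    "file_drop": 6627, "file_read": 50632,
    "reg_write": 11942, "reg_delete": 2545,
    "reg_access": 774910, "reg_read": 458477,
    "net_dns": 486, "net_con": 406, "net_http": 223,
    "sig_packer_upx": 1210, "sig_antivm_vbox_keys": 0,
}


def totals_by_prefix(counts):
    """Sum the counts of all features sharing the text before the first '_'."""
    totals = defaultdict(int)
    for name, n in counts.items():
        totals[name.split("_", 1)[0]] += n
    return dict(totals)


def unused_features(counts):
    """Return the names of features that occur zero times, sorted."""
    return sorted(name for name, n in counts.items() if n == 0)


totals = totals_by_prefix(counts)
unused = unused_features(counts)
print(totals)
print(unused)
```

Features with a count of zero carry no information for a classifier trained on this data, so they are natural candidates to drop.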

vinayakumarr commented 7 years ago

Thank you

vinayakumarr commented 7 years ago

How can I convert this to a number? "pe_sec_name": "916e7571 e8997399 b7b66a05 1f9f1073"

marcoramilli commented 7 years ago

Hi, forgive my delay in answering. The conversion is based on MIST: http://www.mlsec.org/malheur/docs/mist-tr.pdf

The script I've been using is quite simple. I won't publish it since it is not really well engineered (please read the blog post to see why). You might find a piece of it in this image: https://4.bp.blogspot.com/-lM4-nM8r_rw/WE16263hSmI/AAAAAAAANjo/WLzsf33X0K0I3s63rf5DkAXCJPh2cL2lgCEw/s1600/Screen%2BShot%2B2016-12-11%2Bat%2B17.11.43.png

I hope this helps. I am going to close this issue; if you need more information, do not hesitate to contact me.
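
Since the conversion script itself is not published, the following is only one possible reading of the question about `pe_sec_name`, not the author's method. It assumes that each space-separated token is an 8-digit hexadecimal value and turns it into an integer; the function name `hex_tokens_to_ints` is hypothetical.

```python
def hex_tokens_to_ints(value):
    """Convert space-separated hexadecimal tokens into a list of integers."""
    return [int(token, 16) for token in value.split()]


# The example value from the question above.
record = {"pe_sec_name": "916e7571 e8997399 b7b66a05 1f9f1073"}
numbers = hex_tokens_to_ints(record["pe_sec_name"])

# Optional: scale each value into [0, 1), assuming 32-bit tokens.
scaled = [n / 2**32 for n in numbers]
print(numbers)
```

If the tokens are hashes of the original section names, which is an assumption here, the resulting integers are identifiers rather than quantities. In that case a categorical encoding such as one-hot or feature hashing is likely a better fit than feeding the raw magnitudes to a model.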

vinayakumarr commented 7 years ago

You mentioned that the given data belongs to 4 classes.

But the training data has many directories, such as APT1, Artemis-1201108, and Backdoor.MSIL.Tyupkin-1201110. Could you please specify which directories belong to each of those 4 class families?

I would like to implement an ML classifier that considers only those 4 classes. This is a kind request; please help me.
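
One way to restrict training to a chosen set of families is to derive a label from each directory name and filter on it. The sketch below assumes that a trailing `-<digits>` suffix is an identifier rather than part of the family name, which the thread does not confirm. The `wanted` set is a hypothetical placeholder, since the thread does not list the 4 classes.

```python
import re


def family_from_dirname(dirname):
    """Strip a trailing '-<digits>' suffix, if any, from a directory name."""
    return re.sub(r"-\d+$", "", dirname)


# Directory names quoted in the comment above.
dirs = ["APT1", "Artemis-1201108", "Backdoor.MSIL.Tyupkin-1201110"]
labels = {d: family_from_dirname(d) for d in dirs}

# Hypothetical selection; replace with the families you actually want.
wanted = {"APT1", "Artemis"}
selected = [d for d, family in labels.items() if family in wanted]
print(labels)
print(selected)
```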
