opendxl / opendxl-tie-client-python

McAfee Threat Intelligence Exchange (TIE) client library for use with the OpenDXL Python Client
Apache License 2.0

Inserting SHA256 into TIE takes more time than other hashes #21

Open jmcgBYCN opened 3 years ago

jmcgBYCN commented 3 years ago

Hello,

I have noticed some odd behavior with set_external_file_reputation (https://opendxl.github.io/opendxl-tie-client-python/pydoc/basicsetexternalreputationexample.html)

When I insert MD5 or SHA1 hashes into the TIE Server there is practically no delay, about 0.1 s per hash. But when I insert SHA256 hashes there is a delay of about 2 seconds per hash. Have you already experienced this behavior?

I have a lot of SHA256 hashes to add to the database from our external reputation provider, but it is taking too much time.

hbazan commented 3 years ago

Hi. First of all, the more hashes you can provide to identify a given file, the better. When an endpoint asks TIE for the reputation of a file, it uses all three hashes (SHA1, SHA256, MD5), and TIE creates a database entry using that data. When someone provides fewer than three hashes, TIE still creates the database entry, but it stays incomplete until TIE gets all three hashes together and can match them into the same entry. When you import a single hash type, it can go two ways:

Then, to your issue: at which moment do you get the delay? Can you edit your Python script to print timestamps before each step? Do you see this difference in response time when doing getReputation?
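To illustrate the advice about sending every hash you have for a file, here is a minimal sketch that collects whatever hashes a provider record contains into one dictionary. The record layout and the `build_hashes` helper are hypothetical, and the string constants are stand-ins for `HashType.MD5`, `HashType.SHA1` and `HashType.SHA256` from `dxltieclient.constants` so that the sketch runs without the TIE client installed.

```python
# Stand-ins for HashType.MD5 / HashType.SHA1 / HashType.SHA256
# (assumption: swap in the real constants when using dxltieclient).
MD5, SHA1, SHA256 = "md5", "sha1", "sha256"

# Hypothetical mapping from provider field names to hash types.
FIELD_TO_HASH_TYPE = {"md5": MD5, "sha1": SHA1, "sha256": SHA256}


def build_hashes(record):
    """Return a dict holding every non-empty hash found in one record."""
    hashes = {}
    for field, hash_type in FIELD_TO_HASH_TYPE.items():
        value = record.get(field)
        if value:
            # Normalize so the same digest is always written the same way.
            hashes[hash_type] = value.strip().lower()
    return hashes


# Placeholder digests, not real file hashes.
record = {"md5": "0" * 32, "sha1": None, "sha256": "F" * 64}
print(sorted(build_hashes(record)))
```

The resulting dictionary could then be passed as the `hashes` argument of a single reputation call instead of issuing one call per hash type.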

jmcgBYCN commented 3 years ago

I understand, and I wish I could set reputations this way. However, our external provider does not send us any file information, so we know nothing about the file other than that it has a bad reputation (malware). This is why we currently add one hash per request.

from __future__ import absolute_import
from __future__ import print_function

import hashlib
import os
import sys
import time

from dxlclient.client import DxlClient
from dxlclient.client_config import DxlClientConfig

from dxltieclient import TieClient
from dxltieclient.constants import HashType, TrustLevel, FileType, FileProvider, ReputationProp

import json
import logging
import base64
from datetime import datetime

# Import common logging and configuration
#sys.path.append(os.path.dirname(os.path.abspath(__file__)) + "/")
from common import *

# Configure local logger
#logging.getLogger().setLevel(logging.ERROR)
#logger = logging.getLogger(__name__)

# Create DXL configuration from file
config = DxlClientConfig.create_dxl_config_from_file(CONFIG_FILE) #DXL Config files with certs

jsonFile = 'hash.mcafee.tie.json' #SourceFile of hashs

def timeIs():
    now = datetime.now()
    dt_string = now.strftime("%d/%m/%Y %H:%M:%S")
    return dt_string

def main():
    # Create the client
    with DxlClient(config) as client:
        # Connect to the fabric
        client.connect()

        # Create the McAfee Threat Intelligence Exchange (TIE) client
        tieclient = TieClient(client)
        with open(jsonFile) as json_file:
            data = json.load(json_file)
            for p in data:
                b64HashValue= ""
                try:
                    #print('SHA1: ' + p['sha1'])
                    b64HashValue=(base64.b64decode(p['sha1']).decode("utf-8"))
                    #print("SHA1 skip")
                    SetReputation(tieclient,'sha1',b64HashValue)
                    continue
                except Exception as e:
                    b64HashValue= ""
                try:
                    #print('SHA256: ' + p['sha256'])
                    b64HashValue=(base64.b64decode(p['sha256']).decode("utf-8"))
                    #print("SHA256 skip")
                    SetReputation(tieclient,'sha256',b64HashValue)
                    continue
                except Exception as e:
                    b64HashValue= ""
                try:
                    #print('MD5: ' + p['md5'])
                    b64HashValue=(base64.b64decode(p['md5']).decode("utf-8"))
                    #print("MD5 skip")
                    SetReputation(tieclient,'md5',b64HashValue)
                    continue
                except Exception as e:
                    print('')
                    b64HashValue = ""

def SetReputation(tie_client,hHash,value):
    # Create the client
    #with DxlClient(config) as client:
        # Connect to the fabric
    #    client.connect()

        # Create the McAfee Threat Intelligence Exchange (TIE) client
        #tie_client = TieClient(client)
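        if hHash == 'md5':
            hashes = {HashType.MD5: value}
        elif hHash == 'sha1':
            hashes = {HashType.SHA1: value}
        elif hHash == 'sha256':
            hashes = {HashType.SHA256: value}
        else:
            # Fail with a clear message instead of leaving 'hashes' unbound
            raise ValueError("Unsupported hash type: " + hHash)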
        #
        # Request reputation for the file
        #
        reputations_dict = tie_client.get_file_reputation(hashes)
        #
        # Check if there's any definitive reputation (different to Not Set [0] and Unknown [50])
        # for any provider except for External Provider (providerId=15)
        #
        has_definitive_reputation = \
            any([rep[ReputationProp.TRUST_LEVEL] != TrustLevel.NOT_SET
                 and rep[ReputationProp.TRUST_LEVEL] != TrustLevel.UNKNOWN
                 and rep[ReputationProp.PROVIDER_ID] != FileProvider.EXTERNAL
                 for rep in reputations_dict.values()])

        if has_definitive_reputation:
            AbortStr = "Abort: There is a reputation from another provider for the file, External Reputation is not necessary."
            print("hash : " + value + " - " + AbortStr + "")
            #logging.info("hash : " + value + " - " + AbortStr + "")
        else:
            #
            # Set the External reputation for the file "CyberExternal.exe" to Known Malicious
            #
            try:
                tie_client.set_external_file_reputation(
                    TrustLevel.KNOWN_MALICIOUS,
                    hashes,
                    #FileType.PEEXE,
                    filename="CyberExternal.exe",
                    comment="External Reputation from *******IOC")
                print("Event Sent : " + hHash + " : " + value)
                logging.info("Event Sent : " + hHash + " : " + value)
            except ValueError as e:
                print("Error: " + str(e))

if __name__ == '__main__':
    # Raw string so the backslashes in the Windows path are not treated as escapes
    logging.basicConfig(filename=r'C:\ExternalReputation\externalreputation.log', level=logging.INFO)
    Time = timeIs()
    logging.info('Start at : ' + Time)
    main()
    Time=timeIs()
    logging.info('Finished at : ' + Time)

From the timestamps, the slowness appears to be in "SetReputation(tieclient,'sha256',b64HashValue)".
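As an aside on the script above: the three try/except blocks could be replaced by a single decode step that picks the hash type from the digest length. This is only a sketch, and it assumes each JSON value is a base64-encoded hex digest, as the script suggests. `decode_hash` and `HASH_TYPE_BY_LENGTH` are hypothetical names.

```python
import base64
import string

# Hex digest lengths: MD5 = 32, SHA1 = 40, SHA256 = 64 characters.
HASH_TYPE_BY_LENGTH = {32: "md5", 40: "sha1", 64: "sha256"}


def decode_hash(b64_value):
    """Decode a base64 field into (hash_type, hex_digest) or raise ValueError."""
    hex_digest = base64.b64decode(b64_value).decode("utf-8").strip().lower()
    if not hex_digest or any(c not in string.hexdigits for c in hex_digest):
        raise ValueError("not a hex digest: %r" % hex_digest)
    hash_type = HASH_TYPE_BY_LENGTH.get(len(hex_digest))
    if hash_type is None:
        raise ValueError("unexpected digest length: %d" % len(hex_digest))
    return hash_type, hex_digest


# Round trip with an MD5 value that appears in the logs of this thread.
encoded = base64.b64encode(b"1bc8f4df4551c6efbbb1fe9f965dca49")
print(decode_hash(encoded))
```

Because only decoding errors raise `ValueError` here, a caller can catch that one exception type and let errors from the TIE calls surface instead of swallowing them with a bare `except Exception`.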

hbazan commented 3 years ago

Can you paste the logs of a run? Also, can you add timestamp logs before and after each call to tie_client?
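One possible way to add those timestamps is a small timing context manager wrapped around each call. This is a sketch for Python 3 only (it relies on `time.perf_counter`). The `timed` helper is a hypothetical name, and `time.sleep` stands in for a real call such as `tie_client.get_file_reputation(hashes)`.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def timed(label, results=None):
    """Log how long the wrapped block took, with sub-second resolution."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info("%s took %.3f s", label, elapsed)
        if results is not None:
            results.append((label, elapsed))


timings = []
with timed("get_file_reputation", timings):
    time.sleep(0.01)  # stand-in for the real TIE client call
print(timings[0][0])
```

Measuring elapsed time directly avoids the one-second resolution of a `%H:%M:%S` timestamp, which matters when the fast case is around 0.1 s.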

jmcgBYCN commented 3 years ago

Here are the logs with timestamps:

C:\ExternalReputation>python downloadhash.py
downloading...
download done
7c3f921c6617c9dfa2ce091f6a85afd0950fbae57ca876c97749794c69140880
f9459a328d44b159cb7ac470b0aeb54ce89adf7abca4bd144317880339e35bce
hash not same
run python
2020-12-07 17:00:57,430 root - INFO - Start at : 07/12/2020 17:00:57
2020-12-07 17:00:57,586 dxlclient.client - INFO - Waiting for broker list...
2020-12-07 17:00:57,618 dxlclient.client - INFO - Trying to connect...
2020-12-07 17:00:57,618 dxlclient.client - INFO - Trying to connect to broker {Unique id: {cc8576886-4545-6565-7777-88888888}, Host name: b********************com, IP address: *******7, Port: 8883}...
2020-12-07 17:00:57,618 dxlclient.client - INFO - Connected to broker {c8576886-4545-6565-7777-88888888}
07/12/2020 17:00:57 Get reputation in database
07/12/2020 17:01:04 Get has_definitive_reputation
07/12/2020 17:01:04 Finished has_definitive_reputation
07/12/2020 17:01:04 before tie_client
Event Sent : sha256 : 225e9596de85ca7b1025d6e444f6a01aa6507feef213f4d2e20da9e7d5d8e430
2020-12-07 17:01:04,057 root - INFO - Event Sent : sha256 : 225e9596de85ca7b1025d6e444f6a01aa6507feef213f4d2e20da9e7d5d8e430
07/12/2020 17:01:04 after tie_client
07/12/2020 17:01:04 Get reputation in database
07/12/2020 17:01:05 Get has_definitive_reputation
07/12/2020 17:01:05 Finished has_definitive_reputation
07/12/2020 17:01:05 before tie_client
Event Sent : sha256 : 392f32241cd3448c7a435935f2ff0d2cdc609dda81dd4946b1c977d25134e96e
2020-12-07 17:01:05,979 root - INFO - Event Sent : sha256 : 392f32241cd3448c7a435935f2ff0d2cdc609dda81dd4946b1c977d25134e96e
07/12/2020 17:01:05 after tie_client
07/12/2020 17:01:05 Get reputation in database
07/12/2020 17:01:08 Get has_definitive_reputation
07/12/2020 17:01:08 Finished has_definitive_reputation
07/12/2020 17:01:08 before tie_client
Event Sent : sha256 : 4e39bc95e35323ab586d740725a1c8cbcde01fe453f7c4cac7cced9a26e42cc9
2020-12-07 17:01:08,010 root - INFO - Event Sent : sha256 : 4e39bc95e35323ab586d740725a1c8cbcde01fe453f7c4cac7cced9a26e42cc9
07/12/2020 17:01:08 after tie_client
07/12/2020 17:01:08 Get reputation in database
07/12/2020 17:01:10 Get has_definitive_reputation
07/12/2020 17:01:10 Finished has_definitive_reputation
07/12/2020 17:01:10 before tie_client
Event Sent : sha256 : 8d7be9ed64811ea7986d788a75cbc4ca166702c6ff68c33873270d7c6597f5db
2020-12-07 17:01:10,042 root - INFO - Event Sent : sha256 : 8d7be9ed64811ea7986d788a75cbc4ca166702c6ff68c33873270d7c6597f5db

It looks like the slow call is this one: reputations_dict = tie_client.get_file_reputation(hashes)

It takes about 2 s.
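The per-lookup delay can also be read out of a log like the one above automatically. The sketch below assumes the script's own line format (`dd/mm/YYYY HH:MM:SS message`) and measures the gap between each "Get reputation in database" line and the following "Get has_definitive_reputation" line. `lookup_durations` is a hypothetical helper, and the sample lines are copied from the log above.

```python
from datetime import datetime

START_MARK = "Get reputation in database"
END_MARK = "Get has_definitive_reputation"
STAMP_FORMAT = "%d/%m/%Y %H:%M:%S"


def lookup_durations(lines):
    """Return the seconds between each START_MARK line and the next END_MARK line."""
    durations = []
    started = None
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], STAMP_FORMAT)
        except ValueError:
            continue  # not one of the script's own timestamped lines
        message = line[19:].strip()
        if message == START_MARK:
            started = stamp
        elif message == END_MARK and started is not None:
            durations.append((stamp - started).total_seconds())
            started = None
    return durations


sample = [
    "07/12/2020 17:01:04 Get reputation in database",
    "07/12/2020 17:01:05 Get has_definitive_reputation",
    "07/12/2020 17:01:05 Finished has_definitive_reputation",
    "07/12/2020 17:01:05 Get reputation in database",
    "07/12/2020 17:01:08 Get has_definitive_reputation",
    "Event Sent : sha256 : 4e39bc95e35323ab586d740725a1c8cbcde01fe453f7c4cac7cced9a26e42cc9",
    "07/12/2020 17:01:08 Get reputation in database",
    "07/12/2020 17:01:10 Get has_definitive_reputation",
]
print(lookup_durations(sample))  # [1.0, 3.0, 2.0]
```

Since these timestamps have one-second resolution, each figure is accurate only to within about a second.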

hbazan commented 3 years ago

Can you paste similar logs but using the other hash types?

jmcgBYCN commented 3 years ago

Hello,

Here is a short version of the logs:

MD5 :

09/12/2020 13:55:01 Get reputation in database
09/12/2020 13:55:01 Get has_definitive_reputation
09/12/2020 13:55:01 Finished has_definitive_reputation
09/12/2020 13:55:01 before tie_client
Event Sent : md5 : 1bc8f4df4551c6efbbb1fe9f965dca49
2020-12-09 13:55:01,839 root - INFO - Event Sent : md5 : 1bc8f4df4551c6efbbb1fe9f965dca49
09/12/2020 13:55:01 after tie_client

SHA1: 

09/12/2020 13:55:33 Get reputation in database
09/12/2020 13:55:33 Get has_definitive_reputation
09/12/2020 13:55:33 Finished has_definitive_reputation
09/12/2020 13:55:33 before tie_client
Event Sent : sha1 : a6c18fcbe6b25c370e1305d523b5de662172875b
2020-12-09 13:55:33,655 root - INFO - Event Sent : sha1 : a6c18fcbe6b25c370e1305d523b5de662172875b
09/12/2020 13:55:33 after tie_client

SHA256 :

09/12/2020 14:02:09 Get reputation in database
09/12/2020 14:02:11 Get has_definitive_reputation
09/12/2020 14:02:11 Finished has_definitive_reputation
09/12/2020 14:02:11 before tie_client
Event Sent : sha256 : ccf1dd2cd1f266006b2e70ab613bdd007fc03018c661f575d028443055d743b6
2020-12-09 14:02:11,307 root - INFO - Event Sent : sha256 : ccf1dd2cd1f266006b2e70ab613bdd007fc03018c661f575d028443055d743b6
09/12/2020 14:02:11 after tie_client

Full log version :

all log py.log

hbazan commented 3 years ago

Can you take a MER from the TIE Server (mfe_tie_dxl_log_collector.sh), then open it, find the file /logs/tie/db-stats.txt, and upload it here?

jmcgBYCN commented 3 years ago

db-stats.txt

Please find stats here

hbazan commented 3 years ago

OK, you have an unusual file count given your agent count. It shows you have 58K agents and 4.1M files, when the usual figures are less than 2M files even in environments with 200K agents. You might have some unnecessary data in the DB, and that could be causing this particular slowdown.

What TIE clients do you have apart from ENS or VSE?

Can you go to ePO -> Server Tasks -> TIE Data Management, edit it, and lower the threshold to 10GB? This is not a limit on the DB size but a threshold that triggers the cleanup of data that has reached its retention period. It is not triggering now because your DB is 13GB and the default threshold is 15GB.
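The figures quoted above can be checked with a little arithmetic. The sketch below compares files per agent and shows why a 13GB database does not trigger cleanup at a 15GB threshold but would at 10GB. Both helper names are hypothetical, and treating "reaches the threshold" as `>=` is an assumption about how the task compares sizes.

```python
def files_per_agent(file_count, agent_count):
    """Average number of file entries per agent."""
    return file_count / agent_count


def cleanup_triggers(db_size_gb, threshold_gb):
    """Assumed rule: cleanup runs once the DB size reaches the threshold."""
    return db_size_gb >= threshold_gb


# This environment: 4.1M files for 58K agents.
print(round(files_per_agent(4100000, 58000), 1))  # 70.7
# Upper end of the usual figures: 2M files for 200K agents.
print(files_per_agent(2000000, 200000))  # 10.0
print(cleanup_triggers(13, 15))  # False
print(cleanup_triggers(13, 10))  # True
```

Roughly 71 files per agent against a typical ceiling of about 10 is the gap that points to unnecessary data in the database.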

Also, we can create an index to help you while the database is back to a more stable status. Steps:

sha256_index.txt

jmcgBYCN commented 3 years ago

We are using ENS clients.

How could this cause a performance issue "only" with SHA256? I will reduce the threshold then.

Will this script have an impact on the database? Can I run it in the production environment? Should I also run the script on the secondary TIE Server?

hbazan commented 3 years ago

Only ENS? No MWG or anything similar? Then the cleanup will delete a lot of old, unnecessary files.

The issue is that having only the SHA256 is not that common; requests usually carry all three hashes, or MD5 plus SHA256. With an optimized database, though, requests with only SHA256 should work fine anyway.

The script will create an index based on SHA256. It will only make the DB respond faster in your scenario; it will not add or delete data. The secondary will replicate this by itself.
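To show in general terms why an index on the SHA256 column helps lookups that carry only that hash, here is a self-contained sketch using SQLite from the Python standard library. It is not the TIE Server's PostgreSQL schema or the attached script: the `files` table, its columns and the `idx_files_sha256` index name are all hypothetical.

```python
import sqlite3


def query_plan(conn, sha256):
    """Return SQLite's plan description for a lookup by SHA256 only."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT trust_level FROM files WHERE sha256 = ?",
        (sha256,),
    ).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files ("
    "id INTEGER PRIMARY KEY, md5 TEXT, sha1 TEXT, sha256 TEXT, trust_level INTEGER)"
)

plan_before = query_plan(conn, "00")  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_files_sha256 ON files (sha256)")
plan_after = query_plan(conn, "00")  # the planner can now search the index

print(plan_before)
print(plan_after)
```

The exact plan wording varies between SQLite versions, but the second plan names the index while the first does not. Creating the index changes only how rows are found; no rows are added or removed.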

jmcgBYCN commented 3 years ago

Sorry, yes: we have ENS, MWG (about 20), and ATD, all using the TIE Server.

Thanks for all this information. I will now proceed with applying the script and will come back to you.

hbazan commented 3 years ago

There is a common problem with MWG rules, where MWG is configured to query TIE on all files, or with a file type filter using a list called "executables", which includes a lot of types that are not really executables, like Python or JavaScript. Those kinds of files will not have a reputation on TIE, and querying them ends up creating DB entries for no use. The TIE rule on MWG should be limited to this list of file types: image

jmcgBYCN commented 3 years ago

The index has been created:

bash-4.1# /opt/McAfee/tieserver/postgresql/bin/psql -Umfetie tie -f index_sha256.sql
CREATE FUNCTION
 create_sha256_index
---------------------

(1 row)

DROP FUNCTION
bash-4.1#

Result is amazing


09/12/2020 15:55:56 after tie_client
09/12/2020 15:55:56 Get reputation in database
09/12/2020 15:55:56 Get has_definitive_reputation
09/12/2020 15:55:56 Finished has_definitive_reputation
09/12/2020 15:55:56 before tie_client
Event Sent : sha256 : f7e1a74e08c5718de9edc57facc26dda97ae5b723420a06ef56f1f6f8aa6fb5a
2020-12-09 15:55:56,974 root - INFO - Event Sent : sha256 : f7e1a74e08c5718de9edc57facc26dda97ae5b723420a06ef56f1f6f8aa6fb5a
09/12/2020 15:55:56 after tie_client

We do have a filter on media type that only matches files that can be 'malware': image image

hbazan commented 3 years ago

TIE and GTI are mostly focused on PE executable files (.exe, .dll, .sys, .scr, and others). GTI can also have reputations for PDF and APK files, but not for scripts, so Python, JavaScript, and MS batch files will not have a reputation. Remember that these reputations identify files by their hashes; a script or a .doc will have a totally different hash after changing just a single character, so the analysis needs to be based on its behavior.

You can analyze your dataset in ePO -> Queries and reports -> New -> TIE Schema as source -> Pie Chart -> select "Type" as the slice column. This chart will surely show a lot of files under Unrecognized file type. This means there is no actual metadata for the file, just a reputation request, and it is surely the only request for this particular file.

In the same Server Task for Data Management, you can establish different retention periods for different file types, assigning something like 7 days for Unrecognized, to make sure those files don't make your DB grow unnecessarily.
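The point about a single changed character producing a totally different hash is easy to see with `hashlib`. This is a small illustration with made-up script contents; `sha256_hex` is a hypothetical helper name.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


original = b"print('hello')\n"
modified = b"print('hellp')\n"  # one character changed

digest_a = sha256_hex(original)
digest_b = sha256_hex(modified)

# Count how many of the 64 hex positions differ between the two digests.
differing = sum(1 for a, b in zip(digest_a, digest_b) if a != b)
print(digest_a)
print(digest_b)
print("positions that differ:", differing)
```

Any edit to a script therefore yields a file that a hash-based reputation has never seen.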

jmcgBYCN commented 3 years ago

image

Since our External Reputation provider does not send any file information, those reputations are classed as "Unrecognized".

hbazan commented 3 years ago

There you go. There are several file types that ENS will never ask about; it only asks about executables. Picture, Text, and Hypertext file types could have a lower retention period, something like 15 days. One comment about Data Management: it will only delete files for which TIE has not received a request for longer than the retention period. And it will not delete files that have an Enterprise Reputation, no matter the retention period.
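The two Data Management rules described above can be written down as a small predicate. This is only a sketch of the rules as stated in this comment, not of the actual server task, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def eligible_for_cleanup(last_request, retention_days, has_enterprise_reputation, now):
    """Apply the two stated rules to one file entry."""
    if has_enterprise_reputation:
        # Files with an Enterprise Reputation are never deleted.
        return False
    # Otherwise delete only if no request arrived within the retention period.
    return now - last_request > timedelta(days=retention_days)


now = datetime(2020, 12, 9)
print(eligible_for_cleanup(datetime(2020, 11, 1), 15, False, now))  # True
print(eligible_for_cleanup(datetime(2020, 12, 1), 15, False, now))  # False
print(eligible_for_cleanup(datetime(2020, 1, 1), 15, True, now))  # False
```

Under these rules, a shorter retention period for rarely requested file types shrinks the database without touching entries that carry an Enterprise Reputation.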

jmcgBYCN commented 3 years ago

Picture, Text, and Hypertext files are not included anymore. They were enabled on MWG for a week, 8 months ago, and that has since been changed.

Since the threshold was 15GB, I suspect those files were never cleaned up. Tomorrow the chart should look different (I hope).