lukeburciu / hpviz

Implement locality-sensitive hashing on logged events, such as TLSH #125

Open madeinoz67 opened 3 years ago

madeinoz67 commented 3 years ago

Locality Sensitive Hashing will allow similar events to be discovered.

https://github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf
https://documents.trendmicro.com/assets/wp/wp-locality-sensitive-hash.pdf
https://towardsdatascience.com/locality-sensitive-hashing-for-music-search-f2f1940ace23
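
To make the idea concrete, here is a minimal sketch of a locality-sensitive fingerprint for log lines. Note this is SimHash written with the standard library only, not TLSH, and the names `simhash` and `hamming` are made up for illustration. It only shows the general property: similar inputs should land on nearby fingerprints.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Return a SimHash fingerprint (illustrative stand-in, not TLSH)."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        # Hash each token to a fixed-width integer.
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=bits // 8).digest()
        value = int.from_bytes(digest, "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (value >> i) & 1 else -1
    # Keep the bits where the positive votes won.
    return sum(1 << i for i, w in enumerate(weights) if w > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits; lower means more similar."""
    return bin(a ^ b).count("1")


# Hypothetical log lines, invented for this example.
line_a = "Failed password for root from 10.0.0.1 port 22 ssh2"
line_b = "Failed password for root from 10.0.0.2 port 22 ssh2"
line_c = "Disk usage on /var exceeded 90 percent"
near = hamming(simhash(line_a), simhash(line_b))
far = hamming(simhash(line_a), simhash(line_c))
```

The expectation is that `near` comes out much smaller than `far`, though with short lines that is a tendency rather than a guarantee.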

lukeburciu commented 3 years ago

similarly: https://pypi.org/project/python-tlsh/

lukeburciu commented 3 years ago

Further reading suggests that while this is promising, TLSH really falls under the area of malware detection, and I personally think the scope is a bit too wide. Logs might be better served by Vector's remap transform and log_to_metric transform to pull out recurring patterns.
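
As a rough illustration of what "pulling out recurring patterns" could mean, here is a plain Python stand-in. This is not Vector configuration and not how remap or log_to_metric work internally; the regex and the names `template` and `recurring_patterns` are assumptions for the sketch. It masks variable tokens (numbers, IPv4 addresses, hex values) and counts the resulting templates.

```python
import re
from collections import Counter

# Tokens treated as "variable": IPv4 addresses, hex literals, plain integers.
_VARIABLE = re.compile(r"\b(?:\d{1,3}(?:\.\d{1,3}){3}|0x[0-9a-fA-F]+|\d+)\b")


def template(line: str) -> str:
    """Replace variable tokens with a placeholder to expose the pattern."""
    return _VARIABLE.sub("<*>", line)


def recurring_patterns(lines, min_count=2):
    """Return (template, count) pairs seen at least min_count times."""
    counts = Counter(template(line.strip()) for line in lines if line.strip())
    return [(t, n) for t, n in counts.most_common() if n >= min_count]


# Hypothetical events, invented for this example.
events = [
    "Failed login for user 42 from 10.0.0.1",
    "Failed login for user 7 from 10.0.0.9",
    "Disk full",
]
patterns = recurring_patterns(events)
```

The counts per template are the kind of thing that could then be emitted as a metric.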

That said, some reading resulted in the following flow, something like:

vector-sink (socket) -> listening socket -> tlsh libs -> fuzzy hash -> send socket -> vector-ingest (socket) (see untested rough example below).

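import socket

import tlsh

# Accept one connection and collect everything the sender writes.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('localhost', 50000))
s.listen(1)
conn, addr = s.accept()
chunks = []
while True:
    data = conn.recv(1024)
    if not data:
        break
    chunks.append(data)  # keep the bytes; `data` is empty once the loop ends
    conn.sendall(data)   # echo back for now; the real flow would forward the hash
conn.close()
s.close()
payload = b''.join(chunks)

h1 = tlsh.hash(payload)
# Note, data needs to be bytes - not a string. This is because TLSH is for binary data and binary data can contain a NULL (zero) byte.

# Placeholder for a second, similar event to compare against.
similar_data = payload + b' (retry)'
h2 = tlsh.hash(similar_data)
score = tlsh.diff(h1, h2)  # lower score means more similar

# Incremental hashing of a file ('file' is a placeholder path).
h3 = tlsh.Tlsh()
with open('file', 'rb') as f:
    for buf in iter(lambda: f.read(512), b''):
        h3.update(buf)
    h3.final()
# this assertion is stating that the distance between a TLSH and itself must be zero
assert h3.diff(h3) == 0
score = h3.diff(h1)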
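
The full flow (receive, hash, forward) might look roughly like the sketch below. Also untested against Vector itself. It assumes events arrive newline-delimited, which depends on how the socket sink is configured, and the names `iter_lines`, `relay`, and the `fuzzy_hash` field are made up. The hash function is passed in so the relay logic can be tried without tlsh installed.

```python
import json
import socket
from typing import Callable, Iterator


def iter_lines(conn: socket.socket, bufsize: int = 1024) -> Iterator[bytes]:
    """Yield newline-delimited events from a connected socket."""
    pending = b""
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; keep the remainder.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            if line:
                yield line
    if pending:
        yield pending  # final event without a trailing newline


def relay(src: socket.socket, dst: socket.socket,
          hash_fn: Callable[[bytes], str]) -> int:
    """Read events from src, attach a fuzzy hash, write JSON lines to dst."""
    count = 0
    for event in iter_lines(src):
        record = {
            "message": event.decode("utf-8", "replace"),
            "fuzzy_hash": hash_fn(event),
        }
        dst.sendall(json.dumps(record).encode("utf-8") + b"\n")
        count += 1
    return count
```

With python-tlsh installed, `hash_fn` could be `tlsh.hash`. One caveat: TLSH needs a minimum amount of input, so very short log lines may not produce a usable hash.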

What I do like about it, though, is the fuzzy nature of the hashes. If tlsh isn't computationally expensive compared to something like Azure Log Analytics or similar, I say it's worth at least a POC within 8-12 months.
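
For the performance question, a small timing harness could give a first number for the local hashing cost per event. This is only a sketch: `per_event_seconds` is a made-up helper, sha256 is used as a stand-in baseline, and it says nothing about how a hosted service would compare.

```python
import hashlib
import timeit


def per_event_seconds(hash_fn, events, repeat=5):
    """Best-of-N wall-clock time per event for hash_fn over events."""
    def run():
        for event in events:
            hash_fn(event)
    # Take the minimum over several runs to reduce scheduling noise.
    best = min(timeit.repeat(run, number=1, repeat=repeat))
    return best / len(events)


# Hypothetical sample: 1000 identical 200-byte events.
sample = [b"x" * 200] * 1000
baseline = per_event_seconds(lambda e: hashlib.sha256(e).hexdigest(), sample)
```

Swapping the lambda for `tlsh.hash` (with python-tlsh installed) and using real log lines would give a comparable figure for TLSH.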

Thoughts?