Open madeinoz67 opened 3 years ago
similarly: https://pypi.org/project/python-tlsh/
Further reading suggests that, while this is promising, TLSH really belongs to the malware-detection space, and I personally think the scope is a bit too wide. Logs might be better served by Vector's remap transform and log_to_metric transform to pull out recurring patterns.
That said, some reading led to the following flow, something like:
vector-sink (socket) -> listening socket -> tlsh libs -> fuzzy hash -> send socket -> vector-ingest (socket) (see untested rough example below).
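import socket
import tlsh

# Accept one connection (e.g. from a Vector socket sink) and collect its payload.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('localhost', 50000))
s.listen(1)
conn, addr = s.accept()
chunks = []
while True:
    data = conn.recv(1024)
    if not data:
        break
    chunks.append(data)
    conn.sendall(data)  # echo the chunk back to the sender
conn.close()
s.close()
# After the loop `data` is empty, so hash the accumulated payload instead.
data = b''.join(chunks)

# Note, data needs to be bytes - not a string. This is because TLSH is for binary data and binary data can contain a NULL (zero) byte.
h1 = tlsh.hash(data)
# similar_data is a placeholder for a second bytes payload to compare against.
h2 = tlsh.hash(similar_data)
score = tlsh.diff(h1, h2)

# Incremental hashing of a file, read in 512-byte chunks.
h3 = tlsh.Tlsh()
with open('file', 'rb') as f:
    for buf in iter(lambda: f.read(512), b''):
        h3.update(buf)
    h3.final()
# this assertion is stating that the distance between a TLSH and itself must be zero
assert h3.diff(h3) == 0
score = h3.diff(h1)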
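For log events it probably makes more sense to hash each event rather than the whole stream. Here is a rough, stdlib-only sketch of that per-event step, assuming the socket sink sends newline-delimited JSON and the text lives in a "message" field. The names enrich_events, placeholder_fingerprint and the fuzzy_hash field are made up for illustration. The default fingerprint is a plain SHA-1 digest, which is not fuzzy at all; it only stands in so the sketch runs without tlsh installed.

```python
import hashlib
import json


def placeholder_fingerprint(payload: bytes) -> str:
    """Stand-in digest (NOT a fuzzy hash); swap in tlsh.hash for real use."""
    return hashlib.sha1(payload).hexdigest()


def enrich_events(stream: bytes, fingerprint=placeholder_fingerprint) -> list:
    """Split a newline-delimited JSON byte stream and tag each event with a hash."""
    enriched = []
    for line in stream.splitlines():
        if not line.strip():
            continue  # skip blank lines between events
        event = json.loads(line)
        # Assumption: the event text is stored under the "message" key.
        message = str(event.get("message", "")).encode("utf-8")
        # "fuzzy_hash" is a hypothetical field name chosen for this sketch.
        event["fuzzy_hash"] = fingerprint(message)
        enriched.append(json.dumps(event))
    return enriched


sample = (
    b'{"message": "disk full on /dev/sda1"}\n'
    b'{"message": "disk full on /dev/sdb1"}\n'
)
for out in enrich_events(sample):
    print(out)
```

Passing fingerprint=tlsh.hash would plug the real library in. Keep in mind that short log lines may not give TLSH enough input to produce a usable hash, so that would need checking in any POC.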
What I do like about it, though, is the fuzzy nature of the hashes. If tlsh turns out not to be computationally expensive compared to something like Azure Log Analytics or similar, I'd say it's worth at least a POC within 8-12 months.
Thoughts?
Locality Sensitive Hashing will allow similar events to be discovered.
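To illustrate the idea with nothing but the standard library, here is a minimal SimHash-style sketch. This is not TLSH and not what any of the linked papers implement; it is just one simple locality-sensitive scheme, where log lines that share most of their tokens end up with fingerprints that differ in only a few bits. The function names and sample log lines are invented for the example.

```python
import hashlib

BITS = 64  # fingerprint width in bits


def simhash(text: str) -> int:
    """SimHash fingerprint over whitespace-separated tokens."""
    weights = [0] * BITS
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the token hash
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set when the positive votes win.
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


a = simhash("2021-03-01 ERROR connection to db-01 timed out after 30s retrying")
b = simhash("2021-03-01 ERROR connection to db-02 timed out after 30s retrying")
c = simhash("user alice logged in from 10.0.0.5 via ssh key")
print("similar lines:", hamming(a, b))
print("unrelated lines:", hamming(a, c))
```

A small Hamming distance plays the same role here as a low tlsh.diff score: events can be grouped by thresholding the distance instead of requiring an exact match.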
https://github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf
https://documents.trendmicro.com/assets/wp/wp-locality-sensitive-hash.pdf
https://towardsdatascience.com/locality-sensitive-hashing-for-music-search-f2f1940ace23