Open jordansissel opened 9 years ago
This problem still exists for me, and https://github.com/logstash-plugins/logstash-filter-fingerprint/commit/7292935638b14b433ba26096f7451a1f5342ca76 does not seem to have fixed it. I am running v3.2.2 of the Logstash fingerprint plugin.
Sample data:
"fw": { "talkers": [ "222.222.222.222", "111.111.111.111" ] }
"fw": { "talkers": [ "111.111.111.111", "222.222.222.222" ] }
Now I run fingerprint on this value to produce a hash:
fingerprint {
method => "MURMUR3"
source => "[fw][talkers]"
target => "[fw][talkers_hash]"
concatenate_sources => true
}
The two events do not produce the same result.
The following also does not sort before fingerprinting. Both source fields are strings containing an IPv4 address.
fingerprint {
method => "MURMUR3"
source => [ "[fw][src_ip]", "[fw][dst_ip]" ]
target => "[fw][talkers_hash2]"
concatenate_sources => true
}
For me, the workaround is to use a ruby filter to sort before fingerprinting:
ruby { code => 'event.set("[fw][talkers]", event.get("[fw][talkers]").sort)' }
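The effect of sorting before hashing can be reproduced outside Logstash. This is a minimal sketch, not the plugin's code: `talkers_hash` is a hypothetical helper, the `|` separator is an arbitrary choice, and SHA1 stands in for MURMUR3, which is not in Ruby's standard library.

```ruby
require "digest"

# Hypothetical helper: hash a list of strings so that element order
# does not matter. SHA1 is a stand-in for MURMUR3 here.
def talkers_hash(talkers)
  Digest::SHA1.hexdigest(talkers.sort.join("|"))
end

a = ["222.222.222.222", "111.111.111.111"]
b = ["111.111.111.111", "222.222.222.222"]

# Sorted first, so both orderings give the same digest.
puts talkers_hash(a) == talkers_hash(b)

# Without sorting, the concatenated strings differ and so do the digests.
puts Digest::SHA1.hexdigest(a.join("|")) == Digest::SHA1.hexdigest(b.join("|"))
```

The first comparison prints `true` and the second `false`. Note that the sort is a plain string sort, which is enough to get a canonical order but does not order the addresses numerically.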
Hi @sliddjur, sorting of the fields specified in the source option was removed by 7292935.
It is now up to the end user to specify the order in which the fields are considered when calculating the hash.
(This issue was originally filed by @nicholas-marshall at https://github.com/elastic/logstash/issues/2396)
Good Day,
I am working on creating hash values for the 5-tuples of src_ip, src_port, dest_ip, dest_port, proto and of dest_ip, dest_port, src_ip, src_port, proto, in order to use these two fingerprints to build bidirectional flows out of the flow data I am collecting. However, with the following fingerprint filter:
Fingerprint the communications flow by creating source and destination hashes over the IPs and ports of the source and destination. The src_hash will be the src_ip, src_port, dest_ip, dest_port and the dest_hash will be dest_ip, dest_port, src_ip, src_port. Then joining duplex flows becomes possible.
if [src_ip] and [dest_ip] {
  fingerprint {
    concatenate_sources => true
    method => "SHA1"
    key => "KEYKEYKEY"
    source => [ "src_ip", "src_port", "dest_ip", "dest_port", "proto" ]
    target => "src_fingerprint"
  }
  fingerprint {
    concatenate_sources => true
    method => "SHA1"
    key => "KEYKEYKEY"
    source => [ "dest_ip", "dest_port", "src_ip", "src_port", "proto" ]
    target => "dest_fingerprint"
  }
}
Both src_fingerprint and dest_fingerprint are the same. I find this very confusing, as a fingerprint should be unique and hashes of two different strings should differ. Digging into the Ruby code, line 63 of fingerprint.rb has @source.sort.each do |k|, which sorts the entries of source before concatenating them. Sorting the source fields before hashing therefore causes collisions and non-unique values.
I fixed it for my use case by changing @source.sort.each do |k| to @source.each do |k|. However, I suggest adding an option to the fingerprint filter to the effect of unsorted_source => true, because simply removing the sort at this point would break backward compatibility: existing fingerprints would suddenly change.
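The collision described above can be illustrated with a small standalone sketch. It is not the plugin's implementation: the `fingerprint` helper, its `sort:` flag, the `|name|value` concatenation format, and the event values are all assumptions made for illustration.

```ruby
require "openssl"

# Hypothetical event; the field values are made up for illustration.
event = {
  "src_ip" => "10.0.0.1", "src_port" => "51000",
  "dest_ip" => "10.0.0.2", "dest_port" => "443", "proto" => "tcp"
}

# Hypothetical helper. The "|name|value" format is an assumption and
# may differ from what the real plugin concatenates.
def fingerprint(event, sources, sort:)
  names = sort ? sources.sort : sources
  data = names.map { |k| "|#{k}|#{event[k]}" }.join
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA1"), "KEYKEYKEY", data)
end

src_order  = %w[src_ip src_port dest_ip dest_port proto]
dest_order = %w[dest_ip dest_port src_ip src_port proto]

# With sorting, both source lists reduce to the same field order,
# so the two fingerprints collide.
puts fingerprint(event, src_order, sort: true) ==
     fingerprint(event, dest_order, sort: true)

# Without sorting, the configured order is kept and the fingerprints differ.
puts fingerprint(event, src_order, sort: false) ==
     fingerprint(event, dest_order, sort: false)
```

The first comparison prints `true` and the second `false`, which matches the behavior reported here: sorting makes the order given in source irrelevant.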
Sincerely,
Nicholas Marshall