Closed andriosrobert closed 3 years ago
I doubt that will work in the general case.
Computing cryptographic hash code of general JSON object is super hard; it is a job for a cryptographer. The task is hard because you need to make sure of two things
{"a": "b"}
and {"b": "a"}
produces the same output; that would be something bad for cryptographic hash.{"a": "b"}
and {"a": "b"}
must produce the same output. This is why you cannot rely on a "regular" JSON library like clojure.core.json.. they are not made for that.I believe you have two options. I would vote strongly for the second option.
{:foo 900 :bar "sss" :toto 34.33}
you should hash tuple [900, "sss", 34.33]
knowing the types are [Long, utf-8 String, Double]
. Your hashing library supports all these types.Thanks for the detailed explanation!
Not sure if you checked it, but I completely removed the JSON serialization from the process. I'm sticking to the EDN format using a data structure that ensures the keys will be always ordered.
I have looked into your change but it will not change the underlying issue. You use serialization as canonical representation of data structure; that will result into bugs in the general case.
The bellow example still shows the problem..
(let [m {:foo 300 :toto 600}
mm {:foo m}
t (assoc (assoc {} :toto 600) :foo 300)
tt {:foo t}]
(println mm (map->md5 mm)) ;; {:foo {:foo 300, :toto 600}} d8a50edd5aeb65be51deb87eefc48f93
(println tt (map->md5 tt))) ;; {:foo {:foo 300, :toto 600}} 52a197cb51d65d93a047df1b7fa994aa
Actually, the bellow code worked with the json implementation but not the prn-str.
user=> (map->md5 {:foo '(1)})
"d53b5f3fd7e4d2d7d6ce5795f308ef7e"
user=> (map->md5 {:foo [1]})
"4dafdc7833269033d683da9aeff61ed9"
I still believe you should only hash things data structure you know exactly the format.
Any updates on this? @andriosr @Jumanjii
Hey @didiercrunch - taking the idea of the tuple and trying to make it easier for others to make this check, I decided to merge the fields in a specific order using a String.
Any thoughts?
Merging this as the test cases are covered and we are using a single primitive type (String), which removes the problem of organizing a data structure in different ways when hashing