Closed ReeceHumphreys closed 3 months ago
Note: The idea with this PR is that it will enable me to do something like this in slicec-cs
// Hash the vector of slice files using the sha256 algorithm.
let final_hash = compilation_state
.files
.iter()
.map(|file| file.compute_sha256_hash())
.fold(Sha256::new(), |mut hasher, file_hash| {
hasher.update(file_hash);
hasher
})
.finalize()
.map(|byte| format!("{:02x}", byte))
.into_iter()
.collect::<String>();
The idea with this PR is that it will enable me to do something like this in
All good! Just wanted to say that the array returned by finalize
can also be directly hex-ified though.
Not sure what the exact format is, but try printing the result itself (with {:032x}
) and see if it looks reasonable.
If it does, then the last 3 functions in your chain can probably just removed.
.finalize()
.map(|byte| format!("{:02x}", byte))
.into_iter()
.collect::<String>();
But maybe it will output things in a way we don't like. I don't remember, it's been a while since I've done hashing in Rust. Worth a try though IMO!
Not a typo. Changing your 02
changed to 032
was intentional since a single byte takes up 2 digits,
whereas an SHA256 hash should take up 32 digits.
I think it makes sense to have this in slicec
with a helper function to hash a list of files. In the future we may want to include this "compilation hash" as part of the serialized AST.
It's not obvious to me if we should only include the file contents and not include file name/ path.
Yes, an important question is: "what are we trying to measure here?" Without an answer to that, it's not clear what we should and shouldn't be taking here. I'd assumed I'd missed a discussion on this, but maybe not.
In the future we may want to include this "compilation hash" as part of the serialized AST.
Sure, it can live in slicec
, but I would be surprised if it's doesn't end up being dead code.
Even if we decide to include a hash alongside the serialized AST, it would probably include more information that the hash we're computing for our telemetry stuff.
And moving it would solve the concern of crate versions falling out of sync.
Because I would really prefer not to re-export sha2
from slicec
.
Adding the entire crate to our public API, for this very niche function seems pretty extreme to me.
I moved what was originally going to be in slicec-cs
into a helper function here instead. I implemented it just for slices of SliceFile
. We could do it more generically but its not necessary imo.
In the future we may want to include this "compilation hash" as part of the serialized AST.
Sure, it can live in
slicec
, but I would be surprised if it's doesn't end up being dead code. Even if we decide to include a hash alongside the serialized AST, it would probably include more information that the hash we're computing for our telemetry stuff.
I'm not really following what yo mean here. Once of the things we can include in the response would be a hash of the src + reference files. No need for each language to try and compute (read all the files and figure what what's needed) this itself. We'll likely want to try and exclude unused reference files.
And moving it would solve the concern of crate versions falling out of sync. Because I would really prefer not to re-export
sha2
fromslicec
. Adding the entire crate to our public API, for this very niche function seems pretty extreme to me.
I think it's fine to just include
This PR adds a
compute_sha256_hash
method toSliceFile
. This function is used to generate a hash that will be stable across releases. Given my comment below, should we re-exportsha2
fromslicec
since I want to use it inslicec-cs
? This would guarantee the versionslicec-cs
uses will always be identical to the slicec version.