dkoslicki opened 1 year ago
I.e., I was originally under the assumption that the md5 is formed using the entire signature (including the fields `hash_function`, `name`, etc.), but given the above, it looks like it only uses the signature's `mins` field. So two signatures that have the same mins but different abundances would also get the same md5.
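A minimal sketch of the behavior described above. It assumes the digest is computed over the sorted hash values (`mins`) only. `ToySignature` and `content_md5` are hypothetical names for illustration, not sourmash's actual classes or implementation, and the real digest may also fold in other parameters:

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToySignature:
    # Hypothetical stand-in for a signature record, not sourmash's class.
    name: str
    ksize: int
    hash_function: str
    mins: list[int]
    abundances: list[int] = field(default_factory=list)


def content_md5(sig: ToySignature) -> str:
    """Assumed scheme: digest only the sorted hash values (mins)."""
    md5 = hashlib.md5()
    for h in sorted(sig.mins):
        # Trailing comma keeps e.g. [1, 23] and [123] from producing the same digest.
        md5.update(f"{h},".encode("utf-8"))
    return md5.hexdigest()


# Same mins, but different name and abundances.
a = ToySignature("genome_A", 31, "0.murmur64", [10, 20, 30], [1, 1, 1])
b = ToySignature("genome_B", 31, "0.murmur64", [30, 20, 10], [5, 2, 9])
print(content_md5(a) == content_md5(b))  # True
```

Under this assumption, `name`, `hash_function`, and the abundances never reach the digest, so `a` and `b` are indistinguishable by md5.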
interesting - hadn't thought about the abundance situation!
but, yes, the idea I think I had when I was designing md5sum was that it would be a hash of the content only, not the "metadata".
note that md5sum is used extensively in picklists to select signatures.
I'll provide more perspective and link out to other relevant issues in a bit :).
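To make the picklist consequence concrete, here is a toy sketch of md5-based selection. The in-memory `collection`, `mins_md5`, and `select_by_md5` are hypothetical and are not sourmash's picklist code. The sketch only shows that if two signatures share their mins, and therefore their md5, selecting by that md5 returns both:

```python
import hashlib
from collections import defaultdict


def mins_md5(mins):
    # Assumed content-only digest over the sorted hash values.
    md5 = hashlib.md5()
    for h in sorted(mins):
        md5.update(f"{h},".encode("utf-8"))
    return md5.hexdigest()


# Hypothetical collection of (signature name, hash values); A and B share mins.
collection = [
    ("genome_A", [10, 20, 30]),
    ("genome_B", [30, 20, 10]),
    ("genome_C", [11, 22, 33]),
]

# Index signature names by digest; colliding signatures share one bucket.
index = defaultdict(list)
for name, mins in collection:
    index[mins_md5(mins)].append(name)


def select_by_md5(picklist):
    """Return every signature name whose digest appears in the picklist."""
    return [name for digest in picklist for name in index.get(digest, [])]


picked = select_by_md5([mins_md5([10, 20, 30])])
print(picked)  # ['genome_A', 'genome_B']
```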
I ran into an interesting situation:
These two entries have the exact same md5, and yet the files are different. Indeed, at this scale factor and k-mer size, these underlying genomes have Jaccard/containment == 1. Yet, on closer inspection, the files do appear to be different.
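A toy FracMinHash sketch of how this can happen. With a `scaled` sketch, only hashes below `2**64 // scaled` are kept, so k-mers whose hashes fall above that threshold are invisible to the sketch. Genome B is modeled directly as a k-mer set (a simplifying assumption), and `kmer_hash` is a hypothetical stand-in, since sourmash itself uses MurmurHash3:

```python
import hashlib
import random


def kmer_hash(kmer: str) -> int:
    # Stand-in 64-bit hash (first 8 bytes of MD5); sourmash itself uses MurmurHash3.
    return int.from_bytes(hashlib.md5(kmer.encode("utf-8")).digest()[:8], "big")


def frac_sketch(kmer_set, scaled):
    """FracMinHash-style sketch: keep only hashes below 2**64 // scaled."""
    max_hash = 2**64 // scaled
    return sorted(h for h in map(kmer_hash, kmer_set) if h < max_hash)


def jaccard(x, y):
    sx, sy = set(x), set(y)
    return len(sx & sy) / len(sx | sy)


k, scaled = 21, 100
rng = random.Random(0)
genome_a = "".join(rng.choice("ACGT") for _ in range(2000))
kmers_a = {genome_a[i:i + k] for i in range(len(genome_a) - k + 1)}

# Genome B (as a k-mer set): A minus 50 k-mers whose hashes are above the
# threshold, i.e. k-mers that were never sampled into A's sketch anyway.
max_hash = 2**64 // scaled
unsampled = sorted(km for km in kmers_a if kmer_hash(km) >= max_hash)[:50]
kmers_b = kmers_a - set(unsampled)

sketch_a = frac_sketch(kmers_a, scaled)
sketch_b = frac_sketch(kmers_b, scaled)
print(kmers_a == kmers_b)           # False: the underlying content differs
print(sketch_a == sketch_b)         # True: identical sketches, hence identical mins
print(jaccard(sketch_a, sketch_b))  # 1.0
```

Because the two sketches hold the same mins, any digest computed over the mins alone will also match, even though the underlying sequences differ.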