nipype / pydra

Pydra Dataflow Engine
Centralize object hashing and provide a mechanism for types to register a hash #626

Open effigies opened 1 year ago

effigies commented 1 year ago

Right now we have hashing split up in a few places:

An alternative approach could be to use functools.singledispatch:

from functools import singledispatch
from hashlib import sha256

@singledispatch
def hash_obj(obj: object) -> bytes:
    # Works for generic objects with __dict__
    dict_rep = b":".join(key.encode() + b":" + hash_obj(val) for key, val in obj.__dict__.items())
    return sha256(f"{obj.__class__}:".encode() + dict_rep).digest()

This defines a cryptographic hash for a generic object that applies recursively. We would need some bottom types that don't have __dict__:

@hash_obj.register
def _(obj: int) -> bytes: ...

@hash_obj.register
def _(obj: str) -> bytes: ...

@hash_obj.register
def _(obj: dict) -> bytes: ...
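
To make the bottom types concrete, here is a minimal sketch of how the stubs might be filled in. The type prefixes, the sorting of attributes and dict entries, and the `Point` class are assumptions for illustration, not an agreed design.

```python
from functools import singledispatch
from hashlib import sha256


@singledispatch
def hash_obj(obj: object) -> bytes:
    # Fallback for generic objects: hash the class name plus every attribute.
    h = sha256(type(obj).__qualname__.encode())
    for key, val in sorted(vars(obj).items()):
        h.update(key.encode() + b":" + hash_obj(val))
    return h.digest()


@hash_obj.register
def _(obj: int) -> bytes:
    # Bottom type: hash the decimal representation behind a type prefix.
    return sha256(f"int:{obj}".encode()).digest()


@hash_obj.register
def _(obj: str) -> bytes:
    # Bottom type: the prefix keeps "1" and 1 from colliding.
    return sha256(b"str:" + obj.encode()).digest()


@hash_obj.register
def _(obj: dict) -> bytes:
    # Assumption: dicts that compare equal should hash equal, so the
    # per-entry digests are sorted to ignore insertion order.
    pairs = sorted(hash_obj(k) + hash_obj(v) for k, v in obj.items())
    return sha256(b"dict:" + b"".join(pairs)).digest()


class Point:  # hypothetical example type, handled by the generic fallback
    def __init__(self, x: int, y: int):
        self.x, self.y = x, y
```

With these handlers, two `Point` instances with the same attributes produce the same digest, because the fallback recurses into the registered `int` handler for each attribute.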

Each type would then be able to declare how much of its state is needed to identify it uniquely across instances. We could add handlers for set and frozenset so that these known-problematic builtin types, whose iteration order can change between interpreter runs, hash consistently. We could then provide a means for a downstream tool to register a type with our hasher, such as:

@hash_obj.register
def _(obj: MyType) -> bytes: ...

pydra.utils.register_hash(MyType, myhashfun)
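
As a rough sketch of both ideas, the snippet below adds order-independent handlers for set and frozenset and a small registration helper. The `register_hash` implementation, the `repr()`-based fallback, and the `MyType` / `myhashfun` bodies are all hypothetical stand-ins and do not reflect an existing pydra API.

```python
from functools import singledispatch
from hashlib import sha256
from typing import Callable


@singledispatch
def hash_obj(obj: object) -> bytes:
    # Simplified fallback for this sketch only: type name plus repr().
    # repr() is stable for simple values but not for arbitrary objects.
    return sha256(f"{type(obj).__qualname__}:{obj!r}".encode()).digest()


@hash_obj.register(set)
@hash_obj.register(frozenset)
def _(obj) -> bytes:
    # Sort the element digests so the result does not depend on
    # the iteration order of the set.
    h = sha256(type(obj).__qualname__.encode())
    for digest in sorted(hash_obj(item) for item in obj):
        h.update(digest)
    return h.digest()


def register_hash(cls: type, func: Callable[..., bytes]) -> None:
    # Hypothetical helper: a thin wrapper around singledispatch's register().
    hash_obj.register(cls, func)


class MyType:  # hypothetical downstream type
    def __init__(self, name: str, cache: dict):
        self.name, self.cache = name, cache


def myhashfun(obj: MyType) -> bytes:
    # Only `name` identifies the object; `cache` is deliberately ignored.
    return sha256(b"MyType:" + obj.name.encode()).digest()


register_hash(MyType, myhashfun)
```

Here `hash_obj({"a", "b"})` and `hash_obj({"b", "a"})` give the same digest, and two `MyType` instances that differ only in `cache` hash identically, which is the "declare how much is needed" behaviour described above.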

effigies commented 1 year ago

Btw, I found a Stack Overflow question about exactly this issue:

tclose commented 1 year ago

This looks good to me