lmcinnes / pynndescent

A Python nearest neighbor descent for approximate nearest neighbors
BSD 2-Clause "Simplified" License

Serialization #93

Open wmayner opened 4 years ago

wmayner commented 4 years ago

Some objects in pynndescent are not serializable by pickle or even cloudpickle, for example pynndescent.rp_trees.FlatTree. This prevents serialization of UMAP fit objects when pynndescent is used.
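A quick way to confirm this kind of report is to try pickling the object and see whether it raises. The helper below is a minimal sketch using only the standard library; the name is_picklable is hypothetical and not part of pynndescent or UMAP.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        # These are the exceptions pickle typically raises for
        # objects it cannot serialize.
        return False
    return True
```

Calling it on a fitted model, or on individual attributes of that model, can help narrow down which member blocks serialization.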

Relevant UMAP issue: lmcinnes/umap#273

lmcinnes commented 4 years ago

This seems to be related to how Numba handles things. Hopefully there is a reasonable work-around. It will take some time to figure out the right way to handle this.

lmcinnes commented 4 years ago

Potentially I have a fix here. Hopefully this works... I won't have time to test it for a while as I'm travelling very soon.

wmayner commented 4 years ago

Thanks a lot, I'll give it a try!

adilosa commented 4 years ago

Thanks! This patch seems to work but only for the sparse case. To handle the dense case I changed renumbaify_tree to the following, which seems to fix the issue:

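def renumbaify_tree(tree):
    # Choose the typed-list element type from the first hyperplane:
    # a 1-D array is treated as dense, anything else as sparse.
    if tree.hyperplanes[0].ndim == 1:
        hyperplanes = numba.typed.List.empty_list(dense_hyperplane_type)
    else:
        hyperplanes = numba.typed.List.empty_list(sparse_hyperplane_type)
    # ... (remainder of the function elided)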

With this change I'm able to serialize a UMAP instance using joblib, and load it back.
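For anyone unfamiliar with the general idea, the sketch below shows the pattern in plain Python: swap an unpicklable container for a plain list when dumping, then rebuild it on load and infer the element kind from the data. It assumes the patch follows this usual __getstate__/__setstate__ approach, which the thread does not confirm. TypedList and Tree are hypothetical stand-ins, not pynndescent or Numba classes, and nested lists play the role of 2-D sparse hyperplanes.

```python
import pickle


class TypedList:
    """Hypothetical stand-in for a compiled container that refuses to pickle."""

    def __init__(self, kind, items=()):
        self.kind = kind          # "dense" or "sparse"
        self.items = list(items)

    def __reduce__(self):
        raise TypeError("TypedList cannot be pickled directly")


class Tree:
    """Hypothetical tree holding its hyperplanes in a TypedList."""

    def __init__(self, hyperplanes):
        self.hyperplanes = hyperplanes

    def __getstate__(self):
        # Replace the unpicklable container with a plain Python list.
        state = self.__dict__.copy()
        state["hyperplanes"] = list(self.hyperplanes.items)
        return state

    def __setstate__(self, state):
        planes = state["hyperplanes"]
        # Flat hyperplanes (lists of numbers) count as dense;
        # nested ones (lists of lists) count as sparse.
        nested = bool(planes) and isinstance(planes[0][0], list)
        self.__dict__.update(state)
        self.hyperplanes = TypedList("sparse" if nested else "dense", planes)
```

Round-tripping a Tree through pickle.dumps and pickle.loads then restores a TypedList of the matching kind, while pickling a bare TypedList still raises TypeError.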

jpambrun commented 4 years ago

@adilosa's snippet also got me unstuck.