tinkoff-ai / etna

ETNA – Time-Series Library
https://etna.tinkoff.ru
Apache License 2.0

[BUG] DTWClustering can't be serialized #307

Closed WinstonDovlatov closed 2 years ago

WinstonDovlatov commented 2 years ago

🐛 Bug Report

When I try to serialize a DTWClustering object with dill or pickle, I get errors.

For pickle: Can't pickle <function simple_dist at 0x7fde4f5514c0>: it's not the same object as etna.clustering.distances.dtw_distance.simple_dist. It must be some kind of naming error.

For dill: cannot pickle 'generator' object
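One plausible mechanism behind the pickle message above, sketched with the standard library only: a decorator replaces the module-level name with a wrapper object, while the wrapper still holds the original Python function. pickle serializes plain functions by reference (module plus name), so it looks the name up, finds the wrapper instead, and refuses. `FakeCFunc` and `fake_cfunc` are hypothetical stand-ins invented for this illustration; they are not Numba's classes, and the thread does not confirm that this is exactly what happens inside etna.

```python
import pickle


class FakeCFunc:
    """Hypothetical stand-in for a compiled-function wrapper (not Numba's class)."""

    def __init__(self, pyfunc):
        # The wrapper keeps a reference to the original Python function.
        self._pyfunc = pyfunc

    def __call__(self, x1, x2):
        return self._pyfunc(x1, x2)


def fake_cfunc(func):
    """Hypothetical decorator: rebinds the function's name to a wrapper object."""
    return FakeCFunc(func)


@fake_cfunc
def simple_dist(x1, x2):
    # After decoration, the module-level name `simple_dist` is the wrapper,
    # but this function object still reports its name as `simple_dist`.
    return abs(x1 - x2)


def try_pickle(obj):
    """Return the exception raised by pickle.dumps, or None on success."""
    try:
        pickle.dumps(obj)
    except Exception as exc:
        return exc
    return None


# Pickling the wrapper tries to pickle its attributes, including `_pyfunc`.
# pickle then looks up `simple_dist` by name and finds a different object.
error = try_pickle(simple_dist)
print(type(error).__name__, error)
```

When run as a script, the printed message should resemble the one in the report, with the module name of the script in place of etna.clustering.distances.dtw_distance.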

Expected behavior

DTWClustering can be serialized with both dill and pickle.

How To Reproduce

Code

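import dill
from etna.clustering import DTWClustering
# Fails with: cannot pickle 'generator' object
dtw = DTWClustering()
with open('path', 'wb') as fout:
    dill.dump(dtw, fout)

import pickle
from etna.clustering import DTWClustering
# Fails with: Can't pickle <function simple_dist ...>: it's not the same object as ...
dtw = DTWClustering()
with open('path', 'wb') as fout:
    pickle.dump(dtw, fout)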

Environment

No response

Additional context

No response

martins0n commented 2 years ago

Some additional context: we use Numba-generated cfuncs (numba.cfunc) for the DTW distance computation.

The most common serialization approaches, such as dill and the standard pickle module, can't serialize cfuncs.

So, as a straightforward solution, we can change numba.cfunc to numba.jit(nopython=True). It shouldn't affect performance dramatically.