wannesm / dtaidistance

Time series distances: Dynamic Time Warping (fast DTW implementation in C)

subsequence_search_fast? #176

Closed tommedema closed 2 years ago

tommedema commented 2 years ago

I recently switched from a for loop with dtw.distance_fast to using subsequence_search.

Before:

    for name, test in groupedTests:
        d = dtw.distance_fast(q, test, use_pruning = True)

        # add to results numpy array

After:

    # trainingSeries is an array of arrays of series
    sa = subsequence_search(q, trainingSeries)
    best = sa.kbest_matches(k=100)
    results = np.array([[sa.distances[m.idx], 0, 0, 0] for m in best])

While this improved speed quite a bit, that was mostly because I added the limit of 100 (where before it was storing the distances of all entries). I looked at the source code for subsequence_search, and noticed that it is not using the C version of dtw.distance. Is it possible to have it run the C version, i.e. is there a subsequence_search_fast?

Currently using lprun I can see that 99% of the time is spent on the sa.kbest_matches invocation.
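
To illustrate the limit-of-100 point without depending on dtaidistance, here is a minimal sketch that keeps only the k smallest distances instead of storing all of them. The helper `k_best` and the stand-in distance `abs_diff_sum` are hypothetical names, not part of the library. Note that this only bounds what is kept; every distance is still computed.

```python
import heapq


def k_best(query, candidates, distance, k=100):
    """Return the k (distance, index) pairs with the smallest distance."""
    scored = (
        (distance(query, series), idx)
        for idx, series in enumerate(candidates)
    )
    # nsmallest consumes the generator lazily and never holds more than k results
    return heapq.nsmallest(k, scored)


def abs_diff_sum(a, b):
    """Hypothetical stand-in for a real distance such as dtw.distance_fast."""
    return sum(abs(x - y) for x, y in zip(a, b))


candidates = [[0.0, 1.0, 2.0], [5.0, 5.0, 5.0], [0.0, 1.0, 2.5]]
best = k_best([0.0, 1.0, 2.0], candidates, abs_diff_sum, k=2)
print(best)
```

In real use the `distance` argument would be the actual DTW function, and the returned indices point back into `candidates`.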

wannesm commented 2 years ago

A dedicated method would be easy to add, yes. For now, you can use the faster C version with:

    sa = subsequence_search(q, trainingSeries, dists_options={'use_c': True})

wannesm commented 2 years ago

In the master branch there is now also a kbest_matches_fast function (its functionality is identical to passing dists_options). It will be part of the next release.
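
Code that has to run against both the released version and master could pick the method at runtime. This is a minimal sketch; the helper name `kbest` is hypothetical, and it assumes kbest_matches_fast takes the same `k` argument as kbest_matches, as "functionality is identical" suggests.

```python
def kbest(sa, k):
    """Use kbest_matches_fast when the installed version provides it.

    `sa` is the object returned by subsequence_search. Falls back to the
    pure Python kbest_matches on versions without the fast variant.
    """
    method = getattr(sa, "kbest_matches_fast", None)
    if method is None:
        method = sa.kbest_matches
    return method(k=k)
```

With this helper, `best = kbest(sa, 100)` works unchanged once the next release is installed.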

tommedema commented 2 years ago

@wannesm that dists_options flag would be perfect (I had tried that actually) but I get this TypeError when I do so (running dtaidistance v2.3.9):

    sa = subsequence_search(q, trainingSeries, dists_options={'use_c': True})

    TypeError: subsequence_search() got an unexpected keyword argument 'dists_options'

TypeError                                 Traceback (most recent call last)
/var/folders/lm/xhqw06kd341ck445l0rpz_cr0000gn/T/ipykernel_73353/797742654.py in <module>
     33 query = queries[0:trainingQueryWindow]['rl0'].groupby([sep['ticker'], sep['dl0']]).head(trainingQueryWindow)[::-1].to_numpy(dtype = float)
     34 
---> 35 get_ipython().run_line_magic('lprun', '-f getQueryParameters getQueryParameters(query)')
     36 # getQueryParameters(query)

~/opt/anaconda3/lib/python3.9/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line, _stack_depth)
   2349                 kwargs['local_ns'] = self.get_local_scope(stack_depth)
   2350             with self.builtin_trap:
-> 2351                 result = fn(*args, **kwargs)
   2352             return result
   2353 

~/opt/anaconda3/lib/python3.9/site-packages/decorator.py in fun(*args, **kw)
    230             if not kwsyntax:
    231                 args, kw = fix(args, kw, sig)
--> 232             return caller(func, *(extras + args), **kw)
    233     fun.__name__ = func.__name__
    234     fun.__doc__ = func.__doc__

~/opt/anaconda3/lib/python3.9/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

~/opt/anaconda3/lib/python3.9/site-packages/line_profiler/ipython_extension.py in lprun(self, parameter_s)
    102         try:
    103             try:
--> 104                 profile.runctx(arg_str, global_ns, local_ns)
    105                 message = ""
    106             except SystemExit:

~/opt/anaconda3/lib/python3.9/site-packages/line_profiler/line_profiler.py in runctx(self, cmd, globals, locals)
    140         self.enable_by_count()
    141         try:
--> 142             exec(cmd, globals, locals)
    143         finally:
    144             self.disable_by_count()

<string> in <module>

/var/folders/lm/xhqw06kd341ck445l0rpz_cr0000gn/T/ipykernel_73353/797742654.py in getQueryParameters(q)
      1 def getQueryParameters(q):
----> 2     sa = subsequence_search(q, trainingSeries, dists_options={'use_c': True})
      3 
      4     best = sa.kbest_matches(k = trainingMaxMatchCount)
      5 

TypeError: subsequence_search() got an unexpected keyword argument 'dists_options'

I really appreciate your update to the master branch.

It seems the version I installed through pip (2.3.9) does not include dists_options, unlike what is currently on the master branch.

tommedema commented 2 years ago

Fixed by reinstalling from git:

    pip install -vvv --upgrade --force-reinstall --no-deps --no-build-isolation --no-binary dtaidistance git+https://github.com/wannesm/dtaidistance.git#egg=dtaidistance
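
One way to confirm that the installed version accepts the keyword, without triggering the TypeError, is to inspect the function signature. This is a rough sketch; the helper `accepts_keyword` is a hypothetical name, and the import path `dtaidistance.subsequence.dtw` is an assumption that may differ between versions.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword name
    return any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )


try:
    # Assumed import path; adjust if your version lays out modules differently
    from dtaidistance.subsequence.dtw import subsequence_search
except ImportError:
    subsequence_search = None  # dtaidistance is not installed here

if subsequence_search is not None:
    print(accepts_keyword(subsequence_search, "dists_options"))
```

If this prints False, the installed build predates the dists_options argument and the reinstall above is needed.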