hfawaz / aaltd18

Data augmentation using synthetic data for time series classification with deep residual networks
GNU General Public License v3.0
183 stars 42 forks

Parallel implementation? #4

Closed petteriTeikari closed 5 years ago

petteriTeikari commented 5 years ago

Hi again,

If I understood your algorithm correctly, the loop over nb_prototypes_per_class could be parallelized, right?

[screenshot attachment]

Have you explored this with any of the options here, for example: https://stackoverflow.com/questions/9786102/how-do-i-parallelize-a-simple-python-loop or https://pypi.org/project/pp/

For example, I have ~2,000 time series in my own dataset (1,981 samples each), and the computation time becomes quite unreasonable even with a 160-series subset:

- 913 seconds (~15 minutes) for 40 time series
- 14,449 seconds (~4 hours) for 160 time series

hfawaz commented 5 years ago

Hello,

Thank you for sharing these insights. Indeed, the computation of synthetic samples can be parallelized; however, I did not explore this idea for this project. If you ever implement it, feel free to create a branch, push it, and open a pull request, and I will try to incorporate it into the master branch.

I understand that DBA computations can be very expensive, which is why parallelization would be very beneficial.

Thanks again!

petteriTeikari commented 5 years ago

I will send you the implementation later on when I get to it @hfawaz, although I am not sure about the PR per se, as I want to test it a bit differently from your pipeline. For example, due to the cost of augmentation, I only want to augment once, and then test different architecture tweaks against different augmentations.
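The augment-once idea could be as simple as caching the augmented set on disk and reloading it for each architecture experiment; a minimal sketch, where `augment_fn` and the cache file name are illustrative rather than taken from the repo:

```python
# Sketch: build the augmented set once, cache it, and reuse it across runs.
import os
import tempfile
import numpy as np

def get_augmented(x_train, y_train, cache_path, augment_fn):
    """Return the augmented set, building and saving it only on first call."""
    if os.path.exists(cache_path):
        with np.load(cache_path) as f:  # later runs reuse the cached result
            return f["x"], f["y"]
    x_aug, y_aug = augment_fn(x_train, y_train)  # the expensive DBA step
    np.savez_compressed(cache_path, x=x_aug, y=y_aug)
    return x_aug, y_aug

# Toy usage with an identity "augmentation" standing in for the real one.
cache = os.path.join(tempfile.mkdtemp(), "aug_cache.npz")
x = np.zeros((4, 10))
y = np.arange(4)
x_aug, y_aug = get_augmented(x, y, cache, lambda a, b: (a, b))
x_again, _ = get_augmented(x, y, cache, lambda a, b: (a, b))  # from disk
```

This way the hours-long DBA step is paid once per augmentation setting, and every subsequent architecture tweak just loads the `.npz` file.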