mila-iqia / blocks-extras

A collection of extensions to the Blocks framework
MIT License
27 stars 40 forks

Make push() calls in a separate thread #30

Closed johnarevalo closed 8 years ago

johnarevalo commented 8 years ago

The Plot extension slows down training on small datasets. This happens when batch processing is faster than the store_objects()/push() calls to the Bokeh server. This is the profile of a softmax training run on the Iris dataset:

Section                                  Time     % of total
------------------------------------------------------------
Before training                          0.00          0.00%
  TrainingDataMonitoring                 0.00          0.00%
  Timing                                 0.00          0.00%
  Plot                                   0.00          0.00%
  FinishAfter                            0.00          0.00%
  Other                                  0.00          0.00%
Initialization                           0.60          0.86%
Training                                69.52         99.14%
  Before epoch                           0.12          0.17%
    TrainingDataMonitoring               0.01          0.01%
    Timing                               0.01          0.01%
    Plot                                 0.09          0.13%
    FinishAfter                          0.00          0.01%
    Other                                0.01          0.01%
  Epoch                                  2.97          4.24%
    Read data                            0.26          0.37%
    Before batch                         0.16          0.23%
      TrainingDataMonitoring             0.04          0.05%
      Timing                             0.03          0.05%
      Plot                               0.03          0.05%
      FinishAfter                        0.02          0.03%
      Other                              0.04          0.06%
    Train                                1.78          2.54%
    After batch                          0.70          1.00%
      TrainingDataMonitoring             0.57          0.82%
      Timing                             0.03          0.05%
      Plot                               0.03          0.05%
      FinishAfter                        0.02          0.03%
      Other                              0.05          0.06%
    Other                                0.07          0.10%
  After epoch                           66.40         94.68%
    TrainingDataMonitoring               0.01          0.01%
    Timing                               0.02          0.03%
    Plot                                66.35         94.61%
    FinishAfter                          0.01          0.01%
    Other                                0.01          0.02%
  Other                                  0.03          0.05%
After training                           0.00          0.00%
  TrainingDataMonitoring                 0.00          0.00%
  Timing                                 0.00          0.00%
  Plot                                   0.00          0.00%
  FinishAfter                            0.00          0.00%
  Other                                  0.00          0.00%

This is the profile when push() is called from a separate thread:

Section                                  Time     % of total
------------------------------------------------------------
Before training                          0.00          0.01%
  TrainingDataMonitoring                 0.00          0.01%
  Timing                                 0.00          0.00%
  Plot                                   0.00          0.00%
  FinishAfter                            0.00          0.00%
  Other                                  0.00          0.00%
Initialization                           0.58         14.09%
Training                                 3.53         85.89%
  Before epoch                           0.04          0.87%
    TrainingDataMonitoring               0.01          0.18%
    Timing                               0.01          0.17%
    Plot                                 0.01          0.24%
    FinishAfter                          0.00          0.09%
    Other                                0.01          0.19%
  Epoch                                  3.38         82.09%
    Read data                            0.25          6.14%
    Before batch                         0.18          4.27%
      TrainingDataMonitoring             0.04          0.91%
      Timing                             0.03          0.82%
      Plot                               0.05          1.13%
      FinishAfter                        0.02          0.46%
      Other                              0.04          0.96%
    Train                                2.09         50.80%
    After batch                          0.79         19.19%
      TrainingDataMonitoring             0.64         15.64%
      Timing                             0.03          0.85%
      Plot                               0.05          1.14%
      FinishAfter                        0.02          0.47%
      Other                              0.05          1.10%
    Other                                0.07          1.68%
  After epoch                            0.10          2.41%
    TrainingDataMonitoring               0.01          0.17%
    Timing                               0.02          0.49%
    Plot                                 0.06          1.42%
    FinishAfter                          0.01          0.13%
    Other                                0.01          0.20%
  Other                                  0.02          0.51%
After training                           0.00          0.01%
  TrainingDataMonitoring                 0.00          0.00%
  Timing                                 0.00          0.00%
  Plot                                   0.00          0.00%
  FinishAfter                            0.00          0.00%
  Other                                  0.00          0.00%

The time in the Training section decreases from 69.52 to 3.53 seconds, because the "After epoch" Plot step drops from 66.35 to 0.06 seconds.
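For readers who want to see the general idea, here is a minimal sketch of handing slow push calls to a worker thread through a queue. It is not the code from this PR: `slow_push`, `BackgroundPusher`, and the 10 ms delay are hypothetical stand-ins for the real push() to the Bokeh server.

```python
import queue
import threading
import time


def slow_push(item, sink):
    """Hypothetical stand-in for a slow push() to a plotting server."""
    time.sleep(0.01)
    sink.append(item)


class BackgroundPusher(object):
    """Runs slow_push() on a worker thread so the caller does not block."""

    def __init__(self, sink):
        self.sink = sink
        self.tasks = queue.Queue()
        self.worker = threading.Thread(target=self._run)
        self.worker.daemon = True  # do not keep the process alive on exit
        self.worker.start()

    def _run(self):
        # Single consumer: items are pushed in the order they were queued.
        while True:
            item = self.tasks.get()
            try:
                slow_push(item, self.sink)
            finally:
                self.tasks.task_done()

    def push(self, item):
        # Returns immediately; the worker thread does the slow part.
        self.tasks.put(item)

    def flush(self):
        # Block until every queued item has been pushed.
        self.tasks.join()


received = []
pusher = BackgroundPusher(received)
for epoch in range(20):
    pusher.push(epoch)  # the "training loop" only pays for a queue put
pusher.flush()
```

The training loop only pays for the `put()`; the cost of the slow call moves to the worker thread, which is consistent with the drop in the "After epoch" Plot time above.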

dwf commented 8 years ago

LOL. Well yes this does seem like a big improvement, thanks.

You'll need to import six.moves.queue.PriorityQueue to make your code work on Python 3, where the Queue module has been renamed to queue.

That plus the linter fix:

blocks/extras/extensions/plot.py:171:1: E302 expected 2 blank lines, found 1

preferably with a rebase+squash, and I think this is ready to merge.
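A small sketch of the import suggested above, assuming the `six` package may or may not be installed (the fallback to the Python 3 standard library is an addition for illustration, not part of the PR). The task names and priorities are hypothetical.

```python
try:
    # Portable across Python 2 and 3 when six is installed.
    from six.moves import queue
except ImportError:
    # Python 3 standard library name (the Python 2 module was called Queue).
    import queue

tasks = queue.PriorityQueue()
# Hypothetical priorities: lower numbers are retrieved first.
tasks.put((2, 'push'))
tasks.put((1, 'store_objects'))

first = tasks.get()
second = tasks.get()
```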

johnarevalo commented 8 years ago

I've fixed them. There is, however, an issue with the Checkpoint extension, since the Thread class is not serializable. I could follow this approach. But in any case, I don't know how useful it would be to save the Plot object at all.

rizar commented 8 years ago

It is definitely useful, because it allows full resumability of the experiments, which has high priority for Blocks.

On 26 October 2015 at 11:15, John Arevalo notifications@github.com wrote:

I've fixed them. There is, however, an issue with the Checkpoint extension, since the Thread class is not serializable. I could follow this approach: https://github.com/mila-udem/blocks/blob/58994253579d30fb63a2cfa53499a68b1237c3c0/blocks/extensions/__init__.py#L442. But in any case, I don't know how useful it would be to save the Plot object at all.


dwf commented 8 years ago

Yes, I would recommend using __getstate__ to delete the thread object from the returned dict or whatever.

A pattern I like for this is to put all the initialization of the thread in a property, which checks for an existing underscore-prefixed attribute and creates it on demand if it doesn't exist. Thus the thread creation logic can go in one place and you don't need to call a function to initialize it, you just use self.thread or whatever wherever you need it.
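The pattern described above could look roughly like the following sketch. `PlotLike` and its empty `_work` method are hypothetical stand-ins, not the actual Plot extension; the point is only the combination of a lazily created `thread` property and a `__getstate__` that drops the thread before pickling.

```python
import pickle
import threading


class PlotLike(object):
    """Hypothetical stand-in for an extension that owns a worker thread."""

    def __init__(self, name):
        self.name = name

    @property
    def thread(self):
        # Create the thread on first access, or again after unpickling,
        # so all the creation logic lives in one place.
        if getattr(self, '_thread', None) is None:
            self._thread = threading.Thread(target=self._work)
            self._thread.daemon = True
            self._thread.start()
        return self._thread

    def _work(self):
        pass  # a real worker loop would go here

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable thread object.
        state = dict(self.__dict__)
        state.pop('_thread', None)
        return state


ext = PlotLike('demo')
original = ext.thread                        # thread is created on demand
restored = pickle.loads(pickle.dumps(ext))   # works: no thread in the state
```

After unpickling, `restored` has no `_thread` attribute, so the next access to `restored.thread` builds a fresh thread without any explicit re-initialization call.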


johnarevalo commented 8 years ago

Sorry, I rebased before reading your last comment. I will use the property pattern and rebase again.

johnarevalo commented 8 years ago

@dwf @rizar rebased.

johnarevalo commented 8 years ago

I've updated the PR. I have a question about the expected behavior of a resumed Plot extension: should it keep adding points to the last plot? Both this PR and the master branch overwrite the plot on the Bokeh server, i.e. they reset the "iterations" counter to 1. Edited: @dwf, @rizar I don't know if you were notified.

dwf commented 8 years ago

Sorry this has been languishing. I will take a look at it.

dwf commented 8 years ago

LGTM, thanks!