[Enhancement - design question] Visualizing parameter updates as training progresses

shark8me commented 7 years ago

Hi all,

Other neural network libraries have tools that help visualize and debug a network while it is being trained. In particular, I've found TensorFlow's TensorBoard UI to be quite capable.

The toolkit should be able to turn instrumentation on or off for different parameters and metrics, such as:

- gradients of the weights
- raw weight values
- evaluation metrics (e.g. F-measure or accuracy) on the training and/or validation sets

To enable similar functionality in Cortex, there are two design options:

1. Create a built-in visualization tool that can display important metrics.
2. Log event data (to a file or another output stream) in a standard format. Downstream consumers can then transform this data and visualize it in any third-party tool.

I prefer the second approach for two reasons:

1. It decouples the core library from visualization requirements. (Additionally, training on image, video, or text datasets will require visualizing actual image or video samples as training progresses.)
2. It can leverage existing toolkits such as TensorBoard.
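The second approach, combined with per-quantity on/off switches, can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not Cortex code: it assumes JSON Lines (one JSON object per line) as the "standard format", and the names `EventLogger`, `enabled`, and `read_events` are hypothetical. TensorBoard itself reads protobuf-based event files, so a real integration would still need a writer for that format or a converter from a log like this one.

```python
import io
import json
import time


class EventLogger:
    """Hypothetical event logger that appends one JSON object per line.

    Only tags listed in `enabled` are written, so instrumentation for
    individual quantities (gradients, weights, metrics) can be switched
    on or off without touching the training code.
    """

    def __init__(self, stream, enabled):
        self.stream = stream
        self.enabled = set(enabled)

    def log(self, tag, step, value):
        if tag not in self.enabled:
            return False  # instrumentation for this tag is switched off
        record = {"tag": tag, "step": step, "value": value, "wall_time": time.time()}
        self.stream.write(json.dumps(record) + "\n")
        return True


def read_events(stream):
    """Downstream consumer: parse the log back into a list of dicts."""
    return [json.loads(line) for line in stream if line.strip()]


if __name__ == "__main__":
    buf = io.StringIO()  # stands in for a log file
    logger = EventLogger(buf, enabled={"train/accuracy", "validation/accuracy"})
    logger.log("train/accuracy", 1, 0.71)
    logger.log("weights/layer1/grad_norm", 1, 0.42)  # disabled, so not written
    logger.log("validation/accuracy", 1, 0.68)
    buf.seek(0)
    for event in read_events(buf):
        print(event["tag"], event["step"], event["value"])
```

Because the training side only writes plain records, a separate process could tail the file and forward it to TensorBoard or any other viewer, which is the decoupling argued for above.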

I would like to hear your thoughts on this topic.

Thanks!

harold commented 7 years ago

Hi, thanks for these super-interesting thoughts.

Visualizations are key, and a hot topic here. Integrating with TensorBoard is a neat idea and we'll definitely look into it.

We have a growing number of internal tools for doing the kinds of things you describe here that we're using daily for customer projects. Coalescing them into something cogent and useful that we can share as open source is a stated goal of ours.

I appreciate you starting this conversation. I'll be interested to see whether others chime in; I bet we can end up with something pretty awesome.

mikera commented 7 years ago

I personally like the idea of decoupling things with clearly defined interfaces so that alternative visualisation approaches can be plugged in, especially if they have complex dependencies or environment setup requirements.

One general way to think about things would be to see training as a reduction process, such that:

- A set of training results is produced as a map after each mini-batch.
- The function that produces the training results can be either user-defined or a standard one providing useful defaults, such as the batch number and the mean loss calculated over the mini-batch.
- The results can be consumed either as a (lazy) sequence or piped into core.async channels for dynamic visualisation, serialisation to another process, etc.
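The reduction idea can be sketched roughly like this. It is a minimal Python sketch, not Cortex's API (Cortex is written in Clojure): the names `train`, `default_result_fn`, and `pipe_to_queue` are hypothetical, a toy one-variable linear model trained with plain gradient descent stands in for a network, a Python generator plays the role of the lazy sequence, and a `queue.Queue` fed by a thread stands in for a core.async channel.

```python
import queue
import threading


def default_result_fn(batch_index, losses, params):
    """Default per-batch result map: batch number and mean loss."""
    return {"batch": batch_index, "mean_loss": sum(losses) / len(losses)}


def train(batches, params, lr=0.1, result_fn=default_result_fn):
    """Hypothetical training loop written as a lazy reduction.

    The model parameters (w, b of y = w*x + b) are the accumulated state.
    After each mini-batch the generator yields whatever map `result_fn`
    builds, so the caller decides what gets recorded.
    """
    w, b = params
    for i, batch in enumerate(batches):
        grad_w = grad_b = 0.0
        losses = []
        for x, y in batch:
            err = (w * x + b) - y  # prediction error for one sample
            losses.append(err * err)  # squared-error loss
            grad_w += 2 * err * x
            grad_b += 2 * err
        n = len(batch)
        w -= lr * grad_w / n  # gradient-descent update on the batch mean
        b -= lr * grad_b / n
        yield result_fn(i, losses, (w, b))


def pipe_to_queue(results, q):
    """Push each result map onto a queue, then an end-of-stream marker."""
    for r in results:
        q.put(r)
    q.put(None)


if __name__ == "__main__":
    batches = [[(1.0, 2.0), (2.0, 4.0)]] * 3
    # Lazy-sequence style: pull results one at a time.
    for result in train(batches, (0.0, 0.0)):
        print(result)
    # Channel style: a producer thread feeds a queue that a consumer drains.
    q = queue.Queue()
    producer = threading.Thread(target=pipe_to_queue, args=(train(batches, (0.0, 0.0)), q))
    producer.start()
    while True:
        item = q.get()
        if item is None:
            break
        print("consumer got batch", item["batch"])
    producer.join()
```

Passing a custom `result_fn` (for example one that also returns the current weights) changes what is recorded without touching the loop itself, which mirrors the "user-defined or standard defaults" point above.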

shark8me commented 7 years ago

@harold Would the code in the experiment folder be a good starting point to instrument a training cycle and stream out important metrics?

I couldn't find a listener function (like test-fn) in cortex/src itself. Is one likely to be added there?

Thanks!

harold commented 7 years ago

> Would the code in the experiment folder be a good starting point to instrument a training cycle and stream out important metrics?

I think so. The basic idea of experiment is to layer advanced functionality over the core cortex API and to capture best-practice uses of it. train.clj in experiment does this for neural nets in general. The file you linked is specific to neural nets for classification (one particular type of problem they can be used to solve).

> I didn't see (or find) a listener function (like test-fn) in the cortex/src itself. Is it likely to get added there?

My guess would be no. The core training and inference functionality will be kept minimal so that it remains composable and flexible enough to support a variety of scenarios.

cnuernber commented 7 years ago

@shark8me: Do you feel that your two recent TensorBoard pull requests address this issue?

shark8me commented 7 years ago

Yes, they fix the issue; this can be marked as resolved.

> On 16-Jun-2017 5:43 PM, "Chris Nuernberger" <notifications@github.com> wrote:
>
> @shark8me: Do you feel that your two recent tensorboard pull requests address this issue?


originrose / cortex, issue #165