nicholas-leonard / dp

A deep learning library for streamlining research and development using the Torch7 distribution.

Need smarter way to manage experiments #112

Open nicholas-leonard opened 9 years ago

nicholas-leonard commented 9 years ago

I am thinking we should be able to launch multiple experiments in parallel from a single controller interface. The controller would allow the user to configure and launch experiments, and then monitor and compare them.
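As a rough sketch, the user-facing API might look something like this (every name here is hypothetical; nothing like this exists in dp yet, it just illustrates the configure/launch/monitor/compare flow):

```lua
-- Hypothetical controller API; all names are placeholders.
local controller = Controller{savedir = '/data/experiments'}

-- configure and launch experiments in parallel, each with its own hyper-parameters
local exp1 = controller:launch{script = 'nn_example.lua', args = {learningRate = 0.1}}
local exp2 = controller:launch{script = 'nn_example.lua', args = {learningRate = 0.01}}

-- monitor and compare running experiments
controller:status()                             -- list running/completed experiments
controller:compare({exp1, exp2}, 'validError')  -- e.g. plot validation error curves
```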

For visualization, we could use either:

What I like about iTorch is its potential use of notebooks for writing, viewing and interacting with tutorials and experimental reports.

The controller interface could also potentially be built as an iTorch notebook, which would allow users to more easily query data for analysis. This could be done with a simple Lua API: the user calls functions to render diagrams, tables, lists, structures, etc.
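For example, a notebook cell might render results roughly like this. This is a sketch based on iTorch's Plot and html facilities as I remember them; exact signatures should be checked against the iTorch README, and the learning curves are stand-in data:

```lua
require 'torch'
local Plot = require 'itorch.Plot'

-- stand-in learning curves for two hypothetical experiments
local epochs = torch.range(1, 10)
local err1 = torch.rand(10)  -- would be exp1's validation error
local err2 = torch.rand(10)  -- would be exp2's validation error

-- line plot with a legend, drawn inline in the notebook
Plot():line(epochs, err1, 'red', 'exp1')
      :line(epochs, err2, 'blue', 'exp2')
      :title('Validation error'):legend(true):draw()

-- tables, lists, etc. can be rendered as raw HTML inside the notebook
itorch.html('<table><tr><th>exp</th><th>best error</th></tr>'
         .. '<tr><td>exp1</td><td>0.021</td></tr></table>')
```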

To minimize the impact of this change, we could provide functions that accept a script and its command-line arguments and run it on available resources. We could use the parallel package (which uses ssh) to execute the commands on different servers.

All experiments would share a common storage space managed by the controller, but experimental data would remain partitioned: data for each experiment is saved in its own directory on the file system so that the controller can access it. This could be accomplished with something simple like rsync, or handled by something more complicated like a server running on each machine. Alternatively, all experiments could listen for incoming requests; for faster response, we could use threads-ffi to keep a pool of async fibers waiting for requests from the controller.
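A minimal sketch of the dispatch side, loosely following the parallel package's addremote/fork/exec API (server names, paths, and the job format are placeholders; consult the parallel README for exact semantics):

```lua
require 'parallel'

-- worker: runs one experiment script, then reports back
function worker()
   require 'torch'
   -- each worker receives a job {script=..., args=...} from the controller
   local job = parallel.parent:receive()
   os.execute('th ' .. job.script .. ' ' .. job.args)
   parallel.parent:send('done: ' .. parallel.id)
end

-- controller: declares remote resources and dispatches experiments over ssh
function controller()
   parallel.addremote({ip = 'server1', cores = 4, lua = 'th', protocol = 'ssh'},
                      {ip = 'server2', cores = 4, lua = 'th', protocol = 'ssh'})
   parallel.nfork(2)
   parallel.children:exec(worker)
   parallel.children[1]:send{script = 'nn_example.lua', args = '--learningRate 0.1'}
   parallel.children[2]:send{script = 'nn_example.lua', args = '--learningRate 0.01'}
   print(parallel.children[1]:receive())
   print(parallel.children[2]:receive())
   parallel.close()
end

-- protected execution so stray children get cleaned up on error
local ok, err = pcall(controller)
if not ok then print(err); parallel.close() end
```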

I tried to implement something like this in the now-closed async PR, but it was too complicated and imploded on me. Maybe we now have the tools to make this happen, but I fear I do not have the time.

Kaixhin commented 9 years ago

I've come across the same thought several times; even in an environment where only one experiment can be run at a time, it would make exploration more systematic.

The design patterns within dp, along with the libraries you mentioned, sound like a good fit. After looking around for similar solutions, I came across some notes on Google's Sibyl system, which may be useful.

I'm going to give something a shot, but I'll base the web server on Node.js since that's where my experience lies; computation nodes and clients can still be written in Lua. Any basic functionality requests worth considering for a first iteration?
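For instance, a Lua computation node could report progress to the web server over plain HTTP. A minimal sketch, assuming luasocket and lua-cjson are available on the Lua side; the endpoint and payload format are hypothetical and would be defined by the Node.js server:

```lua
local http  = require 'socket.http'
local ltn12 = require 'ltn12'
local json  = require 'cjson'

-- POST a JSON status report to the (placeholder) controller endpoint
local function report(status)
   local body = json.encode(status)
   return http.request{
      url     = 'http://controller-host:8080/api/experiments',  -- placeholder URL
      method  = 'POST',
      headers = {['content-type']   = 'application/json',
                 ['content-length'] = tostring(#body)},
      source  = ltn12.source.string(body),
   }
end

-- e.g. called at the end of each training epoch
report{id = 'exp1', epoch = 10, trainError = 0.034, validError = 0.041}
```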