Roadmap - Githubissues

pluskid commented 9 years ago

Discussions and/or suggestions are welcome!

Interface
- Network architecture visualization
- Recurrent Neural Networks
Infrastructure
- [ ] CUDA Stream
- [ ] Multi-GPU support
- [x] 4D tensor -> ND tensor (#21, #25)
- [x] Unsupervised Learning ((deep) autoencoders and variants) (#29)
Document
- Developer's Guide

jfsantos commented 9 years ago

Even though restricted Boltzmann machines (and DBMs/DBNs) and autoencoders (DAE, CAE, stacked autoencoders) have a different principle as they are unsupervised, having an implementation that follows the Mocha architecture could be useful. We started discussing this for DBNs here, as we have a simple implementation for RBMs and DBNs and would like to make it compatible with Mocha.

pluskid commented 9 years ago

@jfsantos Thanks! I think autoencoders, although unsupervised, are still trained with SGD: we just specify the label to be the same as the input data, so in principle we could already do this in Mocha. We might need to add some special layers to support variants of autoencoders. But I might be wrong, as I haven't worked on autoencoders at all. Do you know the details?

As for Bayesian networks, yes, I agree they are a very different paradigm. Especially since we already have a package (dfdx/Boltzmann.jl#3) for that, I think it is better to keep them in two separate packages. But making them compatible should definitely be a goal, and maybe some collaboration too.

For using DBNs/DBMs to initialize the weights of DNNs, I think this might already be quite easy. If you could export the weights to an HDF5 file with compatible naming, then Mocha should be able to load them, just like loading Caffe's exported models, and then start supervised training from there. We could make Mocha's loading interface richer by, for example, allowing the user to control in fine detail which layer should load which dataset (by name) from which file. We could also discuss a common data format that suits both needs.

jfsantos commented 9 years ago

You are right about autoencoders being trained with SGD as MLPs. There are some "special" things, though:

- Specific regularizers/cost functions (e.g., for contractive and sparse autoencoders).
- Tied weights: the case where the decoder's weight matrix is simply the transposed weight matrix of the encoder, so you only update the encoder weight matrix (however, each layer has its own biases).
- A "corruption layer" is needed for adding noise/zeroing elements of the input data in the case of denoising autoencoders.
- In case we want to support stacked autoencoders, they're a bit of a different animal (more like DBNs, in the sense that you have to iteratively train them layer by layer).
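
To make the tied-weights and corruption points concrete, here is a minimal sketch in plain Python. It is not Mocha code: the function names (`corrupt`, `forward`, `sgd_step`), the sigmoid units, the squared-error loss, and the masking-noise scheme are all assumptions chosen for illustration. It shows one SGD step of a denoising autoencoder whose decoder reuses the transposed encoder matrix, so the single matrix `W` receives gradient from both roles while each layer keeps its own bias.

```python
import math
import random


def corrupt(x, drop_prob, rng):
    """Masking noise: zero each input element with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W, b_enc, b_dec):
    """Tied weights: the encoder uses W, the decoder uses W transposed."""
    n_hidden, n_in = len(W), len(W[0])
    h = [sigmoid(sum(W[j][i] * x[i] for i in range(n_in)) + b_enc[j])
         for j in range(n_hidden)]
    y = [sigmoid(sum(W[j][i] * h[j] for j in range(n_hidden)) + b_dec[i])
         for i in range(n_in)]
    return h, y


def sgd_step(x_clean, W, b_enc, b_dec, lr, drop_prob, rng):
    """One SGD step; the target ("label") is the clean input itself."""
    x = corrupt(x_clean, drop_prob, rng)
    h, y = forward(x, W, b_enc, b_dec)
    n_hidden, n_in = len(W), len(W[0])
    # Error signal at the output pre-activations for loss 0.5 * sum((y - x_clean)^2).
    d_out = [(y[i] - x_clean[i]) * y[i] * (1.0 - y[i]) for i in range(n_in)]
    # Error signal at the hidden pre-activations, propagated back through W^T.
    d_hid = [sum(W[j][i] * d_out[i] for i in range(n_in)) * h[j] * (1.0 - h[j])
             for j in range(n_hidden)]
    for j in range(n_hidden):
        for i in range(n_in):
            # The shared matrix gets a decoder term and an encoder term.
            W[j][i] -= lr * (d_out[i] * h[j] + d_hid[j] * x[i])
        b_enc[j] -= lr * d_hid[j]
    for i in range(n_in):
        b_dec[i] -= lr * d_out[i]
    return 0.5 * sum((y[i] - x_clean[i]) ** 2 for i in range(n_in))


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(2)]
    b_enc, b_dec = [0.0] * 2, [0.0] * 4
    sample = [1.0, 0.0, 1.0, 0.0]
    for _ in range(200):
        loss = sgd_step(sample, W, b_enc, b_dec, 0.5, 0.25, rng)
    print("final reconstruction loss:", loss)
```

Stacked autoencoders would repeat this training loop layer by layer, feeding each trained layer's hidden activations to the next, which is why they need more than a single SGD pass over one network.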

I'll work on a draft implementation for initializing a DNN with a DBN from Boltzmann.jl and let you know as soon as I have something (hopefully, by submitting a pull request!).

pluskid commented 9 years ago

@jfsantos Thanks for the details! I see, it is kind of do-able but not trivial. I need to think about this further.

philtomson commented 9 years ago

Just wondering what the ETA for recurrence support might be?

pluskid commented 9 years ago

@philtomson That is definitely a plan/goal, but maybe after the autoencoders. The reason is that I do not know RNNs well enough to start implementing them right away. But I think many of the building blocks are already there. Especially if you want to do a simple explicit unfolding of a fixed-length history, I think one could already build a model like that by making use of the shared-parameter mechanism in Mocha. For variable-length RNN support, I need to think more, especially about how the interface should be organized.
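
As a rough illustration of explicit unfolding with shared parameters, here is a toy sketch in plain Python. It does not use Mocha's shared-parameter mechanism; the names `make_unrolled_rnn`, `step`, and `forward`, the scalar hidden state, and the tanh recurrence are all assumptions made for the example. The point is only that a fixed-length RNN becomes an ordinary feed-forward stack in which every "layer" refers to the same parameter object.

```python
import math


def make_unrolled_rnn(n_steps, params):
    """Return n_steps 'layers' that all reference the same params dict."""
    return [params for _ in range(n_steps)]


def step(params, h_prev, x_t):
    """One recurrence step: h_t = tanh(w_x * x_t + w_h * h_prev + b)."""
    return math.tanh(params["w_x"] * x_t + params["w_h"] * h_prev + params["b"])


def forward(layers, xs, h0=0.0):
    """Run the unrolled network over a sequence of exactly len(layers) inputs."""
    if len(xs) != len(layers):
        raise ValueError("sequence length must match the number of unrolled steps")
    h = h0
    states = []
    for layer_params, x_t in zip(layers, xs):
        h = step(layer_params, h, x_t)
        states.append(h)
    return states


if __name__ == "__main__":
    shared = {"w_x": 1.0, "w_h": 0.5, "b": 0.0}
    net = make_unrolled_rnn(3, shared)
    print(forward(net, [0.1, 0.2, 0.3]))
    # Updating the shared dict changes every unrolled step at once.
    shared["w_h"] = 0.0
    print(forward(net, [0.1, 0.2, 0.3]))
```

The fixed length is the limitation: a sequence of a different length needs a differently sized stack, which is where a dedicated variable-length interface would come in.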

That being said, suggestions are very welcome from people who already know RNNs. For example, what is the simplest representative and reproducible example for RNNs (like MNIST for CNNs)? Are there any nice existing libraries for RNNs (whose way of organizing the user interface we could learn from)? etc.

zhongwen commented 9 years ago

@pluskid Maybe the following are helpful:
- Andrej Karpathy's Neuraltalk: https://github.com/karpathy/neuraltalk
- Alex Graves's RNNLIB: http://sourceforge.net/projects/rnnl/

pluskid commented 9 years ago

@zhongwen Thanks for the links!

the-moliver commented 9 years ago

I'm planning to add time-delay neural networks. I have a working implementation ( https://github.com/the-moliver/NeuralNets.jl ) that I want to port to Mocha.

philtomson commented 9 years ago

It would be nice to have a Caffe file -> Mocha converter. Maybe I'll work on something like that. Should be doable, right? Or are there Caffe features that are not yet in Mocha?

pluskid commented 9 years ago

We already have the ability to load Caffe models, but you still need to manually translate the model definition. Automatic translation of the architecture is theoretically possible, but I guess it might be quite tedious to implement. (I'm thinking maybe there should be some universal DNN architecture specification language coming out soon.) Most of the core functionality in Caffe has a counterpart in Mocha. But Caffe also has many unofficial forks that implement specific layers; for those, conversion is more difficult.

nikolaypavlov commented 9 years ago

- It would be nice to have a maxout layer in addition to dropout.
- Max-norm regularization can help with tuning dropout nets.

pluskid commented 9 years ago

@nikolaypavlov Thanks for the suggestions

- Based on my understanding, maxout is simply a max pooling over some units. We can achieve this by using the existing PoolingLayer or ChannelPoolingLayer. Let me know if you are talking about something else.
- Max-norm regularization is actually implemented; see, for example, filter_cons for ConvolutionLayer.
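
For readers unfamiliar with the two ideas, here is a small generic sketch in plain Python. It is not how PoolingLayer, ChannelPoolingLayer, or filter_cons are implemented; the function names `maxout` and `max_norm`, the grouping of consecutive units, and the use of the L2 norm are assumptions for illustration. Maxout takes the maximum over each group of units, and a max-norm constraint rescales a weight vector whenever its norm exceeds a limit.

```python
import math


def maxout(units, group_size):
    """Take the max over consecutive groups of group_size unit activations."""
    if group_size <= 0 or len(units) % group_size != 0:
        raise ValueError("number of units must be a positive multiple of group_size")
    return [max(units[i:i + group_size])
            for i in range(0, len(units), group_size)]


def max_norm(weights, limit):
    """Project a weight vector back onto the L2 ball of radius limit."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= limit:
        return list(weights)  # already inside the ball, leave unchanged
    scale = limit / norm
    return [w * scale for w in weights]


if __name__ == "__main__":
    print(maxout([1.0, 5.0, 2.0, 0.0, -1.0, 3.0], 2))  # [5.0, 2.0, 3.0]
    print(max_norm([3.0, 4.0], 1.0))  # rescaled to unit length
    print(max_norm([0.3, 0.4], 1.0))  # unchanged
```

In a training loop the projection would typically be applied to each unit's incoming weights after every parameter update.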

nikolaypavlov commented 9 years ago

@pluskid Great, I'll try to play with PoolingLayer.

outlace commented 9 years ago

Is this project meant to be the Theano/Torch of Julia?

Is there ever going to be OpenCL support?

pluskid commented 9 years ago

@outlace, this is more like Torch than Theano in that sense. There is no planned OpenCL support unless Julia gets better native support for GPU targets.

nstiurca commented 8 years ago

I would be very interested in OpenCL support as well. In fact, I have half a mind to take a stab at it myself. If I can leverage an OpenCL BLAS library (say, CLBLAS.jl), then I basically just have to write im2col.cl and a couple of pooling and neuron kernels, and structure everything else similarly to the CUDA backend.
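
For context on why im2col plus a BLAS library covers most of the work, here is a minimal CPU sketch in plain Python. It is neither the proposed im2col.cl kernel nor Mocha's CUDA code; the names `im2col` and `conv2d_via_im2col`, the single-channel input, the absence of padding, and the cross-correlation convention are all simplifying assumptions. Each image patch is flattened into one column, after which convolution reduces to a matrix product.

```python
def im2col(image, kh, kw, stride=1):
    """Flatten every kh x kw patch of a 2D image into one column.

    Returns (cols, out_h, out_w), where cols has kh*kw rows and
    out_h*out_w columns, one column per output position.
    """
    h, w = len(image), len(image[0])
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    patches = []
    for r in range(0, out_h * stride, stride):
        for c in range(0, out_w * stride, stride):
            patches.append([image[r + i][c + j]
                            for i in range(kh) for j in range(kw)])
    # Transpose: rows index patch elements, columns index output positions.
    cols = [list(row) for row in zip(*patches)]
    return cols, out_h, out_w


def conv2d_via_im2col(image, kernel):
    """Cross-correlate image with kernel using im2col and a row-by-matrix product."""
    kh, kw = len(kernel), len(kernel[0])
    cols, out_h, out_w = im2col(image, kh, kw)
    flat_kernel = [v for row in kernel for v in row]
    # (1 x kh*kw) times (kh*kw x out_h*out_w): the part a BLAS GEMM would do.
    flat_out = [sum(k * v for k, v in zip(flat_kernel, column))
                for column in zip(*cols)]
    return [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]


if __name__ == "__main__":
    img = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
    print(conv2d_via_im2col(img, [[1, 0], [0, 1]]))  # [[6, 8], [12, 14]]
```

With several filters, the flattened kernels stack into a matrix and the whole layer becomes a single GEMM call, which is the piece an OpenCL BLAS library would supply.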

If I did this, in the interest of clarity would you be OK with renaming GPUBackend -> CUDABackend (adding @deprecated typealias GPUBackend CUDABackend or similar for compatibility), and naming the new backend OCLBackend?

pluskid commented 8 years ago

@nstiurca Thanks! This could be cool! Yes, I'm OK with the renaming if we have a working OpenCL backend!

nstiurca commented 8 years ago

OK, I will get started this weekend. Should we open an issue for the sake of tracking? Development-wise, it will be simplest for me to create an opencl branch on the fork of your project that I already have. Do you prefer to have such a branch in your repo as well until OpenCL support is stable (assuming we get there...)? It might be good to do that for the sake of anyone else that wants to help develop OpenCL support.

pluskid commented 8 years ago

I would suggest doing it in your branch, but opening a pull request here with "[WIP]" in the title and a description of the goal and current progress in the text (which you could update periodically). I will not merge the PR until you have something reasonably stable, but people will see the PR and could probably jump in to help.

nstiurca commented 8 years ago

That works for me. Look for it later today.

outlace commented 8 years ago

I think this is great. I currently have to use Torch because it's the only mature package that has an OpenCL backend. Being able to run models on my Macbook is fantastic. Really looking forward to this getting OpenCL support.

nstiurca commented 8 years ago

@outlace Caffe also has a fork with OpenCL support, but unfortunately I haven't been able to get either Torch or Caffe to work on a 32-bit ARM processor, even though it has a fully compliant OpenCL 1.1 implementation.

Thus, I am going to start on rolling my own. See PR #155.

lqh20 commented 8 years ago

Any plans to implement batch normalization (http://jmlr.org/proceedings/papers/v37/ioffe15.pdf)? Looks like it's a great step forward in terms of training time!

pluskid commented 8 years ago

@lqh20 I recently joined a new project, MXNet. We are building a Julia interface called MXNet.jl (https://github.com/dmlc/MXNet.jl). It is still at a relatively early stage, but some features are already working. For example, batch normalization and multi-GPU training in the cifar-10 example (https://github.com/dmlc/MXNet.jl/blob/master/examples/cifar10/cifar10.jl#L9) are already working quite nicely.

philtomson commented 8 years ago

Is MXNet.jl complementary to Mocha.jl or meant to replace it?

pluskid commented 8 years ago

@philtomson It depends. Mocha.jl still has its advantages of simplicity and portability. But in terms of computational efficiency or feature richness, I think MXNet.jl should be replacing Mocha.jl, because it is built on top of libmxnet, a language-agnostic general deep learning library that is designed to have, for example, multi-GPU support. Moreover, the core component of libmxnet is being actively developed by a team, so in terms of features it is much better than Mocha.jl, which is currently developed primarily by me in my very limited free time. libmxnet itself is actually a joint effort of authors from several different deep learning libraries.

philtomson commented 8 years ago

I wonder if mxnet could be an alternate backend for Mocha.jl? It seems like that would preserve the advantages of Mocha.jl - simplicity, portability, good documentation - while also allowing users to drop directly to your MXNet.jl bindings if needed.

pluskid commented 8 years ago

@philtomson That could be one possible option. I will wait and see if that is feasible, as using MXNet.jl introduces an external dependency on libmxnet. If that dependency itself is not a problem, then using MXNet.jl directly might be a more viable option, though some things still need to be improved, esp. documents.

philtomson commented 8 years ago

Right. The scenario I was thinking of was being able to keep the kind of simple, declarative style of Mocha.jl while also being able to take advantage of the performance of MXNet.jl. Sure libmxnet has an advantage of being "language independent", however, that can also be a weakness. It could mean that you can't readily take advantage of powerful language features specific to Julia, like macros, for example (or at least it might be more difficult to do so). I suspect there's a lot of boilerplate code required when you use libmxnet that could be eliminated at a higher level of abstraction.

BTW: is the GPU backend of Mocha.jl not as performant as libmxnet? I guess from what I understand of the docs mxnet allows for multiple GPUs whereas Mocha.jl only allows for using one? If you compare performance between Mocha.jl with only one GPU and libmxnet with only one GPU are they pretty close?

I can see where training with multiple GPUs can be an advantage, but some users might be running pre-trained models on a laptop with only a single GPU (some of us don't even have that, as we only have an Intel integrated GPU which doesn't do CUDA). The current setup of Mocha.jl is actually quite sufficient for doing this; people with this kind of setup perhaps wouldn't notice any appreciable difference from using libmxnet.

Also: Does the mxnet project have any plans for supporting OpenCL?

> Though something still needs to be improved, esp. documents.

Mocha.jl's documents are actually pretty good at this point so this is a problem for someone who tries moving from Mocha.jl to MXNet.jl. Using Mocha.jl as a sort of a wrapper around MXNet.jl would mean you could probably keep most of the documentation as is.

I suppose another idea would be to translate the CPP backend for Mocha.jl to produce C++ code that makes calls directly to libmxnet (or at least parameterize it so that you could use either OpenMP, as now, or libmxnet in the CPP backend).

philtomson commented 8 years ago

I just got around to installing MXNet.jl and playing with it some. So far it doesn't seem too much more difficult to use than Mocha.

pluskid commented 8 years ago

@philtomson Glad to hear that it works out nicely for you.

The single-GPU performance of Mocha.jl might be similar to MXNet.jl. MXNet.jl has a more flexible symbolic API to define network architectures, but internally optimizations are used to avoid unnecessary memory allocation & computation, etc. But multi-GPU is definitely a win on MXNet.jl side.

I agree that many users with small-scale applications do not use GPUs. In this case, the default CPU-only libmxnet.so should still be quite straightforward to compile (at least on Linux and OS X). And since libmxnet is actually a relatively low-level backend, much of the logic will still be built in Julia, and the interface is flexible and convenient enough to use.

One of the main goals of joining forces under dmlc/libmxnet is to avoid duplicated labor, especially in the computationally heavy backend. A layer implemented once will automatically be available in the Python, Julia, and R frontends.

Currently I will be maintaining both Mocha.jl and MXNet.jl. In the future, when MXNet.jl becomes more mature, I will try to advocate MXNet.jl as a successor of Mocha.jl.

pluskid commented 8 years ago

For those who are interested in RNN/LSTM in Julia: there is now a char-rnn LSTM implementation in MXNet.jl. It uses explicit unrolling so everything fits in the current FeedForward model; therefore multi-GPU training can be used directly. For more general-purpose variable-length RNNs without unrolling, we still need to develop the modeling interface. I will add a tutorial document soon.

pluskid / Mocha.jl

Roadmap #22