Open pluskid opened 9 years ago
Even though restricted Boltzmann machines (and DBMs/DBNs) and autoencoders (DAE, CAE, stacked autoencoders) work on a different principle, being unsupervised, an implementation that follows the Mocha architecture could be useful. We started discussing this for DBNs here, as we have a simple implementation of RBMs and DBNs and would like to make it compatible with Mocha.
@jfsantos Thanks! I think autoencoders, although unsupervised, are still trained with SGD: we just specify the label to be the same as the input data, so in principle we could already do this in Mocha. We might need to add some special layers to support variants of autoencoders. But I might be wrong, as I haven't worked on autoencoders at all. Do you know the details?
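To make the "label equals input" idea concrete, here is a minimal sketch in plain Python (not Mocha code). It trains a tiny linear autoencoder with per-sample SGD on a squared-error loss whose target is the input itself. The function name `train_autoencoder`, the layer sizes, and the learning rate are all illustrative assumptions, not anything taken from Mocha.

```python
import random

def train_autoencoder(data, hidden=2, lr=0.02, epochs=500, seed=0):
    """Hypothetical helper: linear autoencoder trained with plain SGD.

    The regression target for each sample x is x itself.
    Returns the encoder, the decoder, and the mean loss per epoch.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Encoder is hidden x dim, decoder is dim x hidden; small random init.
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(hidden)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(dim)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            # Forward pass: encode, then decode.
            h = [sum(w * xi for w, xi in zip(row, x)) for row in enc]
            y = [sum(w * hj for w, hj in zip(row, h)) for row in dec]
            # The "label" is the input, so the error is reconstruction - input.
            err = [yi - xi for yi, xi in zip(y, x)]
            total += 0.5 * sum(e * e for e in err)
            # Backward pass: gradient w.r.t. the hidden code, taken
            # before the decoder weights are modified.
            grad_h = [sum(err[i] * dec[i][j] for i in range(dim))
                      for j in range(hidden)]
            for i in range(dim):
                for j in range(hidden):
                    dec[i][j] -= lr * err[i] * h[j]
            for j in range(hidden):
                for k in range(dim):
                    enc[j][k] -= lr * grad_h[j] * x[k]
        losses.append(total / len(data))
    return enc, dec, losses

# Toy data lying in a 2-D subspace of a 4-D space.
data = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0],
        [1.0, 1.0, 1.0, 1.0], [0.5, 0.0, 0.5, 0.0]]
enc, dec, losses = train_autoencoder(data)
print(losses[0], losses[-1])
```

Variants such as denoising autoencoders would corrupt `x` before encoding while still comparing against the clean `x`, which is where special layers may become necessary.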
As for Bayesian networks, yes, I agree they are a very different paradigm. Especially since we already have a package for that (dfdx/Boltzmann.jl#3), I think it is better to keep them in two different packages. But making them compatible should definitely be a goal, and maybe some collaboration.
For using DBNs/DBMs to initialize the weights of DNNs, I think this might already be quite easy. If you could export the weights to an HDF5 file with compatible naming, then Mocha should be able to load them, just like loading Caffe's exported models, and then start supervised training from there. We could make Mocha's loading interface richer by, for example, allowing the user to control in fine detail which layer should load which named dataset from which file. We could also discuss a common data format that suits both needs.
You are right about autoencoders being trained with SGD as MLPs. There are some "special" things, though:
I'll work on a draft implementation for initializing a DNN with a DBN from Boltzmann.jl and let you know as soon as I have something (hopefully, by submitting a pull request!).
@jfsantos Thanks for the details! I see, it is kind of do-able but not trivial. I need to think about this further.
Just wondering what the ETA for recurrence support might be?
@philtomson That is definitely a plan/goal, but maybe after the autoencoders. The reason is that I do not know RNNs well enough to start implementing them right away. But I think many of the building blocks are already there. In particular, if you want a simple explicit unfolding of a fixed-length history, I think one could already build such a model by using the shared-parameter mechanism in Mocha. For variable-length RNN support, I need to think more, especially about how the interface should be organized.
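As a rough illustration of explicit unfolding with shared parameters, here is a plain-Python sketch (not Mocha's API). One "layer" is created per time step, and every layer holds a reference to the same parameter object. The names `SharedParams`, `make_unrolled_net`, and `forward` are hypothetical, and the recurrence is a scalar tanh unit chosen only for brevity.

```python
import math

class SharedParams:
    """Hypothetical container for the weights reused at every time step."""
    def __init__(self, w_in, w_rec):
        self.w_in = w_in
        self.w_rec = w_rec

def make_unrolled_net(params, steps):
    # One entry per time step; all entries are the same object,
    # which is what parameter sharing amounts to.
    return [params for _ in range(steps)]

def forward(layers, inputs, h0=0.0):
    """Run the unrolled network over a fixed-length input sequence."""
    if len(layers) != len(inputs):
        raise ValueError("the unrolled net handles exactly len(layers) steps")
    h = h0
    states = []
    for layer, x in zip(layers, inputs):
        h = math.tanh(layer.w_in * x + layer.w_rec * h)
        states.append(h)
    return states

params = SharedParams(w_in=0.5, w_rec=0.8)
net = make_unrolled_net(params, steps=3)
print(forward(net, [1.0, 0.0, 0.0]))
```

Because the history length is baked into the number of layers, a different sequence length needs a differently sized network, which is the limitation behind the variable-length question.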
That being said, suggestions are very welcome from people who already know RNNs. For example, what is the simplest representative and reproducible example for RNNs (like MNIST for CNNs)? Is there any nice existing library for RNNs whose way of organizing the user interface we could learn from?
@pluskid Maybe the following are helpful: Andrej Karpathy's Neuraltalk (https://github.com/karpathy/neuraltalk) and Alex Graves's RNNLIB (http://sourceforge.net/projects/rnnl/).
@zhongwen Thanks for the links!
I'm planning to add time-delay neural networks. I have a working implementation ( https://github.com/the-moliver/NeuralNets.jl ) that I want to port to Mocha.
It would be nice to have a Caffe file -> Mocha converter. Maybe I'll work on something like that. Should be doable, right? Or are there Caffe features that are not yet in Mocha?
We already have the ability to load Caffe models, but you still need to translate the model definition manually. Automatic translation of the architecture is theoretically possible, but I guess it might be quite tedious to implement. (I suspect some universal DNN architecture specification language may come out soon.) Most of the core functionality in Caffe has a counterpart in Mocha. But Caffe also has many unofficial forks that implement specific layers, and those are more difficult to convert.
@nikolaypavlov Thanks for the suggestions
maxout is simply a max pooling over some units. We can achieve this by using the existing PoolingLayer or ChannelPoolingLayer. Let me know if you are talking about something else.
filter_cons for ConvolutionLayer.
@pluskid Great, I'll try to play with PoolingLayer.
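For readers unfamiliar with maxout, the "max pooling over some units" description can be sketched in a few lines of plain Python. This is an illustration only, not Mocha's PoolingLayer or ChannelPoolingLayer; the function name `maxout` and the flat list representation are assumptions.

```python
def maxout(units, group_size):
    """Take the max over consecutive, non-overlapping groups of units."""
    if group_size <= 0 or len(units) % group_size != 0:
        raise ValueError("number of units must be a multiple of group_size")
    return [max(units[i:i + group_size])
            for i in range(0, len(units), group_size)]

# Six linear units pooled in groups of two give three maxout outputs.
print(maxout([1, 5, 2, -1, 0, 3], 2))  # [5, 2, 3]
```

In a network the pooled units would be the outputs of a preceding linear layer, so the pooling runs across channels rather than across spatial positions.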
Is this project meant to be the Theano/Torch of Julia?
Is there ever going to be OpenCL support?
@outlace, this is more like Torch than Theano in that sense. There is no planned OpenCL support unless Julia gets better native support for GPU targets.
I would be very interested in OpenCL support as well. In fact, I have half a mind to take a stab at it myself. If I can leverage an OpenCL BLAS library (say, CLBLAS.jl), then I basically just have to write im2col.cl and a couple of pooling and neuron kernels, and structure everything else similarly to the CUDA backend.
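The reason im2col plus a BLAS library covers most of convolution is that unfolding image patches into a matrix turns the convolution into a matrix product. Below is a plain-Python sketch of that idea for a single-channel image with stride 1 and no padding. It is not the OpenCL kernel itself; the function names are hypothetical, and the dot products stand in for the GEMM call.

```python
def im2col(image, kh, kw):
    """Unfold every kh x kw patch (stride 1, no padding) into a flat list.

    Patches are returned in row-major order of their top-left corner.
    """
    h, w = len(image), len(image[0])
    patches = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patches.append([image[i + di][j + dj]
                            for di in range(kh) for dj in range(kw)])
    return patches

def conv2d_via_im2col(image, kernel):
    """Cross-correlation (no kernel flip) computed as patch dot products."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    out_w = len(image[0]) - kw + 1
    # With many filters this step is one matrix-matrix product.
    dots = [sum(a * b for a, b in zip(flat_kernel, patch))
            for patch in im2col(image, kh, kw)]
    return [dots[r:r + out_w] for r in range(0, len(dots), out_w)]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
print(conv2d_via_im2col(image, kernel))  # [[6, 8], [12, 14]]
```

A real kernel would also handle channels, stride, and padding, and would write into a preallocated buffer rather than building lists.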
If I did this, in the interest of clarity would you be OK with renaming GPUBackend -> CUDABackend (adding @deprecated typealias GPUBackend CUDABackend or similar for compatibility), and naming the new backend OCLBackend?
@nstiurca Thanks! This could be cool! Yes, I'm OK with the renaming if we have a working OpenCL backend!
OK, I will get started this weekend. Should we open an issue for the sake of tracking? Development-wise, it will be simplest for me to create an opencl branch on the fork of your project that I already have. Do you prefer to have such a branch in your repo as well until OpenCL support is stable (assuming we get there...)? It might be good to do that for the sake of anyone else who wants to help develop OpenCL support.
I would suggest doing it in your branch, but opening a pull request here with "[WIP]" in the title and a description of the goal and current progress in the text (which you could update periodically). I will not merge the PR until you have something reasonably stable, but people will see the PR and could jump in to help.
That works for me. Look for it later today.
I think this is great. I currently have to use Torch because it's the only mature package that has an OpenCL backend. Being able to run models on my Macbook is fantastic. Really looking forward to this getting OpenCL support.
@outlace Caffe also has a fork with OpenCL support, but unfortunately I haven't been able to get either Torch or Caffe to work on a 32-bit ARM processor, even though it has a fully compliant OpenCL 1.1 implementation.
Thus, I am going to start on rolling my own. See PR #155.
Any plans to implement batch normalization (http://jmlr.org/proceedings/papers/v37/ioffe15.pdf )? It looks like a great step forward in terms of training time!
@lqh20 I recently joined a new project, MXNet. We are building a Julia interface called MXNet.jl (https://github.com/dmlc/MXNet.jl). It is still at a relatively early stage, but some features are already working. For example, batch normalization and multi-GPU training in the cifar-10 example (https://github.com/dmlc/MXNet.jl/blob/master/examples/cifar10/cifar10.jl#L9) are already working quite nicely.
Is MXNet.jl complementary to Mocha.jl or meant to replace it?
— Reply to this email directly or view it on GitHub https://github.com/pluskid/Mocha.jl/issues/22#issuecomment-150876242.
@philtomson It depends. Mocha.jl still has the advantages of simplicity and portability. But in terms of computational efficiency and feature richness, I think MXNet.jl should replace Mocha.jl, because it is built on top of libmxnet, a language-agnostic general deep learning library that is designed to have, for example, multi-GPU support. Moreover, the core of libmxnet is being actively developed by a team, so in terms of features it is much better than Mocha.jl, which is currently developed primarily by me in my very little free time. libmxnet itself is actually a joint effort of authors from several different deep learning libraries.
I wonder if mxnet could be an alternate backend for Mocha.jl? It seems like that would preserve the advantages of Mocha.jl - simplicity, portability, good documentation - while also allowing users to drop directly to your MXNet.jl bindings if needed.
@philtomson That could be one possible option. I will wait and see if that is feasible, as using MXNet.jl introduces an external dependency on libmxnet. If that dependency itself is not a problem, then using MXNet.jl directly might be a more viable option, though some things still need to be improved, esp. the documentation.
Right. The scenario I was thinking of was being able to keep the kind of simple, declarative style of Mocha.jl while also being able to take advantage of the performance of MXNet.jl. Sure libmxnet has an advantage of being "language independent", however, that can also be a weakness. It could mean that you can't readily take advantage of powerful language features specific to Julia, like macros, for example (or at least it might be more difficult to do so). I suspect there's a lot of boilerplate code required when you use libmxnet that could be eliminated at a higher level of abstraction.
BTW: is the GPU backend of Mocha.jl not as performant as libmxnet? I guess from what I understand of the docs mxnet allows for multiple GPUs whereas Mocha.jl only allows for using one? If you compare performance between Mocha.jl with only one GPU and libmxnet with only one GPU are they pretty close?
I can see where training with multiple GPUs can be an advantage, but some users might be running pre-trained models on a laptop with only a single GPU (or some of us don't even have that as we only have an Intel integrated GPU which doesn't do CUDA) and the current setup of Mocha.jl is actually quite sufficient for doing this (people with this kind of setup wouldn't notice any appreciable difference from using libmxnet, perhaps)
Also: Does the mxnet project have any plans for supporting OpenCL?
On "something still needs to be improved, esp. documents": Mocha.jl's documentation is actually pretty good at this point, so this would be a problem for someone moving from Mocha.jl to MXNet.jl. Using Mocha.jl as a sort of wrapper around MXNet.jl would mean you could probably keep most of the documentation as is.
I suppose another idea would be to translate the CPP backend for Mocha.jl to produce C++ code that makes calls directly to libmxnet (or at least parameterize it so that you could use either OpenMP, as now, or libmxnet in the CPP backend).
I just got around to installing MXNet.jl and playing with it some. So far it doesn't seem too much more difficult to use than Mocha.
@philtomson Glad to hear that it works out nicely for you.
The single-GPU performance of Mocha.jl might be similar to that of MXNet.jl. MXNet.jl has a more flexible symbolic API for defining network architectures, and internally it uses optimizations to avoid unnecessary memory allocation and computation. But multi-GPU is definitely a win on the MXNet.jl side.
I agree that many users with small-scale applications do not use GPUs. In this case, the default CPU-only libmxnet.so should still be quite straightforward to compile (at least on Linux and OS X). And since libmxnet is actually a relatively low-level backend, much of the logic will still be built in Julia, and the interface is flexible and convenient enough to use.
One of the main goals of the joint effort under dmlc/libmxnet is to avoid duplicated labor, especially in the computation-heavy backend. A layer implemented once will automatically be available in the Python, Julia, and R frontends.
Currently I will be maintaining both Mocha.jl and MXNet.jl. In the future, when MXNet.jl becomes more mature, I will try to advocate MXNet.jl as a successor of Mocha.jl.
For those who are interested in RNN/LSTM in Julia: there is now a char-rnn LSTM implementation in MXNet.jl. It uses explicit unrolling so that everything fits in the current FeedForward model, and therefore multi-GPU training can be used directly. For more general-purpose variable-length RNNs without unrolling, we will still need to develop the modeling interface. I will add a tutorial document soon.
Discussions and/or suggestions are welcome!