blocks-contrib - Githubissues

lukemetz commented 9 years ago

Hello, In the process of converting over an existing model to blocks I created a number of bricks and extensions that might be of use. (as well as a few that have already been added to blocks as I am slow)

They are all at: https://github.com/lukemetz/cuboid They are mainly meant for me, but if any of them would be useful to the repo I will be happy to send a pullrequest(s) with the additions.

I have the following in a finished-ish state: bricks.FilterPool, bricks.Dropout (I saw the dropout thread already, but I wanted a separate brick for various reasons). algorithms.NAG -- nesterov momentum datasets.DataStreamBackground -- which runs a datastream in the background.

Totally fine if you don't want anything too. Thanks for the awesome library either way!

rizar commented 9 years ago

Hi,

first of all thanks for your kind words!

Exactly for the cases like yours we plan to have another repository called blocks-contrib. This will host components from the users which can be useful but for one or another reason can not or should not be part of Blocks. I hope creation of blocks-contrib is a matter of a few days.

I think we would like to have Nesterov momentum in Blocks. Take a look at BaseMomentum and Momentum classes as example of how such things are done in Blocks. Or maybe better first wait until we launch blocks-contrib and push it there. We will much less picky for blocks-contrib regarding naming variables and docs, we will only ask for a unit-test to keep track of compatibility of the contributed code with cutting edge Blocks.

bartvm commented 9 years ago

@lukemetz Two questions: What's your motivation for wanting a separate Dropout brick? If other people would want one for the same reasons, we could consider adding it.

Secondly, I'd be curious to know whether you tested your DataStreamBackground under heavy loads, on GPU, etc. and if so, how did it do? I've been thinking about something similar, but came up with something far more complicated (using a separately started server which pushes NumPy arrays as byte streams through sockets using ZeroMQ, https://github.com/bartvm/fuel/issues/7).

The main problems I'm worried about in your approach (but I might just be paranoid):

I've heard stories about CUDA going haywire if you fork a process after binding Theano to a GPU already. Did you experience any of this? This was the main reason I considered starting a separate server instead of using multiprocessing.
Using a Queue requires your data to be pickled and unpickled, which I imagine can be relatively slow compared to just pushing the raw byte stream. I'm guessing you might not notice any of this if your training isn't I/O bound i.e. if your computations take much longer than your data preprocessing. In your case, did you run into this?
Although fork is supposed to use copy-on-write for the process state, it seems to have issues when used in Python processes that use lots of memory (see e.g. https://github.com/Theano/Theano/issues/2184).

If things went smoothly in your case, I think it could be worth considering going with your approach, because it is more user-friendly and far less complicated. On comment though: I would be worried about putting a join call in __del__ for two reasons: __del__ is not guaranteed to be called; and it seems like you might end up getting stuck when the interpreter exits and the join call never returns because of an error somewhere. Using a context manager somewhere, somehow, would be preferable I guess.

lukemetz commented 9 years ago

@rizar Great. Sounds like a good repository. Blocks contrib sounds like a much better place than in the main blocks repo.

@bartvm : Dropout : Disclaimer: I did not read the thread thoroughly nor put a huge amount of time thinking about this design. My motivation was first practical, I wanted something to work with fairly little time. And secondly I wanted a little more control and transparency. I wanted to be able to see exactly where I was injecting noise and how much I was doing so. I worried that things would get complicated if I started messing with the computation graph as there are a few similar computations I wanted to add, such as Batch Normalization. To me dropout seemed more similar to an activation than something completely different. The big negative to this that I have run into is I had to figure out a way to run through the same sequence of blocks 2 times, one for inference, and one for training.

@bartvm : DataStreamBackground : The dataset for the model I wrote this for spits out filenames, I then have a datastreamWrapper to read and augment them to 1x64x64 images. The augmentation (it also does its computation via a multiprocessing pool.map) takes like ~30 seconds and just gpu time takes ~60 seconds. So per epoch I save around ~25-30 seconds. I am not IO bound, small images + ssd helps. I totally agree with your concerns. I didn't make a fork of blocks as I am terrified of ensuring multiprocess applications work well and didn't want to promise anything. I thought it would be useful to all to share some of my work though.

multiprocessing issues: I have not seen anything like this. I have only tested this on cpu based though datastreams though.

Queue: I have not done any serious benchmarking. I am definitely gpu computation bound. My logic is that my batches are small, one copy will be less than the computation to create the batch. This is definitely not ideal for everyone.

fork: Good to know. Luckly i have not run into these issues.

__del__: Good call. Thanks for the tip!

Thanks for the awesome feedback! Sounds like I will chill and wait for blocks-contrib.

mila-iqia / blocks

blocks-contrib #339