Closed · farizrahman4u closed this 6 years ago
@EderSantana is there any way to bring it back?
Well, theoretically, all models in Theano can be ported to TensorFlow: convert op by op to the Keras backend. Wouldn't that be possible? Whatever it takes, it would be really great to have it back, as it shows off the power and capabilities of Keras.
I could only get it to work with the old mask/sample weight behavior (pre-backend) though.
But starting next week or so I should give another try since lots of changes were made to rnn.
I even think @wxs was trying to redo a few of his old pre-backend contributions. Was his PR accepted already?
@EderSantana #1310 is merged. The whole mask is passed to the rnn function instead of the masking boolean. This is pretty similar to how things were in the pre-TensorFlow era.
Thanks @farizrahman4u, I'll check it out!!!
@EderSantana Any updates?
@farizrahman4u let me read this carefully. I used to be able to pass the mask by hand, using sample_weight; can I still do that? I resumed working on this...
@EderSantana you can't pass a per-timestep mask any more (see #1537 ). I'm not sure I understand the justification for removing this but I've not had a chance to investigate much further.
We need per-timestep masking for our use (as would anyone doing sequence to sequence learning) and so I will be looking into solutions in the next few days.
@wxs for my problem, I have all zeros both as masking and as regular inputs (when I have "no input" but still want the RNN to keep running). I'll try with -1 in that case, but yeah, being able to pass a per-timestep mask would be awesome. Let me know when you open a PR and I'll give it a review. Thanks in advance for the work!!!
There seems to be significant demand for having per-timestep weights besides sample weights. I will look into re-introducing them (maybe through a different interface from sample_weights, or maybe the same as before (but with less bugs ;-)).
Glad to hear it :) I'm not sure if there are any applications for per-timestep weights other than masking, so if for clarity you designed it as a separate parameter to fit that took a binary mask array, instead of overloading sample_weights, I suspect that would still cover every use case.
There is a simple, very general fix we could roll out (changing 3 lines in weighted_objective). It would make input validation more brittle, however. I will think about it for a little bit.
The nature of the fix: currently we reduce the loss array from nD to 1D (by taking the mean over axes [1..n]), then apply the weights (which should thus be 1D as well, i.e. there should be a 1:1 mapping between samples and weights, since the weights are sample weights). We could instead reduce the loss array down to its first p axes, where p is the ndim of the weight array. We would thus assume a 1:1 mapping between units in the first p axes of the network output and units in the weight array. This generalizes to timestep weights, but also to funkier possibilities with p > 2 (though I can't think of an application).
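To illustrate the shape arithmetic, here is a minimal NumPy sketch of the two reduction behaviors (shapes and values are hypothetical, and this is a simplification of what weighted_objective does, not its actual code):

```python
import numpy as np

# Hypothetical per-unit loss: 2 samples, 3 timesteps, 4 output units.
loss = np.arange(24, dtype=float).reshape(2, 3, 4)

# Current behavior: reduce over all non-batch axes, then apply 1D sample weights.
sample_weights = np.array([1.0, 0.5])
per_sample = loss.mean(axis=(1, 2))          # shape (2,)
weighted = per_sample * sample_weights       # one weight per sample

# Proposed behavior: reduce only down to the ndim of the weight array (p = 2),
# so the first p axes of the loss map 1:1 onto the weight array.
timestep_weights = np.array([[1.0, 1.0, 0.0],   # mask out the last timestep
                             [0.5, 0.5, 0.5]])
per_timestep = loss.mean(axis=2)             # shape (2, 3)
weighted_temporal = per_timestep * timestep_weights
```

With p = 1 this reduces to the current sample-weight behavior, so the generalization is backward compatible.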
Thoughts?
I have thought about the fix for temporal sample weights detailed above, and I have decided it is the best solution. It is now in master. Fixes your issue @wxs @EderSantana
Great! Look forward to trying this. Thanks @fchollet .
:clap: very happy to hear
@fchollet but we are still assuming the sample_weights are 1D: https://github.com/fchollet/keras/blob/master/keras/models.py#L443
@EderSantana I think the only way to tackle that would be through a pretty strong assumption: weight ndim should be 2 if and only if output ndim is 3; otherwise weight ndim is 1.
Other ideas?
An alternative would be to introduce a sample_weight_mode argument to compile that could take the value temporal. This would increase API complexity, though. I think making the sample weight ndim fully configurable (as an integer) would be more general, but also more difficult to understand for most users.
Maybe allow the ndim input, but use len(self.output_shape) or len(self.output_shape) - 1 as the default.
sample_weight multiplies the final output, so the user might know (I think?) that it has to have dimensions matching the final output, or broadcastable ones. Anyway, sample_weight requires an advanced API for complicated use cases. Make a choice here and let us simplify later as our experience and user feedback grow.
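The broadcasting point can be made concrete with a small NumPy sketch (values are made up; the idea is just that 1D weights broadcast over timesteps once given a trailing axis, while 2D weights act per timestep):

```python
import numpy as np

# Hypothetical per-timestep loss: 2 sequences, 3 timesteps.
loss = np.array([[0.2, 0.4, 0.6],
                 [0.1, 0.3, 0.5]])

# A 1D sample weight needs a trailing axis to broadcast across timesteps:
sample_w = np.array([1.0, 2.0])
weighted = loss * sample_w[:, None]      # every timestep of sample 1 doubled

# A 2D weight already matches the loss shape and weights each timestep:
temporal_w = np.array([[1.0, 1.0, 0.0],
                       [1.0, 0.0, 1.0]])
masked = loss * temporal_w               # zero weight drops that timestep's loss
```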
I like @EderSantana's idea of having a default but allowing this to be passed as an argument; coming up with a good API will depend on having a more exhaustive sense of what the use cases are, which will come through experience.
Thinking about this a bit more, maybe this is an opportunity to clarify the functional distinction between class weights and sample weights? That part of the API has always felt a bit awkward to me.
I have a fix with extensive tests. Only one issue: it doesn't work with Theano. It works fine with TensorFlow. This is very strange, because: 1) the added code doesn't do anything exotic or complicated, it's actually very basic, and 2) Theano doesn't crash, it hangs, with no error message that could be used to diagnose the problem. This happens during the compilation step.
Check it out: https://github.com/fchollet/keras/tree/temporal_sample_weighting
An update: actually it works in general, but the model I built for the tests does not compile with Theano. This appears to be a Theano optimizer bug, possibly linked to dimshuffle.
okay, just checking out your branch now. Will report back.
I have fixed the tests and will merge the branch in a few minutes (waiting for Travis). The bug isn't related to the changes in the PR.
Yea, this is working for me as expected in Theano.
edit: well, mostly? Loss is lower than expected; going to dig into that a little bit.
Merged. We now support both sample-wise and timestep-wise loss weighting, with arbitrary output data shapes.
Progress! It is almost working again.
@fchollet @EderSantana Good job!
@EderSantana you might want to take a look at #1593 if you're ever using categorical_crossentropy, as I think it was broken in the backend-switch for time-sequence cases.
@wxs I feel like there is something different, at least comparing the error magnitudes with what I used to have... well, this is how the memory looks in the end.
In other words, it doesn't start writing early enough to generalize well.
@EderSantana is that on wxs's branch (#1593), or pre?
I'm also having some issues with categorical_crossentropy but @wxs's branch doesn't run for me.
@EderSantana I definitely expect differences in error magnitudes. Before, the error across all timesteps was always being averaged, prior to timestep masking, if you were using categorical cross-entropy.
NTM uses binary_crossentropy, though. But I still see differences compared to previous implementations. It looks like the LSTMs are more powerful and are overfitting, or something. I don't know yet. I'll keep experimenting with this.
I think we did it again boys. But NTMs are really sensitive to hyperparameters, which Graves et al. didn't mention. I have to write a few notes about it:
@EderSantana Great job :+1: Note that you can PR your notes to the blog repo. I remember you had some NTM wiki that you posted as an issue and later as a wiki page. It was pretty good. Brushing it up should do the job. www.github.com/fchollet/keras-blog
@EderSantana very cool! Any plans to make a new PR for the NTM example?
I agreed with @farizrahman4u, this would make a great post on the blog, if you want to release a write-up.
@EderSantana what is the status of NTM implementation in keras?
I haven't worked on that in a long time. There is an implementation here: https://github.com/EderSantana/seya. But NTMs were less useful than I expected, so I'm not working on them anymore...
Thanks @EderSantana. If it's not so useful, any tips on what other architectures are useful for memory (i.e. remembering things from a long time back)?
The successor to the NTM is called the differentiable neural computer (DNC); see the paper and the corresponding blog post. My gut feeling is that for real-world applications with RNNs today, one should stick with a regular GRU and spend most of the time cleaning the training data and thinking carefully about the input and output representations instead.
@carlthome Yes, but the paper leaves out all the mathematical details needed to code the DNC. They report some astoundingly complex abilities of the machine. Is it silly to say the authors don't think the world is ready for the DNC? Or else why not release the mathematical bits?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
The neural turing machine by @EderSantana is mentioned twice in the README, but was removed from the repo long ago.