sdobber / FluxArchitectures.jl

Complex neural network examples for Flux.jl
MIT License

Doubt about Stacked LSTMs #46

Closed · gabrevaya closed this issue 2 years ago

gabrevaya commented 2 years ago

Hi! First of all, thanks a lot for FluxArchitectures.jl! :)

This is more a question than an issue, and probably stems from my own misunderstanding. While going through the documentation, I realized that I might have been using a wrong implementation of a Stacked LSTM in my code. However, I'm confused by your current implementation.

I was comparing the description of the Stacked LSTM in your blog with the current implementation, and I don't understand why you no longer use HiddenRecur. It now looks like the LSTMs are chained in the regular way: the inner dimensions match, but the inner cells are not fed the previous layer's hidden states and memories (see the sketch below for the two styles I have in mind).
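
For reference, here is a minimal sketch of the two stacking styles as I understand them (Flux v0.12-era API; the HiddenRecur definition below is paraphrased from memory of the blog post, not copied from it):

using Flux

# Regular chaining: layer 2 only ever sees the output h of layer 1.
regular = Chain(LSTM(10, 20), LSTM(20, 5))

# HiddenRecur-style wrapper (paraphrased): like Flux.Recur, but it
# returns the cell's full state (h, c) instead of only the output h,
# so the next cell can be fed the previous hidden state and memory.
mutable struct HiddenRecur{T}
    cell::T
    state
end
HiddenRecur(cell) = HiddenRecur(cell, cell.state0)

function (m::HiddenRecur)(x)
    state, _ = m.cell(m.state, x)  # state == (h, c)
    m.state = state
    return collect(state)          # [h, c] rather than h alone
end

# e.g. hr = HiddenRecur(Flux.LSTMCell(10, 20)); hr(rand(Float32, 10))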

Also, in

function (m::StackedLSTMCell)(x)
    out = m.chain(x)  # run x through the plain Chain of LSTM layers
    m.state = out     # store only the chain's final output
    return out
end

you save only the output in the state of the StackedLSTMCell. So I don't understand how this fixes the Flux limitation you describe in your blog (Flux.jl's standard setup only allows feeding the output of one cell as the new input of the next). Or has Flux's behavior changed in the meantime, so that this is no longer an issue?
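
To make the question concrete: after a call, m.state holds just the last output, while the per-layer (h, c) pairs live inside the Recur wrappers of the chained LSTM layers. Something like the following (the StackedLSTM constructor signature is what I assume from the docs; I may be misremembering it):

using FluxArchitectures

m = StackedLSTM(10, 5, 20, 2)  # (in, out, hiddensize, layers) -- assumed signature
x = rand(Float32, 10)
m(x)

m.state           # only the final output of the chain
m.chain[1].state  # (h, c) of the first LSTM layer, kept by its Recur wrapper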