ottokart / punctuator2

A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text
http://bark.phon.ioc.ee/punctuator
MIT License
657 stars 195 forks

Question about late fusion #49

Open ericbolo opened 4 years ago

ericbolo commented 4 years ago

I'm not sure this is the best place to ask this question, since it concerns the model itself rather than the code.

But here it is anyway: why is late fusion needed between the attention model and the unidirectional GRU model?

Why not simply use the encoder-decoder model with attention proposed in Bahdanau et al. (2015) (NMT by jointly learning to align and translate)? In that model, the output at time t is directly conditioned on the context vector (a weighted sum of the encoder hidden states) and the previous decoder hidden state.
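For concreteness, here is a minimal NumPy sketch of that conditioning: the function computes Bahdanau-style additive-attention weights over the encoder states and returns the context vector. The names and shapes (`W_a`, `U_a`, `v_a`, `d_a`) are illustrative assumptions, not taken from the punctuator2 code.

```python
import numpy as np

def attention_context(encoder_states, decoder_state, W_a, U_a, v_a):
    """encoder_states: (T, n_enc); decoder_state: (n_dec,)."""
    # Additive score: e_j = v_a^T tanh(W_a s_{t-1} + U_a h_j)
    scores = np.tanh(decoder_state @ W_a.T + encoder_states @ U_a.T) @ v_a
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax over source positions
    return weights @ encoder_states   # context vector c_t, shape (n_enc,)

# Tiny demo with random parameters
T, n_enc, n_dec, d_a = 5, 8, 6, 4
rng = np.random.default_rng(0)
c_t = attention_context(rng.normal(size=(T, n_enc)), rng.normal(size=n_dec),
                        rng.normal(size=(d_a, n_dec)), rng.normal(size=(d_a, n_enc)),
                        rng.normal(size=d_a))
```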

Thanks @ottokart!

sontung commented 4 years ago

This work was probably done in 2016 (as in the paper), so the authors may not have been aware of the newer architecture. Maybe you can try the enc-dec approach and write a new paper.

ericbolo commented 4 years ago

Thanks for the input @sontung. The Bahdanau paper is cited in the punctuator paper, and the authors draw directly from the encoder-decoder model, hence my question.

I will try the simple encoder-decoder setup at some point, and will post results here in case anyone is interested.

ottokart commented 4 years ago

Hi!

I did start with an encoder-decoder model, but realized that it's slight overkill for punctuation restoration/sequence labelling. The encoder-decoder architecture is designed to solve problems where the length of the output sequence and the alignment between inputs and outputs are not known in advance; neither is the case for punctuation restoration. It usually makes sense to "encode" things that we already know about the problem (and that are true for every sample) into the network architecture.
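For concreteness, a toy sketch of that sequence-labelling framing: exactly one punctuation decision per word slot, so output length and alignment are fixed by the input. The labels here (appended after each word) are purely illustrative; the actual slot convention in punctuator2 may differ.

```python
# One label per input token, "" meaning "no punctuation". Output length
# and alignment are known in advance, unlike in translation.
words  = ["what", "is", "this", "thing"]
labels = ["",     "",   "",     "?"]   # predicted by the tagger

restored = " ".join(w + p for w, p in zip(words, labels))
print(restored)  # what is this thing?
```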

I used late fusion because I wanted the attention mechanism to give the network another way to use global and possibly more distant context (e.g., the word "what" appearing at the beginning of a very long question). But honestly, there is no need to use late fusion; alternatives should work similarly. I have more detailed results and ablation studies in my thesis (page ~65).
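For the curious, a minimal NumPy sketch of one plausible gated late-fusion step in the spirit described above: the attention context c_t is mixed into the recurrent state h_t after the GRU update, rather than being fed into its input. The parametrization (`W_fc`, `W_fh`, `W_p`, `b_f`) is an illustrative assumption, not the exact formulation from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def late_fusion(h_t, c_t, W_fc, W_fh, b_f, W_p):
    """h_t: GRU state, shape (n,); c_t: attention context, shape (n_c,)."""
    # A gate computed from both the state and the context decides, per
    # dimension, how much projected context to add on top of the state.
    gate = sigmoid(W_fc @ c_t + W_fh @ h_t + b_f)
    return h_t + gate * (W_p @ c_t)   # fused state, fed to the output layer
```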

Best, Ottokar

ericbolo commented 4 years ago

Thanks @ottokart, that clears things up! And saves me the trouble of going down a rabbit hole :)