hipster-philology / pandora

A Tagger-Lemmatizer for Natural Languages
MIT License

Decoding should be part of the model #91

Status: Open. PonteIneptique opened this issue 6 years ago.

PonteIneptique commented 6 years ago

See https://github.com/hipster-philology/pandora/pull/87#issuecomment-340266996 from @emanjavacas:

I found out the reason for the issue with beam search. Currently the tagger expects the full distribution over the vocabulary at each step and takes the argmax over the sequence, which means it is already implementing argmax decoding. For beam search to work, the tagger should instead receive the already decoded output, not the full distribution. In practice this means removing argmax decoding from the tagger (https://github.com/hipster-philology/pandora/blob/master/pandora/preprocessing.py#L591) and simply expecting the already decoded output for the lemma. I'd move all the decoding logic inside the model (not just for generated lemmas), since more complex decoding methods like beam search are hard to abstract away: they need to interact with the model step by step during decoding. PRs welcome.
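
To illustrate the point, here is a minimal, hypothetical sketch (not Pandora's actual API, and the class/method names are invented): both greedy argmax decoding and beam search live on the model side, so the tagger only ever consumes a list of decoded ids. The beam search loop shows why decoding is hard to keep outside the model: it has to call the model's step function repeatedly while it is expanding hypotheses.

```python
# Hypothetical sketch, numpy-only, to illustrate model-side decoding.
# None of these names correspond to Pandora's real classes or functions.
import numpy as np


class LemmaDecoderSketch:
    """Toy character-level decoder with greedy and beam-search decoding."""

    def __init__(self, vocab_size, eos_id, max_len=20, seed=0):
        self.vocab_size = vocab_size
        self.eos_id = eos_id
        self.max_len = max_len
        self.rng = np.random.default_rng(seed)

    def step(self, prev_id, state):
        """Stand-in for one decoder step: returns log-probs over the vocabulary
        and the new decoder state (here just random placeholders)."""
        logits = self.rng.normal(size=self.vocab_size)
        logp = logits - np.log(np.exp(logits).sum())  # log-softmax
        return logp, state

    def decode_greedy(self, init_state):
        """Argmax decoding: the strategy the tagger currently re-implements
        externally by taking the argmax over the full distribution."""
        out, prev, state = [], self.eos_id, init_state
        for _ in range(self.max_len):
            logp, state = self.step(prev, state)
            prev = int(np.argmax(logp))
            if prev == self.eos_id:
                break
            out.append(prev)
        return out

    def decode_beam(self, init_state, beam_width=5):
        """Beam search: keeps the top-k partial hypotheses and must call
        self.step() at every expansion, which is why it belongs in the model."""
        # Each beam is (ids, score, prev_token, state, finished).
        beams = [([], 0.0, self.eos_id, init_state, False)]
        for _ in range(self.max_len):
            candidates = []
            for ids, score, prev, state, done in beams:
                if done:
                    candidates.append((ids, score, prev, state, True))
                    continue
                logp, new_state = self.step(prev, state)
                for tok in np.argsort(logp)[-beam_width:]:
                    tok = int(tok)
                    candidates.append(
                        (ids + [tok], score + logp[tok], tok, new_state, tok == self.eos_id)
                    )
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
            if all(b[4] for b in beams):
                break
        best_ids = beams[0][0]
        return [t for t in best_ids if t != self.eos_id]
```

Under this design, the tagger would only see the decoded id sequence returned by `decode_greedy` or `decode_beam`, never the per-step distributions, so swapping decoding strategies would not require touching the tagger at all.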