length penalty (or rather shortness penalty) in beam search

There are other ways to promote longer outputs with different kinds of penalties, but dividing the score by the length seems to work fine. I do notice more repeats now, though, so there is a tradeoff here.

Maybe a fractional exponent on the length divisor could be added as a hyperparameter and tuned on a validation set? Or maybe this problem will go away with a good enough architecture or training method, rather than by changing the search strategy at inference.
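
For illustration, here is a minimal sketch of what the exponent idea could look like. It is not the code in this repo: the names `length_normalized_score`, `rank_beams`, and `alpha` are hypothetical, and it assumes each beam carries a summed log-probability that gets divided by `length ** alpha`. With `alpha = 1.0` this is plain division by length, and with `alpha = 0.0` there is no length normalization at all.

```python
def length_normalized_score(log_prob_sum: float, length: int, alpha: float = 1.0) -> float:
    """Divide a beam's summed log-probability by length ** alpha.

    alpha is a hypothetical hyperparameter: 0.0 disables normalization,
    1.0 is plain division by length, values in between are a compromise.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return log_prob_sum / (length ** alpha)


def rank_beams(beams, alpha=1.0):
    """Sort (tokens, log_prob_sum) pairs from best to worst normalized score."""
    return sorted(
        beams,
        key=lambda beam: length_normalized_score(beam[1], len(beam[0]), alpha),
        reverse=True,
    )


# Toy candidates with made-up log-probabilities, purely for illustration.
beams = [
    (["binds", "ATP"], -2.0),
    (["binds", "ATP", "and", "hydrolyzes", "it"], -3.5),
]

for alpha in (0.0, 0.5, 1.0):
    best_tokens, _ = rank_beams(beams, alpha)[0]
    print(alpha, " ".join(best_tokens))
```

In this toy case the shorter candidate wins without normalization and the longer one wins with plain division, so sweeping `alpha` on a validation set would be one way to look for a setting that favors longer outputs without encouraging as many repeats.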

Regardless, this has now been implemented.

nowittynamesleft / protein_function_description
