Make SentenceEncoder::encode take self immutably

danieldk commented 5 years ago

This currently makes it difficult to implement alternating train/validation steps in pretrain, since we cannot hold two mutable references to the categorical encoder at the same time. The &mut self is required because the categorical encoder updates a Numberer.

We could switch to interior mutability for CategoricalEncoder; however, this makes the encoder unshareable between threads (so it would not work in sticker server) unless it's wrapped in an Arc with a lock (e.g. Arc<Mutex<_>>).

Possible solutions:

- Wrap the numberer in an Arc (what is the performance impact?).
- Make mutable and immutable flavors of this encoder.
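
A minimal sketch of the first option, under stated assumptions: `Numberer` and `SharedCategoricalEncoder` below are simplified, hypothetical stand-ins and not sticker's actual types. Note that an Arc alone only provides shared ownership; mutating through it additionally needs a lock, so the sketch uses `Arc<RwLock<_>>`. Known labels take the read lock only, which is the common case once the label set has stabilized.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a numberer: maps labels to dense ids.
#[derive(Default)]
struct Numberer {
    ids: HashMap<String, usize>,
}

impl Numberer {
    /// Return the id of `label`, assigning the next free id if it is new.
    fn add(&mut self, label: &str) -> usize {
        let next = self.ids.len();
        *self.ids.entry(label.to_owned()).or_insert(next)
    }

    /// Look up a label without modifying the numberer.
    fn number(&self, label: &str) -> Option<usize> {
        self.ids.get(label).copied()
    }
}

/// Hypothetical encoder whose `encode` takes `&self`.
/// Clones share the same numberer through the `Arc`.
#[derive(Clone, Default)]
struct SharedCategoricalEncoder {
    numberer: Arc<RwLock<Numberer>>,
}

impl SharedCategoricalEncoder {
    fn encode(&self, labels: &[&str]) -> Vec<usize> {
        labels
            .iter()
            .map(|&label| {
                // Fast path: known labels only need the read lock.
                // The read guard is dropped at the end of this statement.
                let known = self.numberer.read().unwrap().number(label);
                match known {
                    Some(id) => id,
                    // Slow path: take the write lock. `add` is idempotent,
                    // so a racing insert by another thread is harmless.
                    None => self.numberer.write().unwrap().add(label),
                }
            })
            .collect()
    }
}
```

With this shape, a training loop and a validation loop can each hold a clone of the encoder and call `encode` without needing `&mut`. The cost is one lock acquisition per label, which is the performance impact the first option asks about and would have to be measured.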

twuebi commented 5 years ago

The NaN losses in #28 are also related to this.

twuebi commented 5 years ago

The dev set is usually quite small. What's the drawback of loading it into memory before the training loop?

danieldk commented 5 years ago

> The dev set is usually quite small, what's the drawback of loading it into memory before the training loop?

You never know what people throw at it. I'd rather do this properly in O(1) memory. I have all the bits already in other projects, but it'll have to wait until after the weekend.
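
One way to keep validation memory independent of the dev set size is to read it lazily, one batch at a time, and reopen the reader for each validation pass. The sketch below is an assumption about what such streaming could look like, not the code referred to above; `Batches` and `count_validation_sentences` are hypothetical names, and each line stands in for one sentence. Memory use is bounded by the batch size rather than by the size of the data.

```rust
use std::io::{self, BufRead};

/// Hypothetical iterator that lazily groups lines from a reader
/// into batches of at most `batch_size` lines.
struct Batches<R> {
    lines: io::Lines<R>,
    batch_size: usize,
}

impl<R: BufRead> Batches<R> {
    fn new(reader: R, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch size must be positive");
        Batches {
            lines: reader.lines(),
            batch_size,
        }
    }
}

impl<R: BufRead> Iterator for Batches<R> {
    type Item = io::Result<Vec<String>>;

    fn next(&mut self) -> Option<Self::Item> {
        // Only the current batch is held in memory.
        let mut batch = Vec::with_capacity(self.batch_size);
        while batch.len() < self.batch_size {
            match self.lines.next() {
                Some(Ok(line)) => batch.push(line),
                Some(Err(err)) => return Some(Err(err)),
                None => break,
            }
        }
        if batch.is_empty() {
            None
        } else {
            Some(Ok(batch))
        }
    }
}

/// One validation pass over the reader; a real loop would encode each
/// batch and accumulate the loss instead of counting sentences.
fn count_validation_sentences<R: BufRead>(reader: R, batch_size: usize) -> io::Result<usize> {
    let mut seen = 0;
    for batch in Batches::new(reader, batch_size) {
        seen += batch?.len();
    }
    Ok(seen)
}
```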

stickeritis / sticker

Make SentenceEncoder::encode take self immutably #155