stickeritis / sticker

Succeeded by SyntaxDot: https://github.com/tensordot/syntaxdot

Make SentenceEncoder::encode take self immutably #155

Closed danieldk closed 4 years ago

danieldk commented 4 years ago

This currently makes it difficult to implement alternating train/validation steps in pretrain, since we cannot hold two mutable references to the categorical encoder at the same time. The &mut self is required because the categorical encoder updates a Numberer.

We could switch to interior mutability for CategoricalEncoder. However, that makes the encoder unsharable between threads (so it would not work in sticker server) unless it's wrapped in an Arc.
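As a rough illustration of the trade-off described above, here is a minimal sketch of an encoder whose encode method takes &self. It assumes the numberer is guarded by a std::sync::Mutex, which is one thread-safe form of interior mutability (a RefCell would not be shareable between threads, even inside an Arc). The Numberer and CategoricalEncoder types below are simplified, hypothetical stand-ins and not sticker's actual definitions.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a numberer: maps labels to dense indices.
#[derive(Default)]
struct Numberer {
    indices: HashMap<String, usize>,
}

impl Numberer {
    /// Return the index of `label`, assigning the next free index if unseen.
    fn add(&mut self, label: &str) -> usize {
        let next = self.indices.len();
        *self.indices.entry(label.to_owned()).or_insert(next)
    }
}

/// Hypothetical encoder that hides the numberer behind a mutex (assumption).
#[derive(Default)]
struct CategoricalEncoder {
    numberer: Mutex<Numberer>,
}

impl CategoricalEncoder {
    /// Takes `&self`: the mutation of the numberer happens behind the lock.
    fn encode(&self, labels: &[&str]) -> Vec<usize> {
        let mut numberer = self.numberer.lock().expect("numberer lock poisoned");
        labels.iter().map(|label| numberer.add(label)).collect()
    }
}

fn main() {
    let encoder = Arc::new(CategoricalEncoder::default());

    // Two shared references can coexist, e.g. one for training data and
    // one for validation data.
    let train = &encoder;
    let validation = &encoder;
    println!("{:?}", train.encode(&["NOUN", "VERB", "NOUN"]));
    println!("{:?}", validation.encode(&["VERB", "ADJ"]));

    // The same encoder can also be handed to another thread via Arc.
    let shared = Arc::clone(&encoder);
    let handle = thread::spawn(move || shared.encode(&["ADJ", "NOUN"]));
    println!("{:?}", handle.join().expect("worker thread panicked"));
}
```

The cost of this design is a lock acquisition on every encode call, which may or may not matter in practice.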

Possible solutions:

twuebi commented 4 years ago

The NaN losses in #28 are also related to this.

twuebi commented 4 years ago

The dev set is usually quite small. What's the drawback of loading it into memory before the training loop?

danieldk commented 4 years ago

> The dev set is usually quite small. What's the drawback of loading it into memory before the training loop?

You never know what people throw at it. I'd rather do this properly in O(1) memory. I have all the bits already in other projects, but it'll have to wait until after the weekend.
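To make the constant-memory idea concrete, here is a small sketch of a validation pass that streams the dev set instead of loading it up front. It is not the code from the other projects mentioned above. It assumes blank-line-separated sentences with one token per line, and validation_loss is a hypothetical placeholder for a real validation step. Memory use is bounded by the longest sentence rather than by the size of the data set.

```rust
use std::io::{self, BufRead, Cursor};

/// Iterator that yields one blank-line-separated sentence at a time.
struct Sentences<R> {
    lines: io::Lines<R>,
}

impl<R: BufRead> Sentences<R> {
    fn new(reader: R) -> Self {
        Sentences { lines: reader.lines() }
    }
}

impl<R: BufRead> Iterator for Sentences<R> {
    type Item = io::Result<Vec<String>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut tokens = Vec::new();
        for line in &mut self.lines {
            let line = match line {
                Ok(line) => line,
                Err(err) => return Some(Err(err)),
            };
            if line.trim().is_empty() {
                if tokens.is_empty() {
                    continue; // Skip leading or repeated blank lines.
                }
                return Some(Ok(tokens));
            }
            tokens.push(line);
        }
        // End of input: emit the last sentence if there is one.
        if tokens.is_empty() {
            None
        } else {
            Some(Ok(tokens))
        }
    }
}

/// Hypothetical placeholder for a real validation step: the token count.
fn validation_loss(sentence: &[String]) -> f64 {
    sentence.len() as f64
}

/// Average the per-sentence loss while holding only one sentence in memory.
fn validate<R: BufRead>(reader: R) -> io::Result<f64> {
    let (mut total, mut count) = (0.0, 0usize);
    for sentence in Sentences::new(reader) {
        total += validation_loss(&sentence?);
        count += 1;
    }
    Ok(if count == 0 { 0.0 } else { total / count as f64 })
}

fn main() -> io::Result<()> {
    // In-memory stand-in for a dev set file; a BufReader over a File
    // would be used the same way.
    let data = "a\nb\n\nc\n";
    println!("{}", validate(Cursor::new(data))?);
    Ok(())
}
```

Because validate reopens nothing and keeps no state between calls, it can be invoked with a fresh reader after every block of training steps.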