Closed danieldk closed 5 years ago
The nan losses in #28 are also related to this.
The dev set is usually quite small, what's the drawback of loading it into memory before the training loop?
You never know what people throw at it. I'd rather do this properly in O(1) memory. I have all the bits already in other projects, but it'll have to wait until after the weekend.
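A minimal sketch of what streaming validation could look like, so that memory use depends on the batch size rather than on the size of the dev set. Everything here is illustrative: `batch_loss` and `validate_streaming` are hypothetical names, not functions from this project, and the "loss" is a placeholder computation standing in for a real model evaluation.

```rust
use std::io::{BufRead, BufReader, Read};

/// Hypothetical stand-in for a model's per-batch validation loss.
/// It returns the mean line length, only so there is something to compute.
fn batch_loss(batch: &[String]) -> f64 {
    let total: usize = batch.iter().map(|line| line.len()).sum();
    total as f64 / batch.len() as f64
}

/// Stream validation data from `reader`, holding at most `batch_size`
/// lines in memory at any time, and return the mean of the per-batch
/// losses. Returns `None` when the input contains no lines.
fn validate_streaming<R: Read>(
    reader: R,
    batch_size: usize,
) -> std::io::Result<Option<f64>> {
    assert!(batch_size > 0, "batch_size must be positive");

    let mut batch = Vec::with_capacity(batch_size);
    let mut loss_sum = 0.0;
    let mut n_batches = 0usize;

    for line in BufReader::new(reader).lines() {
        batch.push(line?);
        if batch.len() == batch_size {
            loss_sum += batch_loss(&batch);
            n_batches += 1;
            // Reuse the allocation instead of growing with the data set.
            batch.clear();
        }
    }

    // Evaluate the final, possibly smaller, batch.
    if !batch.is_empty() {
        loss_sum += batch_loss(&batch);
        n_batches += 1;
    }

    Ok(if n_batches == 0 {
        None
    } else {
        Some(loss_sum / n_batches as f64)
    })
}
```

Under these assumptions, each validation pass would reopen (or rewind) the dev file and call `validate_streaming(file, batch_size)`, so nothing from the dev set needs to stay resident between passes.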
This is currently making the implementation of alternating train/validation steps in `pretrain` difficult, since we cannot have two mutable references to the categorical encoder. The `&mut self` is currently required because the categorical encoder updates a `Numberer`.

We could switch to interior mutability for `CategoricalEncoder`; however, this makes the encoder unsharable between threads (and thus not work in sticker server) unless it's wrapped in an `Arc`.

Possible solutions: