Open NohTow opened 3 months ago
Update: we now work at the batch level, so we can pad only to the longest document in the whole batch (instead of to max_doc_length). This is done by simply setting pad_document to false in the tokenize function defined in the collator. However, I am leaving it set to true as the default for now, since I surprisingly observed higher VRAM usage when setting it to false (and I still need to benchmark its performance impact w.r.t. the .compile function).
Right now, for simplicity, during distillation we pad every document to the max length so we can easily stack them to compute the scores. An optimization would be to pad them only to the longest document in the column (since we process the documents column by column) and add the remaining padding afterwards.
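
To make the idea concrete, here is a minimal pure-Python sketch of column-wise padding on token ids. It is not the library's code: `pad_to`, `pad_column`, `PAD_ID`, and the toy batch are hypothetical names and values chosen for illustration, and a real implementation would work on tensors. Each column is first padded only to its own longest document, and the padding up to `max_doc_length` is added afterwards so the columns can still be stacked.

```python
from typing import List

PAD_ID = 0  # hypothetical padding token id (assumption, not the library's value)


def pad_to(seqs: List[List[int]], length: int, pad_id: int = PAD_ID) -> List[List[int]]:
    """Right-pad every sequence in `seqs` to exactly `length` tokens."""
    if any(len(s) > length for s in seqs):
        raise ValueError("a sequence is longer than the target length")
    return [s + [pad_id] * (length - len(s)) for s in seqs]


def pad_column(seqs: List[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Pad a column of documents only to the longest document in that column."""
    longest = max(len(s) for s in seqs)
    return pad_to(seqs, longest, pad_id)


# Toy batch: 2 queries, each with 3 candidate documents (as token ids).
batch = [
    [[5, 6, 7], [8], [9, 10, 11, 12, 13]],
    [[1, 2], [3, 4, 5, 6], [7]],
]
max_doc_length = 8

# Column j holds the j-th document of every query.
columns = [list(col) for col in zip(*batch)]

# Step 1: pad each column to its own longest document (what would be encoded).
per_column = [pad_column(col) for col in columns]
widths = [len(col[0]) for col in per_column]

# Step 2: add the remaining padding afterwards so all columns share one width.
fixed = [pad_to(col, max_doc_length) for col in per_column]

# Token positions processed with full padding vs. column-wise padding.
tokens_full = len(columns) * len(batch) * max_doc_length
tokens_column = sum(len(seq) for col in per_column for seq in col)
```

In this toy batch the columns are encoded at widths 3, 4, and 5 instead of 8, so 24 token positions are processed instead of 48. In the distillation setting the second step would presumably be applied to the per-token outputs rather than to the input ids, which is an assumption on my side.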