jxmorris12 / vec2text

utilities for decoding deep representations (like sentence embeddings) back to text

Why pad all text to max length in trainer.base._get_decoded_sequences? #37

Closed liyongkang123 closed 8 months ago

liyongkang123 commented 8 months ago

Hi John, Thanks for your great work.

I would like to know why you are padding both the generated text and true_input_ids to the maximum length in the _get_decoded_sequences function. Is this operation necessary for later evaluation calculations?

Thank you for your help!

jxmorris12 commented 8 months ago

I think it was so that we could keep a list of the "best text seen so far": in case the length changed between generations, padding lets everything be stacked into a same-sized tensor. I think you can disable this if you're not using that feature.
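The idea can be sketched in a few lines (this is an illustrative example, not the actual vec2text code; the function name and token ids are hypothetical): generations from different steps may have different lengths, and right-padding each one to a shared max length gives every row the same shape so they can be stacked and compared.

```python
def pad_to_max_length(sequences, max_length, pad_token_id=0):
    """Right-pad each list of token ids to max_length so all rows share one shape."""
    return [seq + [pad_token_id] * (max_length - len(seq)) for seq in sequences]

# Hypothetical generations from two different steps, with different lengths:
gen_step_1 = [101, 7592, 102]
gen_step_2 = [101, 7592, 2088, 999, 102]

batch = pad_to_max_length([gen_step_1, gen_step_2], max_length=8)
# Every row now has length 8, so the batch can be stacked into one matrix
# and the "best text seen so far" tracked across generations of any length.
print([len(row) for row in batch])
```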

liyongkang123 commented 8 months ago

Got it! Thanks a lot~