hkust-nlp / deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
Apache License 2.0

[Question] Did you use the mean value of all token embeddings in the repr filter? #10

Closed: Force1ess closed this issue 7 months ago

Force1ess commented 7 months ago

> For Semantic-based, we encode one sentence by averaging all embeddings of its tokens due to the limited context length of the model.

How about Model-based? Is the embedding of a sentence a one-dimensional vector or a two-dimensional one?

VPeterV commented 7 months ago

> For Semantic-based, we encode one sentence by averaging all embeddings of its tokens due to the limited context length of the model.
>
> How about Model-based? Is the embedding of a sentence a one-dimensional vector or a two-dimensional one?

In model-based approaches, we obtain the representation of a sentence from the hidden state of its last token, so the result is a single one-dimensional vector of size hidden_dim.

For semantic-based methods, we work around the restricted context window of current encoder-only models with a sliding-window technique using a chunk size of 512 tokens, and then average the chunk representations to form the final sentence representation. For example, consider a sentence containing 1024 tokens. We divide it into two chunks (Chunk A1 and Chunk A2, each with 512 tokens), compute the average representation of all tokens within each chunk to get one embedding per chunk, and finally average these two chunk embeddings to obtain the overall sentence representation.
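
In code, the two schemes look roughly like this. This is a minimal sketch using Hugging Face transformers; the helper names `model_based_repr` and `semantic_repr` and the `bert-base-uncased` checkpoint are illustrative assumptions, not deita's actual implementation:

```python
import torch
from transformers import AutoModel, AutoTokenizer


def model_based_repr(model, tokenizer, text):
    """Model-based: the sentence representation is the hidden state
    of the last token, i.e. a single 1-D vector of size hidden_dim."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state[0, -1]  # shape: (hidden_dim,)


def semantic_repr(model, tokenizer, text, chunk_size=512):
    """Semantic-based: mean-pool the token states inside each
    512-token chunk, then average the per-chunk vectors."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunk_embs = []
    for start in range(0, len(ids), chunk_size):
        chunk = torch.tensor([ids[start:start + chunk_size]])
        with torch.no_grad():
            out = model(input_ids=chunk)
        # One vector per chunk: mean over that chunk's token states.
        chunk_embs.append(out.last_hidden_state[0].mean(dim=0))
    # E.g. 1024 tokens -> two 512-token chunks -> two vectors -> their mean.
    return torch.stack(chunk_embs).mean(dim=0)  # shape: (hidden_dim,)


if __name__ == "__main__":
    # Illustrative encoder choice; deita may use a different backbone.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    print(semantic_repr(model, tokenizer, "An example instruction.").shape)
```

In both cases the final sentence representation is one-dimensional; the two-dimensional (seq_len, hidden_dim) hidden states only appear as an intermediate before the last-token selection or the mean pooling.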

Force1ess commented 7 months ago

Thank you for clarifying.