hkust-nlp / deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
Apache License 2.0

[Question] Did you use the mean value of all token embeddings in the repr filter? #10

Closed: Force1ess closed this issue 7 months ago

Force1ess commented 7 months ago

> For Semantic-based, we encode one sentence by averaging all embeddings of its tokens due to the limited context length of the model.

How about Model-based? Is the embedding of a sentence a one-dimensional vector or a two-dimensional one?

VPeterV commented 7 months ago

> For Semantic-based, we encode one sentence by averaging all embeddings of its tokens due to the limited context length of the model.
>
> How about Model-based? Is the embedding of a sentence a one-dimensional vector or a two-dimensional one?

In model-based approaches, we obtain the representation of a sentence from the hidden state of its last token, so the result is a single one-dimensional vector of size hidden_dim.

For semantic-based methods, we work around the restricted context window of current encoder-only models with a sliding-window technique using a chunk size of 512 tokens, and then average the chunk representations to form the final sentence representation. For example, consider a sentence containing 1024 tokens. We divide it into two chunks (Chunk A1 and Chunk A2, each with 512 tokens), compute the average representation of all tokens within each chunk to get one embedding per chunk, and finally average these two chunk embeddings to obtain the overall sentence representation.
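
In code, the two schemes look roughly like this. This is a minimal sketch using Hugging Face transformers; the helper names `model_based_repr` and `semantic_repr` and the `bert-base-uncased` checkpoint are illustrative assumptions, not deita's actual implementation:

```python
import torch
from transformers import AutoModel, AutoTokenizer


def model_based_repr(model, tokenizer, text):
    """Model-based: the sentence representation is the hidden state
    of the last token, i.e. a single 1-D vector of size hidden_dim."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state[0, -1]  # shape: (hidden_dim,)


def semantic_repr(model, tokenizer, text, chunk_size=512):
    """Semantic-based: mean-pool the token states inside each
    512-token chunk, then average the per-chunk vectors."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunk_embs = []
    for start in range(0, len(ids), chunk_size):
        chunk = torch.tensor([ids[start:start + chunk_size]])
        with torch.no_grad():
            out = model(input_ids=chunk)
        # One vector per chunk: mean over that chunk's token states.
        chunk_embs.append(out.last_hidden_state[0].mean(dim=0))
    # E.g. 1024 tokens -> two 512-token chunks -> two vectors -> their mean.
    return torch.stack(chunk_embs).mean(dim=0)  # shape: (hidden_dim,)


if __name__ == "__main__":
    # Illustrative encoder choice; deita may use a different backbone.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    print(semantic_repr(model, tokenizer, "An example instruction.").shape)
```

In both cases the final sentence representation is one-dimensional; the two-dimensional (seq_len, hidden_dim) hidden states only appear as an intermediate before the last-token selection or the mean pooling.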

Force1ess commented 7 months ago

Thank you for clarifying.