srikhetramohanty opened 2 days ago
Great, thanks. While I go through the two papers you referenced, what were your observations when you compared the tokenized outputs?
I mainly focused on comparing Qwen1.5 and GPT2. I found that nearly 70% of the tokens overlapped between the two tokenized sequences, and these overlapping tokens could be aligned more easily by our cross-model attention.
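For reference, here is a minimal sketch of how such an overlap statistic could be computed with Hugging Face `transformers`. The exact metric and checkpoints used in the comparison above aren't stated, so the `Qwen/Qwen1.5-7B` checkpoint and the multiset-intersection measure below are assumptions:

```python
from collections import Counter

from transformers import AutoTokenizer


def token_overlap(text: str) -> float:
    """Fraction of token strings shared between two tokenizations (assumed metric)."""
    # Assumed checkpoints; any Qwen1.5 variant shares the same tokenizer.
    qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B")
    gpt2_tok = AutoTokenizer.from_pretrained("gpt2")

    # Compare token strings rather than ids, since the two vocabularies
    # assign different ids; both use byte-level BPE, so the strings line up.
    qwen_tokens = qwen_tok.tokenize(text)
    gpt2_tokens = gpt2_tok.tokenize(text)

    # Multiset intersection: each token counts once per matched occurrence.
    shared = sum((Counter(qwen_tokens) & Counter(gpt2_tokens)).values())
    return shared / max(len(qwen_tokens), len(gpt2_tokens))


print(token_overlap("Tokenizer overlap is easy to estimate on sample text."))
```

This treats a token as "overlapping" whenever the same surface string appears in both sequences; a position-aware alignment would give a stricter number.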