Open · Henry0528 opened 1 year ago
So you changed both q and k to v in the `Attention` class of `openclip/transformer.py` to get v-v attention. Is there any other modification to the original OpenCLIP code?
We keep the original q-k-v path unchanged; in addition, we extract a v-v attention map from each layer and aggregate the results across layers.
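To illustrate the idea, here is a minimal single-head NumPy sketch (the real OpenCLIP `Attention` class is multi-head PyTorch code, and all names here are illustrative, not the repository's actual API): the standard q-k-v path is left untouched, and a v-v attention map is computed alongside it from the same value projection.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_vv(x, Wq, Wk, Wv, scale):
    """Single-head attention that also returns a v-v attention output.

    x: (tokens, dim) input; Wq/Wk/Wv: (dim, dim) projections.
    The q-k-v path is the ordinary one; the v-v path reuses the
    same value vectors as both queries and keys.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Original q-k attention map and output (unchanged path).
    qk_attn = softmax((q @ k.T) * scale)
    out = qk_attn @ v
    # v-v attention: values attend to values instead of q attending to k.
    vv_attn = softmax((v @ v.T) * scale)
    vv_out = vv_attn @ v
    return out, vv_out

# Per the reply above, the vv_out of each layer would then be
# aggregated (e.g. summed) across layers rather than replacing
# the layer's normal output.
```

In this sketch the v-v path is purely a side readout: it never feeds back into the residual stream, which matches the reply that the original q-k-v computation is preserved.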