Closed: KhoomeiK closed this 7 months ago.
@frankaging Did you get rid of the topological order enforcement for the intervention list? Does that mean we can specify arbitrary names for keys in the type_to_module_mapping? That would mean we can add the text decoder + image encoders as well for VLMs.
Yes! That is removed, so you can add arbitrary names. You can now also intervene without specifying the mapping at all if the target is a standard intervention point, i.e., one that needs no QKV split or head split like the ones defined for transformers.
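To make the reply concrete, here is a minimal sketch of how free-form component names might be resolved to module paths, with a fallback for "standard" points that are not in the mapping. It is not pyvene's implementation: `TYPE_TO_MODULE_MAPPING`, `resolve_component`, the component names, and the BLIP attribute paths are all hypothetical or assumed and should be checked against the actual library and the installed transformers version.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping: free-form component names -> (module path template, hook type).
# "%s" in a template is filled with the layer index. Paths assume Hugging Face's
# BLIP attribute names and should be checked against the installed version.
TYPE_TO_MODULE_MAPPING: Dict[str, Tuple[str, str]] = {
    "image_block_output": ("vision_model.encoder.layers[%s]", "output"),
    "text_block_output": ("text_encoder.encoder.layer[%s]", "output"),
}


def resolve_component(
    component: str,
    layer: int,
    mapping: Optional[Dict[str, Tuple[str, str]]] = None,
) -> str:
    """Return the module path to hook for `component` at `layer`."""
    mapping = TYPE_TO_MODULE_MAPPING if mapping is None else mapping
    if component in mapping:
        template, _hook_type = mapping[component]
        return template % layer if "%s" in template else template
    # Not in the mapping: treat the name as a literal module path, as for a
    # standard intervention point that needs no QKV or head split.
    return component


# Components can be listed in any order; nothing sorts or validates them
# against a fixed topological order.
interventions = [
    ("text_block_output", 3),
    ("image_block_output", 0),
    ("itm_head", 0),
]
paths = [resolve_component(name, layer) for name, layer in interventions]
print(paths)
```

Because the keys are plain strings, adding a text decoder or an image encoder for a VLM amounts to adding another entry; the order of the intervention list is preserved as given.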
I am merging this PR since it does not seem to break anything and it adds support for additional models. Any issues can be handled in separate bug-fix PRs.
Description
The current BLIP implementation by @aryamanarora targets VQA, but our upcoming experiments with ColorSwap will require BLIP-ITM. I've added new model definition files specifically for the BLIP-ITM setup, since its architecture (vis enc -> text enc -> ITM head) differs considerably from BLIP-VQA (vis enc -> text enc -> text dec). It's probably possible to capture both with a single general definition, but I wanted to take the lowest-risk path for my first PR.
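The architectural difference described above can be summarized as two stage lists that share a prefix and differ only in the final stage. The sketch below is illustrative only: the `Stage` class and stage names are hypothetical, and the `module_path` values assume the attribute names used by Hugging Face's `BlipForQuestionAnswering` and `BlipForImageTextRetrieval`, which should be verified against the installed version.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str         # hypothetical component name for a model definition file
    module_path: str  # assumed Hugging Face attribute name on the BLIP model


# vis enc -> text enc -> text dec
BLIP_VQA: List[Stage] = [
    Stage("vision_encoder", "vision_model"),
    Stage("text_encoder", "text_encoder"),
    Stage("text_decoder", "text_decoder"),
]

# vis enc -> text enc -> ITM head
BLIP_ITM: List[Stage] = [
    Stage("vision_encoder", "vision_model"),
    Stage("text_encoder", "text_encoder"),
    Stage("itm_head", "itm_head"),
]


def compare(a: List[Stage], b: List[Stage]):
    """Split two pipelines into shared stages and stages unique to each."""
    shared = [s for s in a if s in b]
    only_a = [s for s in a if s not in b]
    only_b = [s for s in b if s not in a]
    return shared, only_a, only_b


shared, only_vqa, only_itm = compare(BLIP_VQA, BLIP_ITM)
print("shared:", [s.name for s in shared])
print("VQA only:", [s.name for s in only_vqa])
print("ITM only:", [s.name for s in only_itm])
```

Under these assumptions, a single general definition would only need to make the last stage configurable; separate files trade that generality for isolation from the existing VQA definition.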
Testing Done
I haven't added any tests or done any testing yet. I just wanted to open a PR with the (mostly complete) work I've done so far, since it's Sunday night and I'm not sure when I'll next be able to get back to this.