[P1] Added BLIP-ITM model definitions

KhoomeiK commented 7 months ago

Description

The current BLIP implementation by @aryamanarora is for VQA, but our upcoming experiments with ColorSwap will require BLIP-ITM. I've added new model definition files specifically for the BLIP-ITM setup, since its architecture (vision encoder -> text encoder -> ITM head) differs substantially from BLIP-VQA's (vision encoder -> text encoder -> text decoder). It's probably possible to capture both with a single general definition, but I wanted to take the lowest-risk path for my first PR.
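To make the "ITM head" stage concrete, here is a minimal plain-Python sketch of what that head computes. It assumes the convention used by the original BLIP code and Hugging Face's `BlipForImageTextRetrieval`: a 2-way linear classifier applied to the first-token hidden state of the image-conditioned text encoder, with index 1 meaning "match". The helper names, weights and hidden states are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def linear(x: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Dense layer y = W x + b; weight holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit first)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def itm_match_probability(text_encoder_states, weight, bias) -> float:
    """Toy ITM head: 2-way classifier over the first-token hidden state.

    text_encoder_states: per-token hidden states from a text encoder that has
    already attended to the image features (vision enc -> text enc).
    Assumes the BLIP convention that logit index 1 means "match".
    """
    first_token_state = text_encoder_states[0]
    logits = linear(first_token_state, weight, bias)  # two logits: [no match, match]
    return softmax(logits)[1]


# Hypothetical 2-d hidden states and head weights, for illustration only.
states = [[1.0, 0.0], [0.5, 0.5]]
weight = [[0.0, 0.0], [1.0, 0.0]]
bias = [0.0, 0.0]
print(round(itm_match_probability(states, weight, bias), 3))  # 0.731
```

A VQA setup would instead pass the encoder states to a text decoder that generates answer tokens, which is why the two definitions end in different output stages.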

Testing Done

I haven't added any tests or done any testing yet. I just wanted to open a PR with the [mostly complete] work I've done so far, as it's Sunday night and I'm not sure when I'll be able to get back to this.

Checklist:

[X] My PR title strictly follows the format: [Your Priority] Your Title
[ ] I have attached the testing log above
[X] I provide enough comments in my code
[ ] I have updated the documentation
[ ] I have added tests for my changes

aryamanarora commented 7 months ago

@frankaging Did you get rid of the topological order enforcement for the intervention list? Does that mean we can specify arbitrary names for keys in the type_to_module_mapping? That would mean we can add the text decoder + image encoders as well for VLMs.

frankaging commented 7 months ago


Yes! That is removed, so you can add arbitrary names. You can now also intervene without specifying the mapping at all if the intervention point is a standard one, i.e., one that needs no QKV split or head split as in a transformer.
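To illustrate why dropping the ordering requirement allows arbitrary key names, here is a minimal, hypothetical sketch of a name-keyed mapping resolver. It is not pyvene's implementation: the mapping name, its keys, the `resolve` helper and the module paths (modeled loosely on Hugging Face BLIP attribute names such as `vision_model`, `text_encoder` and `itm_head`) are all assumptions. Keys are looked up by name, so their order does not matter, and a component string that is not a key is treated as a raw module path, mirroring the idea of intervening at a standard point without a mapping entry.

```python
import re
from types import SimpleNamespace

# Matches path segments like "layer[3]" (attribute name + integer index).
_INDEXED = re.compile(r"^(\w+)\[(\d+)\]$")

# Hypothetical mapping: arbitrary, order-free keys -> module path templates.
# The paths assume Hugging Face-style BLIP-ITM attribute names.
HYPOTHETICAL_BLIP_ITM_MAPPING = {
    "itm_head": "itm_head",
    "text_encoder_block": "text_encoder.encoder.layer[%s]",
    "vision_encoder_block": "vision_model.encoder.layers[%s]",
}


def get_module(root, path):
    """Walk a dotted path such as 'text_encoder.encoder.layer[3]' from root."""
    obj = root
    for part in path.split("."):
        match = _INDEXED.match(part)
        if match:
            obj = getattr(obj, match.group(1))[int(match.group(2))]
        else:
            obj = getattr(obj, part)
    return obj


def resolve(root, mapping, component, layer=None):
    """Look up a component by name; fall back to treating it as a raw path."""
    template = mapping.get(component, component)
    if "%s" in template:
        if layer is None:
            raise ValueError(f"{component!r} needs a layer index")
        template = template % layer
    return get_module(root, template)


# Stand-in model built from SimpleNamespace objects (no real weights).
fake_model = SimpleNamespace(
    vision_model=SimpleNamespace(
        encoder=SimpleNamespace(layers=[SimpleNamespace(tag="vision0")])
    ),
    text_encoder=SimpleNamespace(
        encoder=SimpleNamespace(layer=[SimpleNamespace(tag=f"text{i}") for i in range(2)])
    ),
    itm_head=SimpleNamespace(tag="itm"),
)

print(resolve(fake_model, HYPOTHETICAL_BLIP_ITM_MAPPING, "text_encoder_block", 1).tag)  # text1
print(resolve(fake_model, HYPOTHETICAL_BLIP_ITM_MAPPING, "vision_model.encoder.layers[0]").tag)  # vision0
```

Under these assumptions, adding a text decoder or extra image encoder for a VLM would only mean adding more named entries; nothing depends on where they appear in the dictionary.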

frankaging commented 7 months ago

I am merging this PR, as it seems unlikely to break anything and it adds support for additional models. Any bugs can be fixed in separate follow-up PRs.

stanfordnlp / pyvene