As a token of gratitude for the great work, some updates to help people read & run the code 🤗.
Below is a list of all the changes. In addition to the committed changes, there are some potential changes that I could imagine improving the experimental setup; these are left as unchecked boxes. I separated the changes to the training pipeline and the benchmarking notebook into two branches, just in case.
Training Pipeline
[x] Load the Pets dataset through the fastai API, removing the need to specify the dataset location manually (see the fastai sketch below the list)
[x] Remove code duplication by merging the classifier, surrogate & explainer backbone initialization & loading into a shared base module (see the base-module sketch below the list)
[x] Fix checkpointing
[x] Add conda environment.yaml for all the 🐍 enjoyers
[ ] Set the default precision to 32-bit (this fixed NaN encounters due to numerical issues on ImageNette for me; see the precision sketch below the list)
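For the dataset item, the fastai call used to pull the Pets dataset looks roughly like this; fastai manages the download location itself, so nothing has to be configured by hand (a minimal sketch, not the exact code in the branch):

```python
from fastai.data.external import untar_data, URLs

# fastai downloads and caches the Oxford-IIIT Pets dataset automatically,
# so the data root no longer needs to be passed in manually.
pets_path = untar_data(URLs.PETS)   # e.g. ~/.fastai/data/oxford-iiit-pet
print(pets_path / "images")         # the images live in the "images" subfolder
```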
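For the base-module refactor, the idea is simply to keep backbone construction and checkpoint loading in one place and let the classifier, surrogate and explainer inherit it. A rough sketch of the pattern (class and argument names here are illustrative, not the repo's actual signatures):

```python
from typing import Optional

import pytorch_lightning as pl
import timm
import torch


class BaseBackboneModule(pl.LightningModule):
    """Shared backbone init & checkpoint loading (illustrative sketch)."""

    def __init__(self, backbone_name: str = "vit_base_patch16_224",
                 load_path: Optional[str] = None):
        super().__init__()
        # One place to build the ViT backbone instead of three copies.
        self.backbone = timm.create_model(backbone_name, pretrained=True)
        if load_path is not None:
            checkpoint = torch.load(load_path, map_location="cpu")
            # Lightning checkpoints store the weights under "state_dict".
            self.load_state_dict(checkpoint.get("state_dict", checkpoint), strict=False)
```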
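And for the precision suggestion, a minimal sketch of what I mean, assuming the training script builds a standard PyTorch Lightning Trainer (the other arguments are placeholders, not the repo's actual settings):

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    max_epochs=50,   # placeholder value
    precision=32,    # full fp32 instead of 16-bit mixed precision;
                     # this avoided the NaNs I saw on ImageNette
)
```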
Benchmarking Notebook
[x] Fix import (there seems to have been a rename: vitmedical.modules.explainer -> vit_shapley.modules.explainer)
[x] Fix checkpointing upon LRP initialization
[x] cd into the parent directory (vit-shapley) only upon the first cell execution (otherwise repeated runs keep cd'ing into "../"; see the notebook sketch below the list, which also covers the next item)
[x] Automatically create result folders for experiments
[x] Clean up unused code / cells
[ ] Evaluate Attention Rollout with residuals enabled (currently set to False, which degrades its metrics; see the rollout sketch below the list)
[ ] Sample RISE masks using generate_mask (as done for the surrogate model) instead of a binomial distribution
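For the working-directory and result-folder items, the pattern I used in the notebook is roughly the following (the folder names are illustrative; the notebook's actual names may differ):

```python
import os
from pathlib import Path

# Guard the cd so re-running the cell doesn't keep walking up the tree.
if Path.cwd().name != "vit-shapley":
    os.chdir("..")

# Create the result folders up front so saving benchmark outputs never fails.
for subdir in ["results/insertion", "results/deletion", "results/faithfulness"]:  # illustrative names
    os.makedirs(subdir, exist_ok=True)
```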
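For the Attention Rollout suggestion: the residual connections are usually folded in by averaging each (head-averaged) attention map with the identity before multiplying across layers, as in Abnar & Zuidema. A minimal sketch of what enabling that would look like; this is the generic formulation, not the repo's exact implementation:

```python
import torch

def attention_rollout(attentions, add_residual=True):
    """attentions: list of per-layer maps of shape (num_tokens, num_tokens),
    already averaged over heads."""
    num_tokens = attentions[0].shape[-1]
    rollout = torch.eye(num_tokens)
    for attn in attentions:
        if add_residual:
            # Account for the skip connection: mix with the identity,
            # then re-normalize the rows so they still sum to one.
            attn = 0.5 * attn + 0.5 * torch.eye(num_tokens)
            attn = attn / attn.sum(dim=-1, keepdim=True)
        rollout = attn @ rollout
    return rollout
```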
Sanity Check
To make sure refactoring the training pipeline didn't change the semantics, I re-ran the experiments on the Pets dataset. Here are the learning curves I got:
To save compute, I refrained from retraining on ImageNette and re-used the weights from my earlier runs. In both cases, evaluating Insertion, Deletion & Faithfulness reproduced the paper's results for vit_base_patch16_224.
Please let me know if you like it or have additional suggestions.