Closed parsifal9 closed 1 year ago
@tleemann
Thanks for that link. I have read the associated paper with great interest. However, it does not answer the somewhat simpler question I am asking.
I have made some progress. For an input like
python attributions.py --globalbenchmark --numruns 1 --model_name TabNet
a file "global_benchmarkNone.json" is created and I get (using R)
myData <- fromJSON(file="output/TabNet/Adult/global_benchmarkNone.json")
myData[[1]]
#[1] "TabNet" "TabNet" "TabNet" "TabNet" "TabNet"
myData[[2]]
#[1] "MoRF" "MoRF" "MoRF" "MoRF" "LeRF"
plot(myData[[3]][[1]]) #this looks like the MoRF
lines(myData[[3]][[2]]) #this looks like the MoRF
lines(myData[[3]][[3]])
lines(myData[[3]][[4]])
lines(myData[[3]][[5]],col="red") #this looks like the LeRF curve
1) Why are there 5 sets of results -- 4 MoRF and 1 LeRF?
2) The numbers do not seem to match the values printed to the screen, which gives accuracy (0.852, 0.8566, ...) with 14, 13, ... features, i.e. higher than the values in global_benchmarkNone.json.
3) I still haven't figured out attributions.py. Presumably it returns the attributions, but where are they?
Bye R
Hi,
when you run the attributions.py script, attributions for the Adult dataset (currently the only supported dataset, but it should be easy to extend to other data) are computed with the model (using the attention maps or the attribution function supplied by the model). strategy=None boils down to the default strategy for the corresponding model. A file output/<modeltype>/<dataset>/attributions<strategy>.json is then created with the keys model, strategy, dataset, attributions, where attributions is an (N, D) matrix of attributions (N = number of samples, D = number of features).
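In case it helps, the structure described above can be inspected directly in Python. This is only a sketch: the key names and (N, D) layout follow the description above, but the sample values here are made up and stand in for a real attributions<strategy>.json file.

```python
import json

import numpy as np

# Synthetic record mimicking output/<modeltype>/<dataset>/attributionsNone.json
# (values are invented; a real file is read with json.load(open(path)))
record = {
    "model": "TabNet",
    "strategy": "default",
    "dataset": "Adult",
    # (N, D) matrix: N samples x D features
    "attributions": [[0.1, 0.7, 0.2], [0.3, 0.4, 0.3]],
}

data = json.loads(json.dumps(record))  # stands in for loading the file
attr = np.array(data["attributions"])
print(attr.shape)  # (N, D) = (2, 3)
```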
If you additionally pass the --globalbenchmark option, the MostRelevantFirst (MoRF) and LeastRelevantFirst (LeRF) feature removal tests are run. The output is stored in another file, output/<modeltype>/<dataset>/global_benchmark<strategy>.json, with the keys model, order, accuracies, where accuracies contains the accuracies of the model with the features successively removed. Normally, --numruns runs are executed for each of MoRF and LeRF.
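As a sketch of that layout (key names as described above; the accuracy numbers are invented), each complete run contributes one entry per order, so with --numruns 1 you would expect one MoRF and one LeRF entry, which you can separate by filtering on the order key:

```python
# Synthetic content mimicking global_benchmarkNone.json after one complete run
# (numbers invented; each accuracy curve lists accuracy after removing
# 0, 1, 2, ... features)
benchmark = {
    "model": ["TabNet", "TabNet"],
    "order": ["MoRF", "LeRF"],
    "accuracies": [[0.85, 0.80, 0.62], [0.85, 0.84, 0.83]],
}

pairs = list(zip(benchmark["order"], benchmark["accuracies"]))
morf = [acc for order, acc in pairs if order == "MoRF"]
lerf = [acc for order, acc in pairs if order == "LeRF"]
print(len(morf), len(lerf))  # 1 1
```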
Note: the output files are not overwritten when you rerun the script. Instead, the results of the different runs are concatenated into a list in the JSON file. Therefore, if you start and abort the script three times (or it crashes after the MoRF result was written), you may only see the MoRF results from those runs in the file. If it then runs to completion a fourth time, it appends both a MoRF and a LeRF run. Just delete the file before starting the script if you do not want this behavior. The logging procedure is implemented in attributions.py and utils/io_utils.py; please take a look there for full details.
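That appending behavior would also explain the 4 MoRF / 1 LeRF pattern in the R output above. A hypothetical sketch of how interrupted runs accumulate (the append_run helper here is illustrative, not the repository's actual logging code):

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "global_benchmarkNone.json")

def append_run(path, order, accuracies):
    """Mimic the described logging: extend the lists in the JSON file
    instead of overwriting it."""
    if os.path.exists(path):
        with open(path) as f:
            data = json.load(f)
    else:
        data = {"model": [], "order": [], "accuracies": []}
    data["model"].append("TabNet")
    data["order"].append(order)
    data["accuracies"].append(accuracies)
    with open(path, "w") as f:
        json.dump(data, f)

# Three aborted runs: only the MoRF result was written each time.
for _ in range(3):
    append_run(path, "MoRF", [0.85, 0.80])

# A fourth run completes, appending both a MoRF and a LeRF entry.
append_run(path, "MoRF", [0.85, 0.80])
append_run(path, "LeRF", [0.85, 0.84])

with open(path) as f:
    final = json.load(f)
print(final["order"])  # ['MoRF', 'MoRF', 'MoRF', 'MoRF', 'LeRF']
```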
Hope this helps, Tobias
Hi Kathrin,
what is the format of the information in attributionsNone.json?
and then in R
Bye R