Closed zjgarvey closed 1 month ago
Running `python run.py <other-args> --get-metadata` saves a dictionary with the model size and op frequencies to the log directory.
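As a rough illustration of what such a metadata dictionary might contain, here is a minimal sketch. It is not the code in `run.py`: the helper names `build_metadata` and `save_metadata` and the file name `metadata.json` are hypothetical, and the op types are passed in as a plain list of strings so the example needs no ONNX dependency. The keys `model_size` and `op_frequency` match the sample output in this description.

```python
import json
from collections import Counter
from pathlib import Path


def build_metadata(op_types, model_size):
    """Build a metadata dict from a list of op type names and a model size.

    Hypothetical helper: the real collection logic in run.py may differ.
    """
    # Count how often each op type occurs, then sort by op name
    # so that the dumped JSON is stable across runs.
    op_frequency = dict(sorted(Counter(op_types).items()))
    return {"model_size": model_size, "op_frequency": op_frequency}


def save_metadata(metadata, log_dir, filename="metadata.json"):
    """Write the metadata dict as JSON into log_dir (file name is assumed)."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / filename
    path.write_text(json.dumps(metadata, indent=2))
    return path
```

Sorting the op names keeps the saved file deterministic, which makes comparing metadata between models straightforward.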
After a run, you can use `python utils/find_duplicate_models.py` to save or print a JSON dump of redundant models.
I saved the tests below to a file called `sample.txt`:
add_test
model--bart-base-booksum--KamilAin
model--bart-base-cnn--ainize
model--bart-base-few-shot-k-1024-finetuned-squad-seed-2--anas-awadalla
model--bart-base-few-shot-k-1024-finetuned-squad-seed-4--anas-awadalla
With a clean `test-run` directory, I ran
python run.py --testsfile=sample.txt --stages "setup" --get-metadata
The result of running
python utils/find_duplicate_models.py -s
was:
[
  [
    "model--bart-base-booksum--KamilAin",
    "model--bart-base-cnn--ainize"
  ],
  [
    "model--bart-base-few-shot-k-1024-finetuned-squad-seed-4--anas-awadalla",
    "model--bart-base-few-shot-k-1024-finetuned-squad-seed-2--anas-awadalla"
  ]
]
and without the `-s` arg, it includes the metadata for each grouping:
[
  {
    "models": [
      "model--bart-base-booksum--KamilAin",
      "model--bart-base-cnn--ainize"
    ],
    "shared_metadata": {
      "model_size": 712772272,
      "op_frequency": {
        "Add": 227,
        "Cast": 13,
        "Concat": 188,
        "Constant": 886,
        "ConstantOfShape": 6,
        "Div": 44,
        "Equal": 5,
        "Erf": 12,
        "Expand": 5,
        "Gather": 64,
        "Less": 1,
        "MatMul": 133,
        "Mul": 99,
        "Pow": 32,
        "Range": 3,
        "ReduceMean": 64,
        "Reshape": 187,
        "Shape": 67,
        "Slice": 2,
        "Softmax": 18,
        "Sqrt": 32,
        "Squeeze": 2,
        "Sub": 35,
        "Transpose": 90,
        "Unsqueeze": 325,
        "Where": 8
      }
    }
  },
  {
    "models": [
      "model--bart-base-few-shot-k-1024-finetuned-squad-seed-4--anas-awadalla",
      "model--bart-base-few-shot-k-1024-finetuned-squad-seed-2--anas-awadalla"
    ],
    "shared_metadata": {
      "model_size": 558176646,
      "op_frequency": {
        "Add": 229,
        "Cast": 17,
        "Concat": 193,
        "Constant": 937,
        "ConstantOfShape": 12,
        "Div": 44,
        "Equal": 10,
        "Erf": 12,
        "Expand": 11,
        "Gather": 70,
        "Less": 1,
        "MatMul": 133,
        "Mul": 103,
        "Pow": 32,
        "Range": 6,
        "ReduceMean": 64,
        "Reshape": 191,
        "ScatterND": 2,
        "Shape": 83,
        "Slice": 7,
        "Softmax": 18,
        "Split": 1,
        "Sqrt": 32,
        "Squeeze": 4,
        "Sub": 35,
        "Transpose": 90,
        "Unsqueeze": 333,
        "Where": 13
      }
    }
  }
]
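The grouping in the output above can be sketched roughly as follows. This is an illustrative approximation, not the actual contents of `utils/find_duplicate_models.py`: the function names `group_duplicates` and `simplify` are hypothetical, and it assumes two models count as redundant exactly when their saved metadata dictionaries are equal.

```python
import json
from collections import defaultdict


def group_duplicates(metadata_by_model):
    """Group model names whose metadata dictionaries are identical.

    metadata_by_model maps a test name to its metadata dict
    (for example {"model_size": ..., "op_frequency": {...}}).
    """
    groups = defaultdict(list)
    for name, metadata in metadata_by_model.items():
        # Serialize with sorted keys so equal dicts produce equal strings.
        key = json.dumps(metadata, sort_keys=True)
        groups[key].append(name)
    # Only groups with two or more models indicate redundancy.
    return [
        {"models": names, "shared_metadata": json.loads(key)}
        for key, names in groups.items()
        if len(names) > 1
    ]


def simplify(groups):
    """Drop the metadata and keep only the lists of model names."""
    return [group["models"] for group in groups]
```

Under these assumptions, `simplify(group_duplicates(...))` corresponds to the shorter output shown for the `-s` arg, and `group_duplicates(...)` alone corresponds to the full output with `shared_metadata`.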
Thanks
Options: one option applies when `run.py` was run with a non-default run directory arg.