JingyaHuang closed this pull request 6 months ago.
### Test with config file

```json
{
  "hf-internal-testing/tiny-stable-diffusion-torch": [
    {
      "batch_size": 1,
      "height": 64,
      "width": 64,
      "num_images_per_prompt": 1,
      "auto_cast": "matmul",
      "auto_cast_type": "bf16"
    }
  ],
  "hf-internal-testing/tiny-random-gpt2": [
    {
      "batch_size": 1,
      "sequence_length": 512,
      "num_cores": 1,
      "auto_cast_type": "fp16"
    }
  ],
  "hf-internal-testing/tiny-random-BertModel": [
    {
      "task": "text-classification",
      "batch_size": 1,
      "sequence_length": 16,
      "auto_cast": "matmul",
      "auto_cast_type": "fp16"
    }
  ]
}
```
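As a sketch of how a config in this shape can be consumed (this is an illustration, not the script's actual code): each top-level key is a Hub model ID, and each dict in its list is one set of compilation arguments, so the file expands into one cache-fill job per (model, arguments) pair.

```python
import json

# Hypothetical parsing sketch: the config (same shape as the example above)
# maps each Hub model ID to a list of compilation-argument dicts.
config = json.loads("""
{
  "hf-internal-testing/tiny-random-BertModel": [
    {"task": "text-classification", "batch_size": 1, "sequence_length": 16,
     "auto_cast": "matmul", "auto_cast_type": "fp16"}
  ]
}
""")

jobs = []
for model_id, entries in config.items():
    for entry in entries:
        # One (model ID, argument dict) pair per compilation job.
        jobs.append((model_id, entry))

print(jobs[0][0])
```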
Command:

```bash
python tools/auto_fill_inference_cache.py --config_file inference_cache_test_config.json
```
### Test to cache a single model

```bash
python tools/auto_fill_inference_cache.py --hf_model_id hf-internal-testing/tiny-random-BertModel --task text-classification --batch_size 1 --sequence_length 64 --auto_cast matmul --auto_cast_type bf16
python tools/auto_fill_inference_cache.py --hf_model_id hf-internal-testing/tiny-random-gpt2 --batch_size 1 --sequence_length 512 --num_cores 1 --auto_cast_type bf16
python tools/auto_fill_inference_cache.py --hf_model_id hf-internal-testing/tiny-stable-diffusion-torch --batch_size 1 --height 64 --width 64 --auto_cast matmul --auto_cast_type bf16
```
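The single-model flags mirror the keys of a config-file entry, so the two modes are interchangeable. A hypothetical helper illustrating that mapping (the flag names follow the example commands above; this is a sketch, not the tool's implementation):

```python
def entry_to_cli(model_id, entry):
    # Turn one config-file entry (a dict of compilation arguments) into the
    # equivalent single-model command line, using --<key> <value> flags.
    args = ["python", "tools/auto_fill_inference_cache.py", "--hf_model_id", model_id]
    for key, value in entry.items():
        args.extend([f"--{key}", str(value)])
    return " ".join(args)

cmd = entry_to_cli(
    "hf-internal-testing/tiny-random-gpt2",
    {"batch_size": 1, "sequence_length": 512, "num_cores": 1, "auto_cast_type": "bf16"},
)
print(cmd)
```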
aws-neuron/optimum-neuron-cache