JingyaHuang closed this pull request 6 months ago.
### Test with config file

```json
{
  "hf-internal-testing/tiny-stable-diffusion-torch": [
    {
      "batch_size": 1,
      "height": 64,
      "width": 64,
      "num_images_per_prompt": 1,
      "auto_cast": "matmul",
      "auto_cast_type": "bf16"
    }
  ],
  "hf-internal-testing/tiny-random-gpt2": [
    {
      "batch_size": 1,
      "sequence_length": 512,
      "num_cores": 1,
      "auto_cast_type": "fp16"
    }
  ],
  "hf-internal-testing/tiny-random-BertModel": [
    {
      "task": "text-classification",
      "batch_size": 1,
      "sequence_length": 16,
      "auto_cast": "matmul",
      "auto_cast_type": "fp16"
    }
  ]
}
```
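As a sketch of how a config in this shape can be consumed (this is an illustration, not the script's actual code): each top-level key is a Hub model ID, and each dict in its list is one set of compilation arguments, so the file expands into one cache-fill job per (model, arguments) pair.

```python
import json

# Hypothetical parsing sketch: the config (same shape as the example above)
# maps each Hub model ID to a list of compilation-argument dicts.
config = json.loads("""
{
  "hf-internal-testing/tiny-random-BertModel": [
    {"task": "text-classification", "batch_size": 1, "sequence_length": 16,
     "auto_cast": "matmul", "auto_cast_type": "fp16"}
  ]
}
""")

jobs = []
for model_id, entries in config.items():
    for entry in entries:
        # One (model ID, argument dict) pair per compilation job.
        jobs.append((model_id, entry))

print(jobs[0][0])
```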
Command:

```bash
python tools/auto_fill_inference_cache.py --config_file inference_cache_test_config.json
```
### Test to cache a single model

```bash
python tools/auto_fill_inference_cache.py --hf_model_id hf-internal-testing/tiny-random-BertModel --task text-classification --batch_size 1 --sequence_length 64 --auto_cast matmul --auto_cast_type bf16
python tools/auto_fill_inference_cache.py --hf_model_id hf-internal-testing/tiny-random-gpt2 --batch_size 1 --sequence_length 512 --num_cores 1 --auto_cast_type bf16
python tools/auto_fill_inference_cache.py --hf_model_id hf-internal-testing/tiny-stable-diffusion-torch --batch_size 1 --height 64 --width 64 --auto_cast matmul --auto_cast_type bf16
```
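The single-model flags mirror the keys of a config-file entry, so the two modes are interchangeable. A hypothetical helper illustrating that mapping (the flag names follow the example commands above; this is a sketch, not the tool's implementation):

```python
def entry_to_cli(model_id, entry):
    # Turn one config-file entry (a dict of compilation arguments) into the
    # equivalent single-model command line, using --<key> <value> flags.
    args = ["python", "tools/auto_fill_inference_cache.py", "--hf_model_id", model_id]
    for key, value in entry.items():
        args.extend([f"--{key}", str(value)])
    return " ".join(args)

cmd = entry_to_cli(
    "hf-internal-testing/tiny-random-gpt2",
    {"batch_size": 1, "sequence_length": 512, "num_cores": 1, "auto_cast_type": "bf16"},
)
print(cmd)
```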
aws-neuron/optimum-neuron-cache