huggingface / optimum-nvidia

Apache License 2.0
887 stars 86 forks

Avoid writing engines in `.cache/huggingface/hub` #100

Open fxmarty opened 6 months ago

fxmarty commented 6 months ago

Engines should be written either to a temporary directory or to `.cache/huggingface/assets/`, as is done in https://github.com/AutoGPTQ/AutoGPTQ/blob/ea4a99778f90b60c9b5177d7487af1b4ca87744f/auto_gptq/utils/marlin_utils.py#L98-L101
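
For illustration, here is a minimal sketch of how the two options could look, using only the standard library. The helper name `engine_output_dir`, the `optimum-nvidia/engines` subfolder layout, and the `--` replacement in model ids are assumptions made for this example, not the project's actual code. The sketch relies on the `HF_HOME` and `HF_ASSETS_CACHE` environment variables, falling back to `~/.cache/huggingface/assets`.

```python
import os
import tempfile
from pathlib import Path


def engine_output_dir(model_id: str, use_tmp: bool = False) -> Path:
    """Hypothetical helper: choose where built engines are written.

    Never returns a path under the hub cache (.cache/huggingface/hub).
    """
    if use_tmp:
        # Option 1: a fresh temporary directory (caller is responsible for cleanup).
        return Path(tempfile.mkdtemp(prefix="optimum-nvidia-"))

    # Option 2: the Hugging Face assets cache.
    hf_home = Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))
    assets_root = Path(os.environ.get("HF_ASSETS_CACHE", hf_home / "assets"))

    # Assumed layout: <assets>/optimum-nvidia/engines/<org>--<model>
    target = assets_root / "optimum-nvidia" / "engines" / model_id.replace("/", "--")
    target.mkdir(parents=True, exist_ok=True)
    return target
```

If depending on `huggingface_hub` is acceptable, its `cached_assets_path` helper could likely replace the manual path handling above.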