Is your feature request related to a problem? Please describe.
Popular Spaces have long queues, but often each person in the queue is just running one of the examples. If this computation could be cached, it would speed up inference and shorten queues.
Specifically:
Can we add a `cache` parameter to the `launch()` method that:
- caches the predictions of the example inputs so that they do not have to be rerun
- perhaps caches the predictions of any input generally, so that repeated inputs do not have to be rerun
- displays "CACHED" in place of the prediction time shown in the UI
- does so in a way that doesn't skew average runtimes
- is False by default, because this behavior is not expected and may not work appropriately if state is involved
- is True by default in "state-less" Hugging Face Spaces, through an appropriately defined environment variable
- lets cached requests skip the queue when uncached samples are still running -- otherwise the impact is substantially reduced, since the user still has to wait a long time only to get a cached result
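To make the request concrete, here is a minimal sketch of the proposed behavior as a plain Python wrapper. All names here (`make_cached_fn`, the `GRADIO_STATELESS` environment variable) are hypothetical illustrations, not Gradio API; the wrapper returns a flag alongside the result that the UI could use to display "CACHED" instead of a runtime:

```python
import os

def make_cached_fn(fn, enabled=False):
    """Wrap a prediction function with a simple input -> output cache.

    `enabled` mirrors the proposed default-False `cache` parameter; a Space
    could flip it on via an environment variable (hypothetical name below).
    """
    cache = {}

    def wrapped(*args):
        key = args  # assumes inputs are hashable
        if enabled and key in cache:
            # Second element signals a cache hit, so the UI could show
            # "CACHED" and exclude this call from average-runtime stats.
            return cache[key], True
        result = fn(*args)
        if enabled:
            cache[key] = result
        return result, False

    return wrapped

# Hypothetical usage: enable caching only when an env var marks the
# Space as state-less (GRADIO_STATELESS is an assumed name).
predict = make_cached_fn(
    lambda x: x * 2,
    enabled=os.getenv("GRADIO_STATELESS", "0") == "1",
)
```

A real implementation inside `launch()` would also need to precompute the example predictions up front and bypass the queue on cache hits, per the points above.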