huggingface / evaluate

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
https://huggingface.co/docs/evaluate
Apache License 2.0
2.04k stars, 260 forks

Loading metrics that are shipped with the evaluate package is slow #511

Open harmbuisman opened 1 year ago

harmbuisman commented 1 year ago

Loading metrics that are shipped with the evaluate package takes far too long: a second or more, whereas I expect it to be near instant.

Repro: Run the following in a Jupyter notebook:

import evaluate

%%prun
evaluate.load("accuracy")

This outputs the following, which suggests that even for this metric, which is available in the package itself, it sets up communication with the HF Hub (the time is dominated by SSL and socket calls):

         22184 function calls (21809 primitive calls) in 1.351 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.643    0.322    0.643    0.322 {method 'load_verify_locations' of '_ssl._SSLContext' objects}
        2    0.240    0.120    0.240    0.120 {method 'read' of '_ssl._SSLSocket' objects}
        2    0.223    0.112    0.223    0.112 {method 'do_handshake' of '_ssl._SSLSocket' objects}
        2    0.145    0.073    0.145    0.073 {method 'connect' of '_socket.socket' objects}
        2    0.038    0.019    0.038    0.019 {built-in method _socket.getaddrinfo}
       80    0.009    0.000    0.009    0.000 {built-in method nt.stat}
       45    0.006    0.000    0.006    0.000 {built-in method __new__ of type object at 0x00007FF81939AD50}
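
For anyone reproducing this outside a notebook, the same kind of report can be produced with the standard library profiler. The following is only a sketch: `profile_call` and `profile_accuracy_load` are hypothetical helper names, not part of evaluate, and the second one assumes evaluate is installed.

```python
import cProfile
import io
import pstats
from typing import Callable


def profile_call(fn: Callable[[], object], top: int = 10) -> str:
    """Profile a single call to fn and return a report sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.runcall(fn)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "tottime" is the internal-time ordering shown in the report above.
    stats.sort_stats("tottime").print_stats(top)
    return stream.getvalue()


def profile_accuracy_load() -> str:
    """Hypothetical helper: profile evaluate.load("accuracy")."""
    import evaluate  # imported lazily; assumes evaluate is installed

    return profile_call(lambda: evaluate.load("accuracy"))
```

Calling `print(profile_accuracy_load())` should list the most expensive functions first, which makes it easy to see whether socket and SSL calls dominate.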

lvwerra commented 1 year ago

Yes, the metrics are loaded from the Hub, which is why you are seeing load times of 1-2 seconds. On follow-up loads they should be cached.
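
One way to check whether follow-up loads actually benefit from a cache is to time several successive calls in the same process. This is a minimal sketch; `time_calls` and `time_accuracy_loads` are hypothetical helper names, and the second one assumes evaluate is installed.

```python
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], repeats: int = 3) -> List[float]:
    """Return the wall-clock duration in seconds of each successive call to fn."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def time_accuracy_loads(repeats: int = 3) -> List[float]:
    """Hypothetical helper: time repeated evaluate.load("accuracy") calls."""
    import evaluate  # imported lazily; assumes evaluate is installed

    return time_calls(lambda: evaluate.load("accuracy"), repeats)
```

If caching avoided the network round trip, the second and later durations would be noticeably smaller than the first.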

harmbuisman commented 1 year ago

evaluate.load("accuracy") loads the sklearn wrapper that is shipped with the package, so it should not go to the Hub. See the location within the package: https://github.com/huggingface/evaluate/blob/main/metrics/accuracy/accuracy.py

It takes 1-2 seconds on every call to evaluate.load, so there is no speed improvement on follow-up calls.
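
Until the underlying behaviour changes, a possible workaround within a single process is to memoize the loader so that each metric name is loaded only once. This is a sketch, not something evaluate provides: `memoize_loader` is a hypothetical helper built on `functools.lru_cache`.

```python
import functools
from typing import Any, Callable


def memoize_loader(load_fn: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap load_fn so that each distinct name is loaded at most once per process."""
    # lru_cache keys on the call arguments, so they must be hashable (e.g. strings).
    return functools.lru_cache(maxsize=None)(load_fn)
```

With evaluate installed, this could be used as `load_metric = memoize_loader(evaluate.load)` followed by `load_metric("accuracy")`. Every call with the same name then returns the same object, so this is probably only suitable when the loaded metric's internal state does not need to be kept separate between uses.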