huggingface / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

Provide file size information in advance to enable single file download progress bar #1052

Open rolf-moz opened 1 week ago

rolf-moz commented 1 week ago

Feature request

Return file size information prior to model file download to enable better UI when downloading multiple model files.

After loading the config files, we would fire a single 'session_info' event (name subject to change) that carries an object like this:

{
  // [file name] => [info]
  encoder_model: {...},
  decoder_model_merged: {...},
}

where each info element contains details such as the file name, file size, URL, etc.
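A minimal sketch of how a client might consume such an object, assuming it arrives before any download starts. Everything here is hypothetical: the 'session_info' event does not exist in the library today, and the `size` field, `totalBytes`, and `createOverallProgress` are names invented for illustration only.

```javascript
// Hypothetical sketch: "session_info" and its `size` field are the proposal
// in this issue, not an existing transformers.js API.

// Sum the advertised sizes of all files to get a fixed denominator.
function totalBytes(sessionInfo) {
  return Object.values(sessionInfo).reduce(
    (sum, info) => sum + (info.size ?? 0),
    0,
  );
}

// Returns an updater that records bytes loaded per file and reports the
// overall fraction downloaded across all files (0 when the total is unknown).
function createOverallProgress(sessionInfo) {
  const total = totalBytes(sessionInfo);
  const loadedByFile = new Map(); // file name -> bytes loaded so far

  return function update(fileName, loadedBytes) {
    loadedByFile.set(fileName, loadedBytes);
    let loaded = 0;
    for (const bytes of loadedByFile.values()) loaded += bytes;
    return total > 0 ? loaded / total : 0;
  };
}
```

Because the total is known up front, the denominator never changes while files are downloading, which is what makes a single steady progress bar possible.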

Clients can then use this to build different styles of download UI (e.g. a total percentage downloaded across all model files).

Motivation

As we look to integrate transformers.js into more UIs, we may want a more predictable experience when downloading models. Right now the callbacks don't carry enough information to drive a single progress bar for the whole model download.

Your contribution

I may be able to help code this.

emojiiii commented 1 week ago

There is already a Progress callback, please refer to pr

rolf-moz commented 1 week ago

I see a progress callback for a file, but typically there are multiple files, and downloads of all files may not initiate immediately. If there is an event that has info on all files (at least once at start) then we will have the ability to display a single bar for the download progress as a whole.

emojiiii commented 1 week ago

When the hub utilities download a Hugging Face repo, the progress callback is invoked continuously with the real-time progress of each file being downloaded.

The current download status of each file is recorded in the ProgressInfo type.

e.g.

[image]

ntbrown commented 5 days ago

> I see a progress callback for a file, but typically there are multiple files, and downloads of all files may not initiate immediately. If there is an event that has info on all files (at least once at start) then we will have the ability to display a single bar for the download progress as a whole.

I just had to build this myself.

While it's not 100% what you want ... It's close enough (imo)

You can just write conditionals in the callback to keep track of the states (initialized, download started, progress, done). When you hit initialized / initial download, you know you have a new file and need to add its total to the overarching progress bar, if and only if you haven't already done so. Then, on each progress update, adjust the bars as needed given the byte difference.

For the latter, you just track (loaded - lastLoaded) in the conditional for the "progress" state to get the new byte difference, then add it to the overarching sum and/or to that specific file's progress bar.
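The bookkeeping described above can be sketched roughly as follows. This is not ntbrown's actual code. It assumes each callback event looks like `{ status, file, loaded, total }`, with a "progress" status carrying byte counts and a "done" status marking completion. `createAggregateProgress` and `onUpdate` are illustrative names.

```javascript
// Sketch of an aggregate tracker driven by per-file progress events.
// Assumed event shape: { status, file, loaded, total }.
function createAggregateProgress(onUpdate) {
  const files = new Map(); // file name -> { loaded, total }
  let loadedSum = 0; // bytes downloaded across all files seen so far
  let totalSum = 0; // combined size of all files seen so far

  return function handle(event) {
    if (!event || typeof event.file !== "string") return;

    // The first event for a file registers it.
    let entry = files.get(event.file);
    if (!entry) {
      entry = { loaded: 0, total: 0 };
      files.set(event.file, entry);
    }

    if (event.status === "progress") {
      // Add the file's size to the overall total once it becomes known.
      const total = event.total ?? 0;
      if (total > entry.total) {
        totalSum += total - entry.total;
        entry.total = total;
      }
      // Add only the byte difference since the last update (loaded - lastLoaded).
      const loaded = event.loaded ?? 0;
      if (loaded > entry.loaded) {
        loadedSum += loaded - entry.loaded;
        entry.loaded = loaded;
      }
    } else if (event.status === "done" && entry.total > entry.loaded) {
      // Top up in case the final progress event was never delivered.
      loadedSum += entry.total - entry.loaded;
      entry.loaded = entry.total;
    }

    onUpdate({
      loaded: loadedSum,
      total: totalSum,
      fraction: totalSum > 0 ? loadedSum / totalSum : 0,
      fileCount: files.size,
    });
  };
}
```

The returned `handle` function could be passed wherever the per-file progress callback is accepted. Note that `total` grows each time a new file reports its size, so the overall fraction can move backwards mid-download.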

I was able to build an overarching progressbar + download states for each individual file in the queue with full UI updates. I think it looks and works fairly well. But, each to their own.

That being said, an overarching total is better UI: with the approach above, the total size shifts slightly as new files roll in over time, which creates some jank during the process. So, +1, and a data point from someone who just had to do this.

What you can build with the current info, with somewhat rough styling and a rough UI:

Screenshot 2024-11-28 184435

Here is a Loom (a bit janky with a slow CPU, sorry about that): https://www.loom.com/share/68eba0e9f63d49bba2b1fa8d566c24e5?sid=bc2991ad-6336-4b7c-8363-ea7a36af9b36

Overall ... not bad. With the total size returned up front, the jank goes away and this is solid w/o any issues.