huggingface / huggingface_hub

The official Python client for the Huggingface Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0
1.98k stars 514 forks

Is there a way to extract a model's download stats (e.g., last 30 days) in times series format? #2390

Open frr717 opened 2 months ago

frr717 commented 2 months ago

Is your feature request related to a problem? Please describe.
Currently, I can only find code like the following, which returns a single static data point (the download count for the 30 days up to today):
info = model_info("bert-base-uncased")
model_info(info.modelId).downloads

Describe the solution you'd like
I wonder whether Hugging Face could provide a method that takes a date as input, such as:
model_info(info.modelId).get_downloads('20240131')

Describe alternatives you've considered
None so far. I appreciate any help from all of you!

Wauplin commented 2 months ago

Hi @frr717, thanks for your interest. There is currently no way to get this data as time series. The only information you can get is the downloads in the last 30 days and overall downloads. What would be your use case for a timeseries format?
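
For reference, the two figures mentioned above could be read roughly as follows. This is only a sketch: it assumes a recent `huggingface_hub` release in which `model_info` accepts an `expand` argument and `ModelInfo` exposes `downloads_all_time`. The helper names `download_counts` and `fetch_download_counts` are hypothetical and not part of the library.

```python
def download_counts(info):
    """Return (last_30_days, all_time) from a ModelInfo-like object.

    `all_time` is None when the attribute is missing, e.g. on older
    library versions or when it was not requested.
    """
    return info.downloads, getattr(info, "downloads_all_time", None)


def fetch_download_counts(repo_id):
    """Query the Hub for both counters (needs network access)."""
    # Imported lazily so download_counts() is usable without the library.
    from huggingface_hub import model_info

    # Assumption: `expand` and "downloadsAllTime" are supported by the
    # installed huggingface_hub version.
    info = model_info(repo_id, expand=["downloads", "downloadsAllTime"])
    return download_counts(info)
```

Calling `fetch_download_counts("bert-base-uncased")` would return a single snapshot of both counters, not a time series.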

frr717 commented 2 months ago

Thank you for your reply.

I am working on a research project that needs this time series to run regression analyses on the companies that own these models. So I would like to know whether your team plans to implement it. Thanks!

frr717 commented 2 months ago

BTW, I'd like to ask another question, about the image (an SVG element in a model page's HTML, e.g. https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5) in the top-right corner, next to "Downloads last month". What is the frequency of the data points in the image? Take the model linked above as an example: it was created on 2024-05-19. Does the line in the SVG represent each day's last-30-day download count since the model was created? Thank you!

julien-c commented 2 months ago

it's each day in the last 30 days

frr717 commented 2 months ago

Thank you!

frr717 commented 1 month ago

Hi @julien-c, the data points in this image have been compressed to the range [0, 100].

Could you kindly tell me the formula it uses?

Thank you!

julien-c commented 1 month ago

0 means 0 downloads, i.e. we don't move the origin.

So yes, you can get the daily downloads from the last30days total + the graph. It's a bit hacky but it'll work.
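
The approach described above could be sketched as follows. Since the origin is not moved, each graph value is proportional to that day's downloads, so the 30-day total can be split across days according to the graph heights. The function name `estimate_daily_downloads` is hypothetical, and the sketch assumes the graph points have already been extracted from the SVG as numbers in [0, 100] and that they cover the same days as the 30-day total.

```python
def estimate_daily_downloads(graph_values, total_last_30_days):
    """Split a 30-day download total across days in proportion to graph heights.

    graph_values: per-day heights read from the graph (0 means 0 downloads).
    total_last_30_days: the "downloads last month" counter.
    Returns a list of estimated (possibly fractional) daily downloads.
    """
    total_height = sum(graph_values)
    if total_height == 0:
        # A flat graph at zero carries no information about the split.
        return [0.0 for _ in graph_values]
    # One unit of graph height corresponds to this many downloads.
    scale = total_last_30_days / total_height
    return [value * scale for value in graph_values]


# Toy example with four days instead of thirty.
estimates = estimate_daily_downloads([0, 50, 100, 50], 400)
print(estimates)
```

The estimates are only as precise as the graph values themselves, so any rounding in the plotted points carries over to the result.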