huggingface / blog

Public repo for HF blog posts
https://hf.co/blog

Size of blog repo #480

Closed patrickvonplaten closed 1 year ago

patrickvonplaten commented 2 years ago

The blog repo is now ca. 300 MB. Should we maybe move the images etc. out into a dataset? What do you think, @osanseviero?

   25.0 MiB [##########] /27_summer_at_huggingface
   24.9 MiB [######### ] /29_streamlit-spaces
   20.1 MiB [########  ] /70_deep_rl_q_part1
   19.1 MiB [#######   ] /73_deep_rl_q_part2
   10.9 MiB [####      ] /28_gradio-spaces
   10.2 MiB [####      ] /78_deep_rl_dqn 
    9.9 MiB [###       ] /34_course_launch
    9.6 MiB [###       ] /63_deep_rl_intro
    8.2 MiB [###       ] /41_perceiver
    7.6 MiB [###       ] /22_gradio
    7.1 MiB [##        ] /85_sentiment_analysis_twitter
    7.0 MiB [##        ] /78_annotated-diffusion
    6.8 MiB [##        ] /37_data-measurements-tool
    6.4 MiB [##        ] /89_deep_rl_a2c
    5.3 MiB [##        ] /18_big_bird 
    5.2 MiB [##        ] /85_policy_gradient
    5.0 MiB [#         ] /93_deep_rl_ppo
    4.7 MiB [#         ] /96_hf_bitsandbytes_integration
    3.6 MiB [#         ] /82_eval_on_the_hub
    3.5 MiB [#         ] /62_pytorch_fsdp
    3.0 MiB [#         ] /35_bert_cpu_scaling_part_2
    3.0 MiB [#         ] /56_fine_tune_segformer
    2.9 MiB [#         ] /01_how-to-train
    2.8 MiB [#         ] /30_clip_rsicd
    2.7 MiB [#         ] /75_hugging_face_endpoints_on_azure
    2.5 MiB [#         ] /39_introducing_snowball_fight
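
A per-folder listing like the one above can be reproduced with a few lines of Python. This is only a minimal sketch: the thread does not say which tool produced the listing or which directory was scanned, so `dir_sizes`, `format_report`, and the root path are illustrative assumptions.

```python
from pathlib import Path


def dir_sizes(root):
    """Return (size_in_bytes, name) for each immediate subdirectory of root, largest first."""
    sizes = []
    for entry in Path(root).iterdir():
        if not entry.is_dir():
            continue  # only folders are reported, as in the listing above
        # Sum the sizes of all regular files below this folder, at any depth.
        total = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
        sizes.append((total, entry.name))
    return sorted(sizes, reverse=True)


def format_report(sizes, limit=25):
    """Format the largest `limit` entries as 'N.N MiB  /folder' lines."""
    lines = []
    for size, name in sizes[:limit]:
        lines.append(f"{size / 2**20:8.1f} MiB  /{name}")
    return "\n".join(lines)
```

Calling `print(format_report(dir_sizes("assets")))` from the repo root would print the heaviest folders first; `"assets"` is an assumed path and should be replaced with whichever directory holds the images.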

osanseviero commented 2 years ago

Great point! In recent blog posts we've been pushing much more for compressed images as well, as large posts will lead to a very slow user experience while reading.

I think we can explore moving assets from our blog posts to a dataset and keeping the community ones here, but we should still aim for very light blog posts.

@simoninithomas I see 70_deep_rl_q_part1, 73_deep_rl_q_part2, 78_deep_rl_dqn, and 63_deep_rl_intro. @merveenoyan same for 29_streamlit-spaces and 28_gradio-spaces. And myself for the top one.

I think independently of moving these assets to a Hub dataset, it would be good to compress the images/thumbnails for a good experience even with slow connections. Maybe we could take care of the corresponding blog posts mentioned above?

simoninithomas commented 2 years ago

Ok, I cloned the repo. I can reduce some stuff, for instance the gifs.

But most of my folders are around 20 MB because, for instance, deep-rl-q-part1 has 73 files (JPEGs, no PNGs).

As Omar mentioned, if the size is still too big, we could put the assets in a dataset 🤔.

✅ Optimized and deleted some illustrations: #489

julien-c commented 1 year ago

closing this as we now require authors to upload their assets to a dataset repo on the hub

BTW, we could do this for thumbnails too in the future, potentially
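
Once assets live in a dataset repo on the Hub, a post has to reference them by URL instead of by a path inside this repo. The sketch below builds such a URL using the Hub's `resolve` URL form for dataset files. The repo id `my-org/blog-assets`, the file path, and the helper name `dataset_asset_url` are hypothetical, chosen only for illustration; the thread does not name the actual dataset repo.

```python
from urllib.parse import quote

HUB_ENDPOINT = "https://huggingface.co"


def dataset_asset_url(repo_id, path_in_repo, revision="main"):
    """Build the download URL of a file stored in a Hub dataset repo."""
    # Percent-encode spaces and other unsafe characters, but keep "/" separators.
    path = quote(path_in_repo.lstrip("/"))
    return f"{HUB_ENDPOINT}/datasets/{repo_id}/resolve/{revision}/{path}"


# Hypothetical repo id and file path, for illustration only.
url = dataset_asset_url("my-org/blog-assets", "blog/70_deep_rl_q_part1/thumbnail.jpg")
```

The resulting URL can then be used as the image source in the post's Markdown, so the image no longer adds to the size of this repo.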