pnp / blog

holds all blogs published on the Microsoft 365 Platform Community blog
https://pnp.github.io/blog
MIT License

Optimize build speed #144

Closed appieschot closed 1 year ago

appieschot commented 2 years ago

Our current implementation generates Small, Medium, and Large versions of each image in our blog. We should drop one of these sizes to reduce the total blog size (1.7 GB in the current release).

image

appieschot commented 2 years ago

Might not be needed if we use a depth of 1 to do a shallow clone (thanks to a tip from @waldekmastykarz). Will investigate after the holidays.
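For readers unfamiliar with the tip: a shallow clone asks git for only the most recent commits instead of the full history. The sketch below is a minimal illustration, not the workflow this repo actually uses; the helper names `shallow_clone_command` and `shallow_clone` are made up for this example, and the only git feature relied on is the standard `git clone --depth` option.

```python
import subprocess
from typing import List


def shallow_clone_command(repo_url: str, dest: str, depth: int = 1) -> List[str]:
    """Build a git command that fetches only the latest `depth` commits."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return ["git", "clone", "--depth", str(depth), repo_url, dest]


def shallow_clone(repo_url: str, dest: str, depth: int = 1) -> None:
    """Run the shallow clone; raises CalledProcessError if git fails."""
    subprocess.run(shallow_clone_command(repo_url, dest, depth), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(shallow_clone_command("https://github.com/pnp/blog", "blog")))
```

A shallow clone shortens the checkout step but does not shrink the working tree itself, so large files in the current revision are still downloaded.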

LuiseFreese commented 1 year ago

Hey @appieschot you still willing to help on this one? Build times are painfully long 😭

appieschot commented 1 year ago

We have two things running:

In both those workflows different things happen:

Once those steps are done, we start the actual deployment if a PR is approved

I would opt to fix the pptx, or convert all PPTX to PDF to drastically reduce the size. Happy to hear other input! 🦾

LuiseFreese commented 1 year ago

hey there @appieschot , first - THANKS for your work - you rock 🎸

Sharing the PPTX via OneDrive/SharePoint would be the last resort I think

appieschot commented 1 year ago

@LuiseFreese I'll have a look at whether we can automate the PPTX-to-PDF conversion with some form of script; reducing image size feels like way more effort, as we would also need to go into the slide master and remove unused master slides. Printing to PDF is most likely quicker. Will report back (and I updated the initial issue with a checklist to keep track of status).

appieschot commented 1 year ago

@LuiseFreese I started working on resizing some of the larger PPTX files, but so far they are not referenced in the content... Any other place they might be linked from?

LuiseFreese commented 1 year ago

How will we proceed with this one? I can instruct Andrew Benson (who writes the community call posts) to save the PPTX as PDF and upload that.

appieschot commented 1 year ago

Sounds like a great plan ;-). I'll do a few more PRs to swap out the other PPTX files as well and then start with the images. Let's see if we can do a few iterations to improve!

appieschot commented 1 year ago

image

Resizing does bring it down a bit ;-). Not there yet but the first step is taken 🦾

appieschot commented 1 year ago

Ran a version with --templateMetrics online (image in the initial issue), but the results online differ quite a bit:

image

Will investigate if we can speed up some of those things by stripping some stuff.

andrewconnell commented 1 year ago

Hey @appieschot, @LuiseFreese pinged me and asked if I wanted to assist. I've been doing a bunch of work on my Hugo sites lately & heavily leveraging the image processing stuff (which negatively impacted build times, but not too bad).

Interested in another set of eyes?

appieschot commented 1 year ago

@andrewconnell would love to get some input. Agree that images are quite slow so a few observations I made so far:

So more than happy to learn what you think would be the best plan of attack :)

andrewconnell commented 1 year ago

Cool... couple of questions:

Cards on the table: I have one Hugo site that has ~2k content pages & makes heavy use of Hugo's image processing. I favor a smaller repo over longer build times because my build times for deployment aren't a concern (build+deploy = ~16 minutes)... locally it takes MUCH less.

appieschot commented 1 year ago

  • Is the goal to (a) optimize for build speed (b) at the expense of making the repo bigger?

I favor a smaller repo, as the current 3 GB upload takes quite long, but I would assume we could cache the image results to prevent the build issues we have now; in the latest version, Hugo just stops working after about 12 minutes.

  • Are there any options for externally hosting some larger assets that don't need to be processed, like a slide deck or ZIP?

For me those are on the table; we recently saved all slide decks larger than 10 MB as PDF, which saved over 1 GB of repo storage. If we offload some of that content, we need to figure out where to host it and involve some other people to get that working, but I am all for that.

  • What do you mean by images breaking? Two of my sites are on Hugo v0.104.x, both use the image processing capability and are working fine (making sure images in the content area are no wider than Xpx & dynamically generating the open graph images). Got a pointer to something specific?

We are using version 0.100.2; anything after that currently fails to build because of timeouts. See the thread here: https://github.com/pnp/blog/pull/424. I basically gave up figuring out why the latest Hugo version breaks. My feeling is that it has to do with _default/single.html and the section selection we do, but I have not investigated further.

Cards on the table: I have one Hugo site that has ~2k content pages & makes heavy use of Hugo's image processing. I favor a smaller repo over longer build times because my build times for deployment aren't a concern (build+deploy = ~16 minutes)... locally it takes MUCH less.

Agree with this, but given the 30 to 45 minutes we have seen with the current release pipeline we would love to bring that down a bit. Since we have around the same amount of content pages in the blog I would be happy if we can hit those 16 minutes ish :)

andrewconnell commented 1 year ago

Cool... thanks for the detailed response. Lemme take a few stabs and see if I can't figure something out.

andrewconnell commented 1 year ago

How much flexibility do I/we have in making changes? Because... the first thing I looked at are huge files in the repo...

... and...

Not only do most of the archetypes contain an 8MB together-mode.gif sample animation, but those animated GIFs, some of which are HUGE (3 over 20MB, 6 over 10MB), account for over 815MB in the repo.

... and every single one is processed using Hugo's image processing.

There are over 185 of these files in the repo, NOT including those that have been cached in the image processing. My question: are these really necessary? They are clearly having a significant impact on the build time.
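The figures above were not produced with the script below; it is only a hedged sketch of how such an inventory could be gathered for a working tree. The function names and the 10 MB default threshold are assumptions chosen for illustration.

```python
import os
from collections import defaultdict


def find_large_files(root, min_bytes=10 * 1024 * 1024):
    """Return (size, path) pairs for files of at least min_bytes, largest first."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own object store; it is not site content.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not follow or count symlinks
            size = os.path.getsize(path)
            if size >= min_bytes:
                hits.append((size, path))
    return sorted(hits, reverse=True)


def total_by_extension(hits):
    """Sum the sizes of the reported files per lowercase file extension."""
    totals = defaultdict(int)
    for size, path in hits:
        ext = os.path.splitext(path)[1].lower() or "(none)"
        totals[ext] += size
    return dict(totals)


if __name__ == "__main__":
    large = find_large_files(".")
    for ext, total in sorted(total_by_extension(large).items()):
        print(f"{ext}: {total / 1024 / 1024:.1f} MB")
```

Run from the repo root, this prints one line per extension with the combined size of the files above the threshold, which makes it easy to see whether GIFs, PPTX files, or something else dominates.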

I haven't gone through the trouble seeing what kind of an impact it will have, because it's going to require making a lot of edits to remove those images from content pages so the builds don't throw errors... but if we're open to purging these, I can see what kind of an impact it would have on the repo.

Personally, I can't see the value in keeping these around. I get using them in social media sharing, but in the content page?

Thoughts?

appieschot commented 1 year ago

I won't mind purging them, @LuiseFreese how do you feel?

LuiseFreese commented 1 year ago

Personally, I'd go for this compromise:

  1. Educate Andrew B. - who publishes the community call blog posts on how to reduce the size of a gif
  2. Keep the gif for 2 weeks, then replace by a link to the SoMe post (I can easily do that as a recurring clean up task)

This means that we would have WAY less gifs and GB to process

WDYT?

andrewconnell commented 1 year ago

Correction... it does NOT look like those files are being image processed... they're just contributing to the huge size of the repo. But it's not just the GIFs... there are a lot of large files. The last PPTX was 86MB. Moving these large files will impact the build times but not as much as I was implying.

@LuiseFreese said:

  1. Educate Andrew B. - who publishes the community call blog posts on how to reduce the size of a gif
  2. Keep the gif for 2 weeks, then replace by a link to the SoMe post (I can easily do that as a recurring clean up task)

If you're trying to reduce the build times AND the size of the repo, this won't help. You're adding a large file to the repo, then removing it, but it's still in history. So, anyone who clones the repo, unless they do a shallow clone, will get the history including those large files.
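To see this effect on a clone, one option is to list the largest blobs that exist anywhere in history, including files that were later deleted. The sketch below is an assumption-laden illustration rather than anything used in this thread: it pipes `git rev-list --objects --all` into `git cat-file --batch-check` with a custom format, and the helper names are invented for the example.

```python
import subprocess

# Format understood by `git cat-file --batch-check`: type, id, size, then the
# rest of the input line (the path that `git rev-list --objects` printed).
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def parse_batch_check(output):
    """Return (size, path) for every blob line, largest first."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)
        if len(parts) == 4 and parts[0] == "blob":
            blobs.append((int(parts[2]), parts[3]))
    return sorted(blobs, reverse=True)


def largest_blobs_in_history(repo_dir, top=10):
    """List the `top` largest blobs reachable from any ref in repo_dir."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    sizes = subprocess.run(
        ["git", "cat-file", "--batch-check=" + BATCH_FORMAT],
        cwd=repo_dir, input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sizes)[:top]


if __name__ == "__main__":
    for size, path in largest_blobs_in_history("."):
        print(f"{size / 1024 / 1024:8.1f} MB  {path}")
```

Any path in that list that no longer exists in the working tree is a file that was removed from the content but still costs download time on every full clone.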

When I cloned the repo, it was 6GB... which is quite large for a content site.

The bigger issue with these animated GIF's is they're hurting SEO for the page. Google penalizes these pages for a bad mobile practice because it's impossible for a user to avoid downloading & rendering an animated GIF. That's unlike the PPTX files which the user must manually download.

So this really comes down to three things... what's more important / priority on the site?

  1. SEO / reach of the page
  2. build time for the site
  3. the optimized size of the repo

IMHO, it's that order. So... that means removing these 100% from the site (because the value of those pages is in the first period after publication, 2wks as you say @LuiseFreese). That makes the other two priorities moot.

IMHO, these should only be for social media marketing of the meeting. At most, include just a still frame, not an animated GIF, of the together mode.

LuiseFreese commented 1 year ago

Had a chat with @VesaJuvonen ...he agrees that we should optimize, but wonders, if there is really no way to reference the raw gifs and exclude them?

If there is no way, we will add a still image of the together mode to the blog and add a link to the Twitter/LinkedIn post that shows the animated GIF.

andrewconnell commented 1 year ago

@LuiseFreese said:

if there is really no way to reference the raw gifs and exclude them?

Possibly with some CSS trickery, you could keep the image from being loaded on mobile, but when Google indexes the page, it will see the gif and add strikes against the page. I use a service that crawls my sites weekly. I used to use animated GIF's for a few demos, until I realized how much of a negative impact it was having on the SEO of the page.

So... referencing my previous comment, WRT (1), I think these animated GIF's are more negative than positive on the site.

And that doesn't even factor in the issues WRT (2) & (3)... from the recent builds, just the deployment of the rendered site content takes 5m... so slashing these big files down will have a substantial impact. In the worst case, it's going to cut 20% of the total deployment size.

FWIW - the way I deal with this on my content sites is to put big files (ZIP's, animated GIF's, etc.) in a separate location (https://cdn.[andrewconnell.com|voitanos.io]) which is an Azure storage blob fronted by an Azure CDN with a custom domain. If I want to show a demo using an animated GIF, I have a still frame and link to a popup that shows the animation. But this way, the user has to request the animation with manual action (addressing point (1)) and it's served up NOT from my site content & stored in my codebase, rather it's externalized (addressing points (2) & (3)).
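As a rough illustration of the externalizing part only (points (2) and (3)), the sketch below rewrites root-relative Markdown links to large assets so they point at a separate host. Everything specific here is hypothetical: the `https://cdn.example.com` base URL, the extension list, and the function name are placeholders, and it does not implement the still-frame-plus-popup pattern described above.

```python
import re

CDN_BASE = "https://cdn.example.com"  # hypothetical host, not a real endpoint
LARGE_EXTENSIONS = ("gif", "zip", "pptx")  # assumed list of "big file" types

# Matches the "](/path/to/file.ext)" tail of a Markdown link or image
# whose target is root-relative and ends in one of the extensions above.
_LINK = re.compile(
    r"(\]\()(/[^)\s]+\.(?:%s))(\))" % "|".join(LARGE_EXTENSIONS),
    re.IGNORECASE,
)


def externalize_links(markdown, cdn_base=CDN_BASE):
    """Prefix matching link targets with cdn_base; leave everything else as-is."""
    base = cdn_base.rstrip("/")
    return _LINK.sub(lambda m: m.group(1) + base + m.group(2) + m.group(3), markdown)


if __name__ == "__main__":
    page = "[deck](/files/call.pptx) and ![still](/images/still.png)"
    print(externalize_links(page))
```

Links to small assets such as PNG stills are left untouched, so only the files that were moved out of the repo change their target.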

I understand this thread to say "remove them", correct? I'll proceed with that & optimizing a lot more of the big files.

LuiseFreese commented 1 year ago

I understand this thread to say "remove them", correct? I'll proceed with that & optimizing a lot more of the big files.

yes.