Closed razonyang closed 4 months ago
RE: build of really large site shows build performance difference from 5ms/page to 17ms/page when I disable/enable single taxonomy
If a different theme has the same issues, then it is not the theme but Hugo; I'll try to test it. I think it is Hugo's architecture. Some other parameters play a role too: how many taxonomies to show per page, pagination size, etc.
Right now I am not too worried: I build locally on my laptop (90 minutes), then use the Netlify CLI to upload (3 hours); a build at Netlify will fail unless you are on the "enterprise" plan.
But even 5ms per page is too slow, in my opinion. For comparison, my Java application takes less than a minute to parse 250,000 XML files and convert them into Markdown, while Hugo takes 90 minutes to convert 250,000 Markdown files into HTML (with pagination size 1000 and taxonomy and sitemap disabled).
> If different theme has the same issues, then it is not theme, but Hugo
It is hard (maybe impossible) to compare and determine that, since every theme has its own set of features and build optimizations.
> But even 5ms per page is too slow
For now I'd recommend switching to a faster, minimalist theme for your large site (I didn't see many pictures on it); I don't believe this theme's build speed will get below 5ms/page even after tweaking templates.
Btw, what is the output of hugo mod graph? I just wrote a script to create test content in bulk, to try to diagnose which module slows down the build.
I am running more tests now. I enabled sitemap generation and the build took over 5 hours; previously it was about 90 minutes. I cannot believe my eyes, so I am rerunning all tests, explicitly deleting the "public" folder before each run.
Here is "baseline", with sitemaps generation enabled:
| EN
-------------------+---------
Pages | 265764
Paginator pages | 23498
Non-page files | 0
Static files | 8
Processed images | 9
Aliases | 19796
Cleaned | 0
Total in 18275832 ms
Pagination is 1000.
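The per-page figures quoted in this thread (5ms/page, 17ms/page and so on) can be derived from Hugo's build summary. Below is a minimal Python sketch, assuming the per-page cost is simply the total build time divided by the number of rendered pages, with paginator pages counted as pages. The function name ms_per_page is hypothetical and only used for illustration.

```python
def ms_per_page(total_ms: int, pages: int, paginator_pages: int = 0) -> float:
    """Average build time per rendered page, in milliseconds.

    Assumption: regular pages and paginator pages cost the same, which is a
    simplification; a list page with 1000 entries is heavier than a post.
    """
    rendered = pages + paginator_pages
    if rendered <= 0:
        raise ValueError("page count must be positive")
    return total_ms / rendered


if __name__ == "__main__":
    # Numbers copied from the baseline build summary above.
    baseline = ms_per_page(18_275_832, 265_764, 23_498)
    print(f"{baseline:.1f} ms/page")
```

For the baseline above this works out to roughly 63 ms per page, far from the 5ms/page reported earlier, which is why the difference between the two builds matters.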
The build is much slower than the one you posted before; what is the difference between those two builds?
It would be helpful if you could provide the source of your site; forget it if it contains sensitive information.
Sorry Razon, I have to be very careful with my comments here; I'll rerun everything again, and I will compare with minimalistic themes too.
I analyzed what changed: I carelessly commented out list_style: minimalist in the config; maybe this was the main reason for the 3x slower build.
Great News!!!
After spending several nights tuning its performance, I gained very little. However, when I accidentally opened the public directory, I discovered a serious bug. After testing, it turned out the poor performance was caused by the bug generating unnecessary paginated pages, which has now been fixed.
You can compare the performance of those two builds here. In short, the theme is now 3-5 times faster than before: it took 7min to build 100k pages (including pagination pages), a speed of 4-5ms/page.
It would be much faster in a high-end environment.
I just tested on my laptop (32 GiB RAM, 8-core (16-thread) CPU): it took 10min to build about 200k pages, 3.2ms/page. I believe the theme may take less than 1hr to build one million normal pages.
I'm looking forward to seeing how it performs on your site, if you still want to use this theme.
Hi Razon,
Theme is great, especially for me - since I am developer learning Hugo, it has great features including Node integration (which other "minimalistic" themes don't have good examples); mobile device friendly is super important too, "Responsive", Bootstrap.
I also accidentally noticed a weird issue with "file not found" in "hugo_cache" during the build, when I wanted to have a "content/terms" folder for my dictionary. Some file or folder names are probably reserved; the error messages are weird and very hard to troubleshoot.
Here are numbers "before" and "after" upgrade of one of my sites, all settings are the same except modules upgrade:
| EN
-------------------+--------
Pages | 31856
Paginator pages | 3519
Non-page files | 0
Static files | 0
Processed images | 9
Aliases | 12768
Cleaned | 0
Built in 422229 ms
Environment: "development"
After the upgrade:
| EN
-------------------+--------
Pages | 31856
Paginator pages | 1932
Non-page files | 0
Static files | 0
Processed images | 9
Aliases | 12764
Cleaned | 0
Built in 116927 ms
Environment: "development"
Serving pages from disk
I'll try my largest site too, it was around 5 hours build previously, it will take time ;)
> I also accidentally noticed weird issue with "file not found" in "hugo_cache" during build...
This is indeed a headache, but we need reproducible steps to locate the cause; there are too many factors (disk, file permissions, a theme bug, a Hugo bug and so on). I haven't run into this issue on Windows, WSL or Linux, so it's also hard for me to troubleshoot.
> After the upgrade: ... Built in 116927 ms ...
That looks much better: the build speed has improved from 10+ ms/page to 3.4ms/page. I imagine the fastest achievable build speed is about 2-3ms/page in a high-end environment. The hooks system (hugopress) brings flexible hot-pluggable modules (install/remove modules without changing themes), but it also comes with some performance cost, so please do not expect it to be faster than other optimized themes.
Closed per above comments.
With my super large site, I didn't notice significant improvements; also, my tests were not super clean, I have to avoid working on laptop while running tests.
Before upgrade:
| EN
-------------------+---------
Pages | 265764
Paginator pages | 23498
Non-page files | 0
Static files | 8
Processed images | 9
Aliases | 19796
Cleaned | 0
Total in 18275832 ms
After upgrade:
| EN
-------------------+---------
Pages | 265763
Paginator pages | 6983
Non-page files | 0
Static files | 9
Processed images | 9
Aliases | 19795
Cleaned | 0
Total in 19810338 ms
I also found that when I use list_style: minimalist the build is about 4x faster, and I don't understand why: "pagination" needs "list_style", and in both cases it needs to retrieve the title and description from linked pages.
On average, the build takes 4200 seconds when I use list_style: minimalist for blogs, and 20,000 seconds when I comment it out in the config:
terms:
  # the paginate for categories, tags, series list pages.
  paginate: 1000
  #list_style: minimalist
  profile: false
blog:
  #list_style: minimalist
  profile: false
I am not sure; maybe "list_style" tries to generate thumbnails or graphics, and that's why it is slow. I don't like "minimalist" because it doesn't show the description, which is not good for SEO. I am checking the docs now.
UPDATE: checking https://github.com/hbstack/blog/blob/main/layouts/partials/hb/modules/blog/post/card.html
I can only guess that taxonomy calculations are taking place (instead of cached results being used). Also, I am unsure how Hugo handles this: for a small "terms.html" it can find a "term" with a "full table scan", but in my case I have at least 10,000 - 100,000 terms; does it use an "index" to scan "terms.html"? I am not sure, but it adds 4 hours of build time for a 250,000-document site ;) I am only guessing; I don't know Hugo internals.
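The "full table scan" versus "index" distinction guessed at above can be illustrated outside Hugo. The following Python sketch is not Hugo's implementation and makes no claim about how GetTerms works internally. It only shows why a per-page lookup that checks every term makes the total work grow with pages × terms, while a page-to-terms index built once keeps each lookup cheap. All names (terms_by_scan, build_index) and the data layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical layout: taxonomy term -> set of page paths that carry the term.
Taxonomy = dict[str, set[str]]


def terms_by_scan(taxonomy: Taxonomy, page: str) -> list[str]:
    """Find a page's terms by checking every term: O(number of terms) per page."""
    return sorted(term for term, pages in taxonomy.items() if page in pages)


def build_index(taxonomy: Taxonomy) -> dict[str, list[str]]:
    """Invert the taxonomy once so each page maps directly to its sorted terms."""
    index: dict[str, list[str]] = defaultdict(list)
    for term in sorted(taxonomy):
        for page in taxonomy[term]:
            index[page].append(term)
    return dict(index)


if __name__ == "__main__":
    taxonomy = {
        "go": {"/posts/a", "/posts/b"},
        "hugo": {"/posts/a"},
        "java": {"/posts/c"},
    }
    index = build_index(taxonomy)
    # Both approaches agree; only the cost per lookup differs.
    for page in ("/posts/a", "/posts/b", "/posts/c"):
        assert terms_by_scan(taxonomy, page) == index[page]
    print(index["/posts/a"])  # ['go', 'hugo']
```

With 250,000 pages and tens of thousands of terms, the scan variant would perform billions of membership checks, which is the kind of growth the question above is worried about.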
I repeated test with smaller site, "minimalist":
| EN
-------------------+--------
Pages | 68157
Paginator pages | 1533
Non-page files | 0
Static files | 9
Processed images | 9
Aliases | 8797
Cleaned | 0
Total in **247066 ms**
And non-minimalist, regular card:
| EN
-------------------+--------
Pages | 68157
Paginator pages | 1533
Non-page files | 0
Static files | 9
Processed images | 9
Aliases | 8797
Cleaned | 0
Total in **705852 ms**
So, I downloaded "Card.html" and removed "taxonomy" from code, tested again "regular card":
| EN
-------------------+--------
Pages | 68157
Paginator pages | 1533
Non-page files | 0
Static files | 9
Processed images | 9
Aliases | 8797
Cleaned | 0
Total in **263589 ms**
I am not sure, this is from docs: partials.IncludeCached LAYOUT CONTEXT
<div class="hb-blog-post-meta d-block text-nowrap text-truncate mb-2">
{{ partialCached "hb/modules/blog/post/meta/taxonomies" $page **$page** }}
</div>
Could this be the issue? We cache it in a "page" context (the second $page parameter) instead of a global "site" context, so that it is never cached?
So I tested it: I altered Card.html line 67 and put . instead of $page as the second "context" parameter:
| EN
-------------------+--------
Pages | 68157
Paginator pages | 1533
Non-page files | 0
Static files | 9
Processed images | 9
Aliases | 8797
Cleaned | 0
Total in 271600 ms
Did I find the fix? From 700 seconds to 270 seconds by just fixing line 67?
Before:
<div class="hb-blog-post-meta d-block text-nowrap text-truncate mb-2">
{{ partialCached "hb/modules/blog/post/meta/taxonomies" $page $page}}
</div>
Build time: 700 seconds
After:
<div class="hb-blog-post-meta d-block text-nowrap text-truncate mb-2">
{{ partialCached "hb/modules/blog/post/meta/taxonomies" $page . }}
</div>
Build time: 270 seconds
> {{ partialCached "hb/modules/blog/post/meta/taxonomies" $page . }}
I haven't read everything fully, but this is wrong: the taxonomies belong to the current card's page, not to the current context (which is not a page). You can start a Hugo server with this code and check your posts.
> minimalist
minimalist just lists the title and date.
> With my super large site, I didn't notice significant improvements; also, my tests were not super clean, I have to avoid working on laptop while running tests.
Hmm, I'm surprised the build got slower after upgrading, since I do see improved build performance on all my sites... It may be something specific to your site, such as network operations (calling APIs, fetching remote data), image processing, custom templates/shortcodes and so on, or there may be a potential performance issue, but I'm not able to debug this without the source code, so I can't help with it.
> I can guess only that taxonomies calculations taking place (instead of cached results); plus, I am unsure how Hugo handles this: in case of smaller "terms.html" it can find "term" by using "full table scan"
Hmm, I haven't looked into the Hugo source code; the theme uses the .GetTerms page function to get terms. I will take a look if I have time.
You may be right, the taxonomies may be the cause of this.
I just created a site from scratch without any theme.
{{/* layouts/_default/single.html */}}
{{- $page := . }}
{{/* Time the whole taxonomy block, plus one timer per taxonomy. */}}
{{ $t := debug.Timer "page-taxonomies" }}
{{- range $kind := slice "tags" "categories" }}
  {{ $t1 := printf "page-taxonomies-%s" $kind | debug.Timer }}
  {{- with $page.GetTerms $kind }}
    {{- range . }}
      <span class="blog-post-taxonomy-meta">
        <a
          class="blog-post-taxonomy blog-post-taxonomy badge bg-secondary text-decoration-none fw-normal me-1"
          href="{{ .RelPermalink }}">
          {{- .Title -}}
        </a>
      </span>
    {{- end }}
  {{- end }}
  {{ $t1.Stop }}
{{- end -}}
{{ $t.Stop }}
And then I created dummy content with Lorem Ipsum Generator.
lorem-ipsum-generator -n 10000 --tag-count 20 -o content
The script generates 10k posts with 20 tags per page.
I used cascade to filter out some posts so I could compare performance between different content sizes.
# hugo.toml
[[cascade]]
  [cascade._target]
    path = "/{3001-4000,4001-5000,5001-6000,6001-7000,7001-8000,8001-9000,9001-10000}/**"
  [cascade.build]
    list = "never"
    render = "never"
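Posts | Performance |
---|---|
2000 | (screenshot not preserved) |
3000 | (screenshot not preserved) |
5000 | (screenshot not preserved) |
10k | (screenshot not preserved) |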
As the screenshots show, the average time reported by the timers increased as the content grew.
The issue does not seem to be related to this theme; I will try to create a repo and post a topic on the Hugo forum.
After my "patch" applied:
| EN
-------------------+---------
Pages | 265781
Paginator pages | 6983
Non-page files | 0
Static files | 9
Processed images | 9
Aliases | 19804
Cleaned | 0
Total in 6282292 ms
Before:
| EN
-------------------+---------
Pages | 265763
Paginator pages | 6983
Non-page files | 0
Static files | 9
Processed images | 9
Aliases | 19795
Cleaned | 0
Total in 19810338 ms
I believe this is THE issue,
{{ partialCached "hb/modules/blog/post/meta/taxonomies" $page $page}}
> Haven't read it fully, but it's wrong, the taxonomies is related to current card's page, not current context (not a page).
Hmm, I think I've explained this; please check your site after applying your patch and make sure the taxonomies are correct for each post.
just for an example.
And all my post's taxonomies disappear.
> will be cached in the current page context and this cache won't be available in other pages context for reuse
The taxonomies meta is used on the detail page and on list pages (sections, tags, categories, archive and so on); it is not used only once.
I was wrong to try {{ partialCached "hb/modules/blog/post/meta/taxonomies" $page . }} in Card.html; it doesn't update the "card" with the proper taxonomy; it uses static constants.
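Why the patched call was fast but wrong can be shown with a toy model. The Python sketch below is not Hugo's code; it assumes only what the Hugo docs describe, namely that partialCached reuses a rendered result per partial name plus variant arguments. It further assumes that . in Card.html evaluates to the same value for every card, which this thread suggests but does not confirm. The class PartialCache and the sample pages are hypothetical.

```python
from typing import Callable, Hashable


class PartialCache:
    """Toy model of a cache keyed by (partial name, variant)."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, Hashable], str] = {}
        self.renders = 0  # how many times the partial was actually executed

    def partial_cached(
        self,
        name: str,
        render: Callable[[dict], str],
        context: dict,
        variant: Hashable,
    ) -> str:
        key = (name, variant)
        if key not in self._store:
            self.renders += 1
            self._store[key] = render(context)
        return self._store[key]


def render_taxonomies(page: dict) -> str:
    return ", ".join(page["tags"])


PAGES = [
    {"path": "/a", "tags": ["go"]},
    {"path": "/b", "tags": ["java"]},
]

if __name__ == "__main__":
    # Variant differs per page: correct output, one render per page.
    per_page = PartialCache()
    print([per_page.partial_cached("taxonomies", render_taxonomies, p, p["path"]) for p in PAGES])
    # Variant identical for every card: a single render, reused for all pages.
    shared = PartialCache()
    print([shared.partial_cached("taxonomies", render_taxonomies, p, "same") for p in PAGES])
```

In the second case the partial runs only once, so the build gets faster, but every card shows the taxonomies of whichever page was rendered first. The per-page variant still pays off because the same card appears on several list pages.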
But anyway, the issue is with taxonomies calculations. Why those are not cached in some "dictionary" file and always being recalculated?
> But anyway, the issue is with taxonomies calculations. Why those are not cached in some "dictionary" file and always being recalculated?
Not sure; there may be a bottleneck in Hugo when handling a large number of taxonomy terms. As the previous test (only one template, no theme) showed, the average execution time clearly increased as the number of taxonomy terms grew.
Btw, I created a topic on the Hugo forum; please wait for them to reply and confirm whether there is something I'm doing wrong.
I did test with about 70,000 pages, similar results, 3x difference between "minimalist" (without taxonomy) and regular; 250 seconds "minimalist", 700 seconds "regular".
Yes, better to ask Hugo.
I can imagine a huge file containing precalculated taxonomies in (my best hope) alphabetically sorted order with a Log(n) (best hope) search algorithm, or even better, a separate "index" file; but I feel that "partialCached" doesn't use this. Taxonomies should be precalculated and cached. For smaller sites it is not visible, like in this example, 250 seconds vs. 700 seconds (and thanks to you, Razon, this is a huge improvement over what it was a few days ago!)
> I did test with about 70,000 pages, similar results, 3x difference between "minimalist" (without taxonomy) and regular; 250 seconds "minimalist", 700 seconds "regular".
Will check whether the cache is used with the non-minimalist style.
Also, it is strange that it still takes time: I use taxonomies.count = false in config, it must be instant, no calculations required. It is like outputting just link to taxonomy page vs. outputting link (no need to calculate) and count (needs to be calculated). Maybe it should be sorted by count, then I can understand... but in the Card, probably better to sort alphabetically.
> taxonomies.count
What is this parameter used for? I can't recall.
> in the Card, probably better to sort alphabetically.
Its order is the same as in the front matter.
taxonomies:
  count: false # whether to show the number of posts associated to the item.
  limit: 100 # the maximum number of the item.
> I did test with about 70,000 pages, similar results, 3x difference between "minimalist" (without taxonomy) and regular; 250 seconds "minimalist", 700 seconds "regular".
The cache works correctly; you can see that it is 100% cached. The speed difference between the two comes from the simplicity of minimalist, which only displays the title and date. You can override layouts/partials/hb/modules/blog/posts-minimalist.html to suit your needs and gain better performance.
> Also, it is strange that it still takes time: I use taxonomies.count = false in config
This won't affect performance: the sidebar's taxonomies are cached, and besides, the time they take is almost negligible (just 150ms on a 30k-page site in debug mode).
See also https://discourse.gohugo.io/t/getterms-getting-slows-as-the-content-grows/50332/2?u=razon: the time spent on a page's taxonomies increases linearly.
For now, you can tweak the posts-minimalist template to gain better performance.
The Hugo team has submitted some improvements. I'm not sure whether they help in your case; you can build from source to confirm.
git clone https://github.com/gohugoio/hugo
cd hugo
go get
go build -tags extended
cd /path/to/your-site
/path/to/gohugoio/hugo/hugo
Build several times and take average to compare with previous build.
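Averaging several runs can be scripted. Below is a minimal Python sketch, assuming the build is started by running a single command inside the site directory and that deleting the "public" folder before each run is enough to get comparable timings (Hugo's own caches are left untouched here). The function name average_build_seconds and the script name bench.py are hypothetical.

```python
import shutil
import statistics
import subprocess
import sys
import time
from pathlib import Path


def average_build_seconds(cmd: list[str], site_dir: str, runs: int = 3) -> float:
    """Run `cmd` in `site_dir` several times and return the mean wall-clock time."""
    durations = []
    for _ in range(runs):
        # Start from a clean output folder so an earlier run cannot skew the result.
        shutil.rmtree(Path(site_dir) / "public", ignore_errors=True)
        start = time.perf_counter()
        subprocess.run(cmd, cwd=site_dir, check=True)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python bench.py /path/to/your-site /path/to/gohugoio/hugo/hugo
    site, hugo_binary = sys.argv[1], sys.argv[2]
    print(f"{average_build_seconds([hugo_binary], site):.1f} s on average")
```

Running it once with the released binary and once with the binary built from source gives two averages to compare. For builds that take hours, a smaller sample of the site is probably more practical.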
Thank you Razon, I appreciate it very much, trying it now...
Performance definitely improved, approx. 4x!!! I even enabled the "tags" taxonomy (it was disabled before); what was taking 5 hours now takes approx. 1:20.
| EN
-------------------+---------
Pages | 265669
Paginator pages | 6980
Non-page files | 0
Static files | 9
Processed images | 9
Aliases | 19810
Cleaned | 0
Total in 4608877 ms
> Performance definitely improved, approx. 4x times!!! I even enabled "tags" taxonomy (it was disabled before), what was taking 5 hour, now takes approx. 1:20
Was minimalist enabled? If it wasn't, that is really impressive, and I'll close the forum topic later.
BTW, how many taxonomy terms (tags + categories) do you have on this site?
I made a mistake: my "tags" had a cardinality of only 4, with very few documents having "tags"; "category" has maybe 300-1000. I tried to run with a "category x keywords" double taxonomy, where my estimate is 1000 x 10,000 cardinalities; it has already been running for 12+ hours. But the "category" taxonomy alone builds 4x faster now.
I tested and compared the performance of v0.127.0 and the next version of Hugo; as a result, GetTerms shows a marked improvement.
If your site still has performance issues, you may need to provide the site source or a reproducible repo so I can debug and locate the cause; it's very hard to debug by guessing.
My site doesn't have any site-specific issues; it is a generic Hugo design issue. I'll try to use the "Generator" tool to reproduce it in a separate repo; it is not theme related.