Open mnlbox opened 7 years ago
Run `hexo clean` only when the style or the theme is changed, which may affect all the pages. The old posts are stored in the database; if there is no global change, the generating process will skip them.
@NoahDragon Thanks for your reply. I created a simple theme based on Bootstrap without any widgets. It's extra simple and just uses some properties from the app _config (I put some global config in the app _config rather than the theme config). I don't know whether that is right or not. I use a lot of front-matter in my markdown pages (category, tags, and two other custom attributes). Each post has between 2 and 10 tags and just one category. My renderer engine is https://github.com/hexojs/hexo-renderer-marked
Can you give me any other suggestions based on this new information about my app?
I have been through the same problem. In the end, we decided to get rid of some Hexo plugins such as hexo-multiauthors, hexo-tag-generator, and hexo-archives.
And it's not about the RAM; it has more to do with your CPUs. `hexo generate` is a CPU-intensive task.
@NoahDragon If Hexo could use multiple cores (if it doesn't already), then maybe we could reduce the pain a bit.
@rahil471 Yes, I checked my system monitor and it seems Hexo only uses 1 CPU core. I have 4 cores; one of them rose to 100% while the others stayed between 2% and 19%.
@mnlbox It's okay to put all configuration into the app _config.yml file; the theme configuration falls back to the app configuration when a value is not set. As #550 states, categories and tags may dramatically slow down the rendering process. So far, I don't have any better suggestions on that unless we improve Hexo's performance.
@rahil471 I believe @tommy351 tried the multi-core approach, but I don't know why he didn't continue. Maybe creating multiple Hexo rendering instances would increase complexity and maintenance effort. I think it is a good approach, and we may revisit it.
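To make the multi-core idea above concrete, here is a minimal sketch using Node's built-in `worker_threads` module. It is not how Hexo works and not what @tommy351 tried; the `partition`, `renderChunk`, and `renderAll` names are hypothetical, and the "render" step is a placeholder that only upper-cases strings.

```javascript
const { Worker } = require('worker_threads');
const os = require('os');

// Split items into at most `parts` contiguous chunks of near-equal size.
function partition(items, parts) {
  const chunks = [];
  const size = Math.ceil(items.length / parts);
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Worker body (hypothetical): a placeholder "render" that upper-cases each string.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  parentPort.postMessage(workerData.map(text => text.toUpperCase()));
`;

// Run one chunk in its own worker thread and resolve with its result.
function renderChunk(chunk) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: chunk });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Fan the work out over one worker per CPU core, then merge results in order.
async function renderAll(items) {
  const workerCount = Math.max(1, os.cpus().length);
  const results = await Promise.all(partition(items, workerCount).map(renderChunk));
  return results.flat();
}

renderAll(['post a', 'post b', 'post c']).then(out => console.log(out));
```

The hard part in a real generator is not the fan-out but the shared state: posts, tags, and categories reference each other, so each worker would need its own copy or a way to query the main thread, which is likely where the extra complexity mentioned above comes from.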
Maybe related: #2164
@NoahDragon I can build my site with Hugo in just 25 seconds (with Hexo my build had not finished after 5 hours). 25 seconds for more than 10000 posts is awesome. What is the reason for this biiiiig difference?
@mnlbox Thanks for the info. I will take a look at Hugo. I'm assuming it uses multiple processors, and the performance difference between JavaScript and Go also has an impact.
@mnlbox Did you try the default theme without any plugins? (Init a new site and copy your posts over.) An inefficient theme or widget may be the culprit.
See discussion starting from here: https://github.com/hexojs/hexo/issues/1769#issuecomment-189313985
@leesei Yes, I also tried the default theme and removing unused plugins, but it made no difference for this issue.
I have tested a site with 2000 posts, and found that the following code was executed over 10k times, taking up 20% of the execution time of `hexo g`.
One of the reasons is that the queries (`find` in lines 38 and 40) are O(n), so the time consumption is terrible when dealing with a large number of posts.
Update: with 8000 posts, `list_tags` takes 47% of the execution time (21m 50s in total).
Update: I tried disabling `external_link` and optimizing `list_tags`; the generating time dropped to ~5.25 min.
CC @hexojs/core
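As an illustration of why O(n) queries hurt at this scale, the sketch below contrasts a per-query scan with a one-time index. It is not Hexo's or warehouse's actual code; the `posts` data and the function names are made up for the example.

```javascript
// Hypothetical data: each post lists its tag names.
const posts = [
  { title: 'a', tags: ['js', 'hexo'] },
  { title: 'b', tags: ['hexo'] },
  { title: 'c', tags: ['go'] },
];

// Naive lookup: scans every post on every call, O(n) per query.
function postsForTagScan(tag) {
  return posts.filter(post => post.tags.includes(tag));
}

// Indexed lookup: one pass builds a Map, later queries are O(1) on average.
function buildTagIndex(allPosts) {
  const index = new Map();
  for (const post of allPosts) {
    for (const tag of post.tags) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(post);
    }
  }
  return index;
}

const tagIndex = buildTagIndex(posts);

function postsForTagIndexed(tag) {
  return tagIndex.get(tag) || [];
}

console.log(postsForTagIndexed('hexo').map(p => p.title));
```

With T tags and n posts, the scan costs roughly T × n comparisons when every tag is rendered, while the index is built once and then answers each tag directly. The trade-off is that the index must be rebuilt or updated whenever posts change.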
I have a site with over 1400 posts and almost 10K assets. It takes Hexo over 30 minutes to generate if I don't run `hexo clean` first. However, running `hexo clean && hexo gen` will generate the site within a minute. This tells me the issue resides with using the db.json file. That db is of no value to me as I only need my website generated and pushed. I don't need it after the site is generated.
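For anyone who wants to script the workaround described above, here is a minimal sketch that deletes db.json from a site directory. The `removeDbJson` helper is hypothetical, and the demo runs in a temporary directory so that no real site is touched.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Remove db.json from a site directory; returns true if a file was deleted.
function removeDbJson(siteDir) {
  const dbPath = path.join(siteDir, 'db.json');
  const existed = fs.existsSync(dbPath);
  fs.rmSync(dbPath, { force: true }); // force: no error if the file is absent
  return existed;
}

// Demonstration in a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'hexo-demo-'));
fs.writeFileSync(path.join(demoDir, 'db.json'), '{}');
const firstRun = removeDbJson(demoDir);
const secondRun = removeDbJson(demoDir);
console.log({ firstRun, secondRun });
```

In a real project one would call `removeDbJson(process.cwd())` before `hexo generate`. Note that `hexo clean` goes further: it removes both the cache file (db.json) and the generated `public` folder.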
@stevenjoezhang `Tag.virtual().get()` defines a `getter`, and the `getter` function here will be executed every time the property is accessed. So before we try to optimize the `find`, is it possible for Hexo to reduce the access to `tag.posts` with a cache?
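The caching idea can be sketched in plain JavaScript as follows. This is only an illustration of a memoized getter, not warehouse's `virtual()` API; `expensiveQuery` and `withCachedPosts` are hypothetical stand-ins.

```javascript
let computeCount = 0;

// Stand-in for an expensive query such as collecting the posts of a tag.
function expensiveQuery() {
  computeCount += 1;
  return ['post-1', 'post-2'];
}

// Plain getter: the query runs on every property access.
const uncachedTag = {
  get posts() {
    return expensiveQuery();
  },
};

// Cached getter: the first access computes the value, later accesses reuse it.
function withCachedPosts(query) {
  let cached;
  let hasValue = false;
  return {
    get posts() {
      if (!hasValue) {
        cached = query();
        hasValue = true;
      }
      return cached;
    },
    // Call this when the underlying data changes.
    invalidate() {
      hasValue = false;
      cached = undefined;
    },
  };
}

uncachedTag.posts;
uncachedTag.posts;
const uncachedCalls = computeCount;

computeCount = 0;
const cachedTag = withCachedPosts(expensiveQuery);
cachedTag.posts;
cachedTag.posts;
const cachedCalls = computeCount;

console.log({ uncachedCalls, cachedCalls });
```

The difficult part of any such cache is invalidation: the cached value must be dropped whenever a post gains or loses the tag, otherwise templates would render stale lists.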
I've been deleting `db.json` before running `hexo g` and `hexo s` for quite some time. Today I took the opportunity to capture a flame graph without deleting `db.json`. (See https://github.com/hexojs/hexo/issues/5456#issuecomment-2053873863)
My environment has 1800 `.md` files and 1500 image files (jpg, png), and the `db.json` size is 37MB.
As a result, it seems that toObject in the warehouse is taking time.
So the culprit is `cloneDeep`.
But here is the thing: JSON doesn't support circular references, which is why warehouse uses `cloneDeep`. We could handle that with a JSON-like format that supports circular references.
I have found some JSON libraries that handle circular references very well, such as flatter. However, switching the implementation of warehouse over to this would require a significant amount of manpower.
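The constraint behind this discussion is easy to reproduce. The sketch below shows that `JSON.stringify` rejects circular structures, while a deep clone can preserve them. It uses the built-in `structuredClone` (Node.js 17+) purely as a stand-in for a deep clone; it is not what warehouse does, and the `post` and `tag` objects are made up.

```javascript
// Two records that point at each other, as related documents in a store might.
const post = { title: 'Hello' };
const tag = { name: 'hexo', posts: [post] };
post.tags = [tag];

// JSON.stringify throws a TypeError on circular structures.
let jsonError = null;
try {
  JSON.stringify(post);
} catch (err) {
  jsonError = err;
}

// structuredClone deep-copies the graph and preserves the cycle.
const copy = structuredClone(post);
const cyclePreserved = copy.tags[0].posts[0] === copy;
const isIndependent = copy !== post && copy.tags[0] !== tag;

console.log(jsonError instanceof TypeError, cyclePreserved, isIndependent);
```

Any serialization format chosen for the database would have to encode such back-references explicitly, which is the kind of change that would ripple through warehouse's load and save paths.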
In my own experiments, `cloneDeep` was not invoked. I'm curious under what circumstances / configurations this would be an issue?
Below is my environment. I didn't include `_config.yml` in this comment because it's lengthy. Is it necessary? If you need any additional information, please let me know. I'll provide it if possible.
$ hexo -v
hexo: 7.1.1
hexo-cli: 4.3.1
os: win32 10.0.22631
node: 20.11.1
...
v8: 11.3.244.8-node.17
# OS
Microsoft Windows [Version 10.0.22631.3447]
# Cpu
AMD Ryzen 7 PRO 4750G with Radeon Graphics
# Memory
Capacity Name Tag
17179869184 Physical Memory Physical Memory 1
17179869184 Physical Memory Physical Memory 3
// package.json
"dependencies": {
  "hexo": "7.1.1",
  "hexo-filter-nofollow": "2.0.2",
  "hexo-generator-archive": "git+https://github.com/hexojs/hexo-generator-archive.git#master",
  "hexo-generator-category": "git+https://github.com/hexojs/hexo-generator-category.git#master",
  "hexo-generator-feed": "git+https://github.com/yoshinorin/_hexo-generator-feed.git#master",
  "hexo-generator-index": "git+https://github.com/hexojs/hexo-generator-index.git#master",
  "hexo-generator-sitemap": "git+https://github.com/yoshinorin/_hexo-generator-sitemap.git#master",
  "hexo-generator-tag": "git+https://github.com/hexojs/hexo-generator-tag.git#master",
  "hexo-html-minifier": "git+https://github.com/hexojs/hexo-html-minifier.git#master",
  "hexo-pagination": "git+https://github.com/yoshinorin/hexo-pagination.git#my-site",
  "hexo-renderer-ejs": "git+https://github.com/hexojs/hexo-renderer-ejs.git#master",
  "hexo-renderer-markdown-it": "git+https://github.com/hexojs/hexo-renderer-markdown-it#master",
  "hexo-server": "git+https://github.com/hexojs/hexo-server.git#master"
}
I'm using a theme based on https://github.com/LouisBarranqueiro/hexo-theme-tranquilpeak, from which I've deleted many features.
$ dir
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 2024/04/16 20:21 37507929 db.json
Please see the Appendix section for how to get these.
Number of posts: 1773
Number of post assets: 1784
Avg of post content length: 3645
Number of pages: 23
Number of page assets: 81
Avg of page content length: 4217
Number of tags: 246
Number of categories: 170
Number of routes: 5335
const Hexo = require('hexo');
const hexo = new Hexo(process.cwd(), { silent: false });

hexo.init()
  .then(() => hexo.load())
  .then(() => {
    // Posts: count assets whose id contains the post's directory, and sum content length.
    const posts = hexo.locals.get('posts').toArray();
    const postAsset = hexo.model('PostAsset');
    let numOfPostAssets = 0;
    let postContentTotalLen = 0;
    for (const post of posts) {
      const dir = post.path.slice(0, post.path.lastIndexOf('/'));
      const assets = postAsset.filter(x => x._id.includes(dir));
      numOfPostAssets += assets.length;
      postContentTotalLen += post.content.length;
    }

    // Pages: same measurement against the generic Asset model.
    const pages = hexo.locals.get('pages').toArray();
    const pageAsset = hexo.model('Asset');
    let numOfPageAssets = 0;
    let pageContentTotalLen = 0;
    for (const page of pages) {
      const dir = page.path.slice(0, page.path.lastIndexOf('/'));
      const assets = pageAsset.filter(x => x._id.includes(dir));
      numOfPageAssets += assets.length;
      pageContentTotalLen += page.content.length;
    }

    const tags = hexo.locals.get('tags').toArray();
    const categories = hexo.locals.get('categories').toArray();
    const routes = hexo.route.list();

    // Print the summary shown above.
    console.log(`Number of posts: ${posts.length}`);
    console.log(`Number of post assets: ${numOfPostAssets}`);
    console.log(`Avg of post content length: ${Math.floor(postContentTotalLen / posts.length)}`);
    console.log(`Number of pages: ${pages.length}`);
    console.log(`Number of page assets: ${numOfPageAssets}`);
    console.log(`Avg of page content length: ${Math.floor(pageContentTotalLen / pages.length)}`);
    console.log(`Number of tags: ${tags.length}`);
    console.log(`Number of categories: ${categories.length}`);
    console.log(`Number of routes: ${routes.length}`);
  });
$ 0x -D framegraph\with-dbjson .\node_modules\hexo\bin\hexo g
In my own experiments, cloneDeep was not invoked. I'm curious under what circumstances / configurations this would be an issue?
It is caused by `toArray()`. This function executes when the `post_asset_folder` option is enabled.
https://github.com/hexojs/hexo/blob/90b107c8b0bd8b386f64463403391f67eb1b2535/lib/plugins/processor/post.ts#L282
Hi guys, I have a big list of posts that I converted to markdown files from my previous website. My `_posts` directory now holds nearly 10000 posts. I tried `hexo server --debug`, but it was still not running after 15 minutes of processing on a machine with 32 GB of RAM and an SSD. I also tried `hexo clean && hexo generate`, but it had not finished after 25 minutes either. Now my questions are: