hexojs / hexo

A fast, simple & powerful blog framework, powered by Node.js.
https://hexo.io
MIT License

[Question] Use Hexo for generate more than 10000 posts #2579

Open mnlbox opened 7 years ago

mnlbox commented 7 years ago

Hi guys, I have a big list of posts that I converted to markdown files from my previous website. My _posts directory now holds nearly 10000 posts. I tried hexo server --debug, but it was still not running after 15 minutes of processing on a machine with 32 GB of RAM and an SSD. I also tried hexo clean && hexo generate, but it had not finished after 25 minutes either. My questions are:

  1. Is this normal?
  2. How can we speed up the Hexo process?
  3. Can we use Hexo for a site of this scale, or is it meant for smaller sites?

NoahDragon commented 7 years ago
  1. Yes, it is a known performance issue in Hexo, but all static site generators have this kind of issue. We are trying to improve it. https://github.com/hexojs/hexo/pull/550
  2. Here are several tips:
    • Run hexo clean only when the style or theme has changed in a way that may affect all pages. Old posts are stored in the database; if there is no global change, the generation process will skip them.
    • Use a simple theme that does not contain many widgets.
    • Don't use categories or tags.
    • Remove unused Hexo plugins.
  3. So far, we haven't encountered this many posts to process. But performance varies a lot with the theme, the number of categories/tags, and the renderer engine.

mnlbox commented 7 years ago

@NoahDragon Thanks for your reply. I created a simple theme based on Bootstrap without any widgets. It's extremely simple and only uses some properties from the app _config (I put some global config in the app _config rather than the theme config; I don't know whether that is right or not). I use a lot of front-matter in my markdown pages (category, tags and two other custom attributes). Each post has between 2 and 10 tags and just one category. My renderer engine is https://github.com/hexojs/hexo-renderer-marked

Can you give me any other suggestions based on this new information about my app?

rahil471 commented 7 years ago

I have been through the same problem. In the end, we decided to get rid of some Hexo plugins, such as hexo-multiauthors, hexo-tag-generator and hexo-archives.
And it's not about the RAM; it has more to do with your CPU. hexo generate is a CPU-intensive task. @NoahDragon If Hexo could utilize multiple cores (if it doesn't already), then maybe we could reduce the pain a bit.

mnlbox commented 7 years ago

@rahil471 Yes, I checked my system monitor and it seems Hexo only uses 1 CPU core. I have 4 cores; one of them went up to 100% while the others stayed between 2% and 19%.

NoahDragon commented 7 years ago

@mnlbox It's okay to put all configuration into the app _config.yml file; the theme configuration falls back to the app configuration when a value is not set. As #550 states, categories/tags may dramatically slow down the rendering process. So far, I don't have better suggestions on that unless we improve Hexo's performance.

@rahil471 I believe @tommy351 has tried the multi-core approach, but I don't know why he didn't continue. Maybe creating multiple Hexo rendering instances would increase the complexity and the maintenance effort. I think it is a good approach, and we may reconsider it.

mnlbox commented 7 years ago

Maybe related: #2164

mnlbox commented 7 years ago

@NoahDragon I can build my site with Hugo in just 25 seconds (with Hexo my build had not finished after 5 hours). 25 seconds for more than 10000 posts is awesome. What is the reason for such a big difference?

NoahDragon commented 7 years ago

@mnlbox Thanks for the info. I will take a look at Hugo. I'm assuming it uses multiple processors, and the performance difference between JavaScript and Go also has an impact.

leesei commented 7 years ago

@mnlbox Did you try the default theme without any plugins? (Init a new site and copy your posts over.) An inefficient theme or widget may be the culprit.

See discussion starting from here: https://github.com/hexojs/hexo/issues/1769#issuecomment-189313985

mnlbox commented 7 years ago

@leesei Yes, I also tried the default theme and removing unused plugins, but it made no difference for this issue.

stevenjoezhang commented 1 year ago

I have tested a site with 2000 posts, and found the following code was executed over 10k times, taking up 20% of the execution time of hexo g

https://github.com/hexojs/hexo/blob/58a8f8c4147782f6d8719abb00cfd91323688ec1/lib/models/tag.js#L35-L43

One of the reasons is that queries (find in line 38 and 40) are O(n), thus the time consumption is terrible when dealing with a large number of posts
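The cost difference between a linear query and an indexed lookup can be illustrated with a minimal sketch. This is not Hexo's or warehouse's actual code: `postTags`, `postIdsForTagLinear`, `buildTagIndex` and `postIdsForTagIndexed` are hypothetical names, and the records only stand in for a post/tag join table.

```javascript
// Hypothetical join records (post <-> tag); not Hexo's real data model.
const postTags = [
  { post_id: 'p1', tag_id: 't1' },
  { post_id: 'p1', tag_id: 't2' },
  { post_id: 'p2', tag_id: 't1' },
  { post_id: 'p3', tag_id: 't3' },
];

// Linear query: every call scans all join records, so it is O(n) per lookup.
function postIdsForTagLinear(tagId) {
  return postTags.filter((r) => r.tag_id === tagId).map((r) => r.post_id);
}

// Build an index once (O(n)); afterwards each lookup is O(1) on average.
function buildTagIndex(records) {
  const index = new Map();
  for (const { post_id, tag_id } of records) {
    if (!index.has(tag_id)) index.set(tag_id, []);
    index.get(tag_id).push(post_id);
  }
  return index;
}

const tagIndex = buildTagIndex(postTags);

function postIdsForTagIndexed(tagId) {
  return tagIndex.get(tagId) || [];
}

console.log(postIdsForTagLinear('t1'));  // [ 'p1', 'p2' ]
console.log(postIdsForTagIndexed('t1')); // [ 'p1', 'p2' ]
```

With thousands of posts and repeated lookups per tag, the linear version repeats the full scan every time, while the indexed version pays for the scan once. The trade-off is that the index has to be rebuilt or updated whenever the records change.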

Update: with 8000 posts, list_tags is taking 47% of the execution time (21m 50s in total)

[Screenshot: 2022-11-27 16:05:12]

Update: I tried disabling external_link and optimizing list_tags; the generation time dropped to ~5.25 min

CC @hexojs/core

lorezyra commented 1 year ago

I have a site with over 1400 posts and almost 10K assets. It takes HEXO over 30 minutes to generate if I don't run hexo clean first. However, running hexo clean && hexo gen will generate the site within a minute. This tells me the issue resides with using the db.json file. That db is of no value to me as I only need my website generated and pushed. I don't need it after the site is generated.

SukkaW commented 1 year ago

One of the reasons is that queries (find in line 38 and 40) are O(n), thus the time consumption is terrible when dealing with a large number of posts

@stevenjoezhang

Tag.virtual().get() defines a getter, and the getter function here will be executed every time the property is accessed. So before we try to optimize the find, is it possible for Hexo to reduce the access to tag.posts with cache?
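The difference between a plain getter and a cached one can be sketched as follows. This is an illustration under assumptions, not Hexo's implementation: `expensivePostLookup`, `UncachedTag`, `CachedTag` and `invalidate` are hypothetical names, and plain JavaScript class getters stand in for warehouse virtuals.

```javascript
// Counts how often the (hypothetical) expensive query actually runs.
let computeCount = 0;

function expensivePostLookup(tagName) {
  computeCount += 1;
  return [`post-about-${tagName}`];
}

// Plain getter: the lookup runs on every property access.
class UncachedTag {
  constructor(name) {
    this.name = name;
  }
  get posts() {
    return expensivePostLookup(this.name);
  }
}

// Cached getter: the lookup runs once, until the cache is invalidated.
class CachedTag {
  constructor(name) {
    this.name = name;
    this._posts = undefined;
  }
  get posts() {
    if (this._posts === undefined) {
      this._posts = expensivePostLookup(this.name);
    }
    return this._posts;
  }
  // Must be called whenever the underlying data changes.
  invalidate() {
    this._posts = undefined;
  }
}

const uncached = new UncachedTag('news');
uncached.posts;
uncached.posts;
console.log(computeCount); // 2

const cached = new CachedTag('news');
cached.posts;
cached.posts;
console.log(computeCount); // 3 (only one additional lookup)
```

The hard part in practice is invalidation: a cache like this is only safe if every change to the posts or tags resets it.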

yoshinorin commented 7 months ago

I have a site with over 1400 posts and almost 10K assets. It takes HEXO over 30 minutes to generate if I don't run hexo clean first. However, running hexo clean && hexo gen will generate the site within a minute. This tells me the issue resides with using the db.json file. That db is of no value to me as I only need my website generated and pushed. I don't need it after the site is generated.

I've been deleting db.json before running hexo g and hexo s for quite some time. Today I took the opportunity to capture a flamegraph without deleting db.json. (See https://github.com/hexojs/hexo/issues/5456#issuecomment-2053873863)

My environment has 1800 .md files and 1500 image files (jpg, png), and db.json size is 37MB.

As a result, it seems that toObject in the warehouse is taking time.

[Image: flamegraph with db.json]

SukkaW commented 7 months ago

As a result, it seems that toObject in the warehouse is taking time.

So the culprit is the cloneDeep.

But here is the thing: JSON doesn't support circular references, which is why warehouse uses cloneDeep. We could address that by using a JSON-like format that supports circular references.
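The JSON limitation mentioned here is easy to reproduce in plain Node.js. The sketch below uses hypothetical `post` and `tag` objects (not warehouse documents) to show that `JSON.stringify` throws on a cycle, and that storing ids instead of object references is one way to get a serializable shape.

```javascript
// Two hypothetical objects that reference each other.
const post = { title: 'Hello', tags: [] };
const tag = { name: 'news', posts: [post] };
post.tags.push(tag); // post -> tag -> post is now a cycle

// JSON.stringify throws a TypeError when it meets a circular structure.
function trySerialize(value) {
  try {
    return { ok: true, json: JSON.stringify(value) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

const circularResult = trySerialize(post);
console.log(circularResult.ok);    // false
console.log(circularResult.error); // TypeError

// One workaround: store ids rather than object references.
const flatPost = { title: post.title, tag_names: post.tags.map((t) => t.name) };
const flatResult = trySerialize(flatPost);
console.log(flatResult.json); // {"title":"Hello","tag_names":["news"]}
```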

stevenjoezhang commented 7 months ago

I have found some JSON libraries that handle circular references very well, such as flatter. However, switching the implementation of warehouse over to this would require a significant amount of manpower.

stevenjoezhang commented 7 months ago

In my own experiments, cloneDeep was not invoked. I'm curious under what circumstances / configurations this would be an issue?

yoshinorin commented 7 months ago

In my own experiments, cloneDeep was not invoked. I'm curious under what circumstances / configurations this would be an issue?

Below is my environment. I didn't include _config.yml in this comment because it's lengthy. Is it necessary? If you need any additional information, please let me know and I'll provide it if possible.

Hexo, Node version

$ hexo -v
hexo: 7.1.1
hexo-cli: 4.3.1
os: win32 10.0.22631
node: 20.11.1
...
v8: 11.3.244.8-node.17

Machine Info

# OS
Microsoft Windows [Version 10.0.22631.3447]

# Cpu
AMD Ryzen 7 PRO 4750G with Radeon Graphics

# Memory
Capacity     Name             Tag
17179869184  Physical Memory  Physical Memory 1
17179869184  Physical Memory  Physical Memory 3

Dependencies

// package.json
"dependencies": {
  "hexo": "7.1.1",
  "hexo-filter-nofollow": "2.0.2",
  "hexo-generator-archive": "git+https://github.com/hexojs/hexo-generator-archive.git#master",
  "hexo-generator-category": "git+https://github.com/hexojs/hexo-generator-category.git#master",
  "hexo-generator-feed": "git+https://github.com/yoshinorin/_hexo-generator-feed.git#master",
  "hexo-generator-index": "git+https://github.com/hexojs/hexo-generator-index.git#master",
  "hexo-generator-sitemap": "git+https://github.com/yoshinorin/_hexo-generator-sitemap.git#master",
  "hexo-generator-tag": "git+https://github.com/hexojs/hexo-generator-tag.git#master",
  "hexo-html-minifier": "git+https://github.com/hexojs/hexo-html-minifier.git#master",
  "hexo-pagination": "git+https://github.com/yoshinorin/hexo-pagination.git#my-site",
  "hexo-renderer-ejs": "git+https://github.com/hexojs/hexo-renderer-ejs.git#master",
  "hexo-renderer-markdown-it": "git+https://github.com/hexojs/hexo-renderer-markdown-it#master",
  "hexo-server": "git+https://github.com/hexojs/hexo-server.git#master"
}

Theme

I'm using a modified version of https://github.com/LouisBarranqueiro/hexo-theme-tranquilpeak from which I've removed many features.

db.json size

$ dir

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----        2024/04/16     20:21       37507929 db.json

Number of posts, assets etc

Please see Appendix 1 for how to get these numbers.

Number of posts: 1773
Number of post assets: 1784
Avg of post content length: 3645

Number of pages: 23
Number of page assets: 81
Avg of page content length: 4217

Number of tags: 246
Number of categories: 170
Number of routes: 5335

Appendix 1. (How to get number of x)

const Hexo = require('hexo');
const hexo = new Hexo(process.cwd(), {silent: false});

// Count the assets whose _id contains each item's directory,
// and sum up the content length of all items.
function summarize(items, assetModel) {
  let numOfAssets = 0;
  let contentTotalLen = 0;
  for (const item of items) {
    const dir = item.path.slice(0, item.path.lastIndexOf('/'));
    numOfAssets += assetModel.filter(x => x._id.includes(dir)).length;
    contentTotalLen += item.content.length;
  }
  return {numOfAssets, contentTotalLen};
}

hexo.init()
  .then(() => hexo.load())
  .then(() => {
    const posts = hexo.locals.get('posts').toArray();
    const postStats = summarize(posts, hexo.model('PostAsset'));

    const pages = hexo.locals.get('pages').toArray();
    const pageStats = summarize(pages, hexo.model('Asset'));

    const tags = hexo.locals.get('tags').toArray();
    const categories = hexo.locals.get('categories').toArray();
    const routes = hexo.route.list();

    console.log(`Number of posts: ${posts.length}`);
    console.log(`Number of post assets: ${postStats.numOfAssets}`);
    console.log(`Avg of post content length: ${Math.floor(postStats.contentTotalLen / posts.length)}`);

    console.log(`Number of pages: ${pages.length}`);
    console.log(`Number of page assets: ${pageStats.numOfAssets}`);
    console.log(`Avg of page content length: ${Math.floor(pageStats.contentTotalLen / pages.length)}`);

    console.log(`Number of tags: ${tags.length}`);
    console.log(`Number of categories: ${categories.length}`);
    console.log(`Number of routes: ${routes.length}`);
  })
  .catch(err => {
    // Report failures instead of leaving an unhandled rejection.
    console.error(err);
    process.exitCode = 1;
  });

Appendix 2. (How to get the flamegraph)

$ 0x -D framegraph\\with-dbjson .\\node_modules\\hexo\\bin\\hexo g

yoshinorin commented 7 months ago

In my own experiments, cloneDeep was not invoked. I'm curious under what circumstances / configurations this would be an issue?

It is caused by toArray(). This function is executed when the post_asset_folder option is enabled. https://github.com/hexojs/hexo/blob/90b107c8b0bd8b386f64463403391f67eb1b2535/lib/plugins/processor/post.ts#L282