hexojs / hexo

A fast, simple & powerful blog framework, powered by Node.js.
https://hexo.io
MIT License

Duplicates of `site.data` in db.json lead to a large file size #5260

Closed: EmptyDreams closed this issue 1 year ago

EmptyDreams commented 1 year ago

Check List

Please check the following before submitting a new issue.

Expected behavior

The db.json file should stay a reasonable size (at the very least, shouldn't it be under 50 MB?).

Actual behavior

In practice, even after I deleted most of the posts from my blog and kept only one (a 5 KB Markdown file), the generated db.json was still over 70 MB.

I'm not sure why this happens. I didn't have this problem originally; it seemed to appear out of nowhere, and I can't remember what I was changing at the time.

From what I can see, db.json contains many duplicate _id and data fields. For example, the file source/_data/avatar/cpen.webp is recorded more than 100 times in db.json, which takes up a lot of space.
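To check this kind of duplication on your own site, one could count how often a path appears in the raw db.json text and measure which parts of the JSON tree account for most of the bytes. This is a minimal sketch that assumes only that db.json is ordinary JSON; it does not rely on any particular layout of Hexo's database, and `countOccurrences`, `sizeByPath` and `inspectDbJson` are hypothetical helper names, not Hexo APIs.

```typescript
import { readFileSync } from "node:fs";

// Count non-overlapping occurrences of `needle` in `text`, e.g. a data file path.
function countOccurrences(text: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let index = text.indexOf(needle);
  while (index !== -1) {
    count += 1;
    index = text.indexOf(needle, index + needle.length);
  }
  return count;
}

// Serialized size in UTF-8 bytes of every object/array node from the root
// down to depth `maxDepth`, keyed by a dotted path such as "models.Post".
// No assumption is made about where Hexo stores its models inside the file.
function sizeByPath(
  value: unknown,
  maxDepth: number,
  path = "",
  out: Array<[string, number]> = [],
): Array<[string, number]> {
  if (value === null || typeof value !== "object" || maxDepth < 0) return out;
  out.push([path || "(root)", Buffer.byteLength(JSON.stringify(value), "utf8")]);
  for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
    sizeByPath(child, maxDepth - 1, path ? `${path}.${key}` : key, out);
  }
  return out;
}

// Hypothetical convenience wrapper: read a db.json from disk, count `needle`,
// and list the largest nodes first.
function inspectDbJson(dbPath: string, needle: string, maxDepth = 2) {
  const text = readFileSync(dbPath, "utf8");
  const sizes = sizeByPath(JSON.parse(text), maxDepth).sort((a, b) => b[1] - a[1]);
  return { occurrences: countOccurrences(text, needle), sizes };
}
```

For example, `inspectDbJson("db.json", "avatar/cpen")` (the needle is taken from the example above) would report how many times that path occurs in the file and which nodes hold most of the bytes. Counting raw substrings is deliberately crude: it catches copies wherever they are nested, at the cost of also counting unrelated mentions of the same string.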

Because of this, hexo se and hexo g hang for a long time after the INFO Validating config step, and I have to run hexo cl or manually delete db.json before every command, which defeats the purpose of having db.json at all.

Is the problem still there under "Safe mode"?

Yes, the problem persists even with the --safe flag.

Environment & Settings

Node.js & npm version (node -v && npm -v)

node version: 18.12.1

npm version: 9.8.1

Your site _config.yml (Optional)

Hexo and plugin versions (npm ls --depth 0)


```
+-- @neilsustc/markdown-it-katex@1.0.0
+-- cheerio@1.0.0-rc.12
+-- gulp-clean@0.4.0
+-- gulp-cssnano@2.1.3
+-- gulp-html-minifier-terser@7.1.0
+-- gulp-htmlclean@2.7.22
+-- gulp-terser@2.1.0
+-- gulp@4.0.2
+-- hexo-abbrlink@2.2.1
+-- hexo-asset-image@1.0.0
+-- hexo-butterfly-envelope@1.0.15
+-- hexo-deployer-git@3.0.0
+-- hexo-filter-nofollow@2.0.2
+-- hexo-generator-archive@2.0.0
+-- hexo-generator-baidu-sitemap@0.1.9
+-- hexo-generator-category@2.0.0
+-- hexo-generator-feed@3.0.0
+-- hexo-generator-index@3.0.0
+-- hexo-generator-sitemap@3.0.1
+-- hexo-generator-tag@2.0.0
+-- hexo-graphviz@1.0.2
+-- hexo-log@3.2.0
+-- hexo-renderer-ejs@2.0.0
+-- hexo-renderer-markdown-it@7.1.0
+-- hexo-renderer-pug@3.0.0
+-- hexo-renderer-stylus@3.0.0
+-- hexo-server@3.0.0
+-- hexo-swpp@2.8.10
+-- hexo-wordcount@6.0.1
+-- hexo@6.3.0
+-- node-fetch@2.6.9
`-- prismjs@1.29.0
```

Your package.json


```json
{
  "name": "hexo-site",
  "version": "0.0.0",
  "private": true,
  "scripts": {
    "build": "hexo generate",
    "clean": "hexo clean",
    "deploy": "hexo deploy",
    "server": "hexo server"
  },
  "hexo": {
    "version": "6.3.0"
  },
  "dependencies": {
    "@neilsustc/markdown-it-katex": "^1.0.0",
    "cheerio": "^1.0.0-rc.12",
    "hexo": "^6.3.0",
    "hexo-abbrlink": "^2.2.1",
    "hexo-asset-image": "^1.0.0",
    "hexo-butterfly-envelope": "^1.0.15",
    "hexo-deployer-git": "^3.0.0",
    "hexo-filter-nofollow": "^2.0.2",
    "hexo-generator-archive": "^2.0.0",
    "hexo-generator-baidu-sitemap": "^0.1.9",
    "hexo-generator-category": "^2.0.0",
    "hexo-generator-feed": "^3.0.0",
    "hexo-generator-index": "^3.0.0",
    "hexo-generator-sitemap": "^3.0.1",
    "hexo-generator-tag": "^2.0.0",
    "hexo-graphviz": "^1.0.2",
    "hexo-log": "^3.0.0",
    "hexo-renderer-ejs": "^2.0.0",
    "hexo-renderer-markdown-it": "^7.1.0",
    "hexo-renderer-pug": "^3.0.0",
    "hexo-renderer-stylus": "^3.0.0",
    "hexo-server": "^3.0.0",
    "hexo-swpp": "^2.8.10",
    "hexo-wordcount": "^6.0.1",
    "node-fetch": "^2.6.9",
    "prismjs": "^1.29.0"
  },
  "devDependencies": {
    "gulp": "^4.0.2",
    "gulp-clean": "^0.4.0",
    "gulp-cssnano": "^2.1.3",
    "gulp-html-minifier-terser": "^7.1.0",
    "gulp-htmlclean": "^2.7.22",
    "gulp-terser": "^2.1.0"
  }
}
```

uiolee commented 1 year ago

~This could be a problem with one of your plugins~

EmptyDreams commented 1 year ago

> This could be a problem with one of your plugins

But the problem persists even with the --safe flag, which I wouldn't expect if a plugin were the cause.

uiolee commented 1 year ago

Can you provide a reproducible example?

EmptyDreams commented 1 year ago

> Can you provide a reproducible example?

blog.zip

Sorry for the late reply.

I've uploaded a zip file that reproduces the problem. The source folder in it contains two folders that easily push db.json past 100 MB, and every time I create an empty post (with the hexo new post command, leaving the md file empty), db.json grows noticeably. My guess is that Hexo wrongly duplicates data in db.json, so its size keeps growing without bound.

EmptyDreams commented 1 year ago

A friend told me that putting binary files (including images) in the _data folder makes db.json bloat. I tested this, at least for image files, and the problem disappeared once I moved all the images out of _data. Still, I don't think Hexo should behave this way; I'd consider it a bug in Hexo.
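To apply the workaround described above without hunting by hand, a small script could list files under the data folder that are not YAML or JSON, since those are the formats the Data Folders documentation describes. This is a sketch under assumptions: the data folder is taken to be `source/_data` relative to the site root, `.yml`, `.yaml` and `.json` are treated as the only expected data formats, and `isDataFile` / `findNonDataFiles` are hypothetical names.

```typescript
import { readdirSync } from "node:fs";
import { extname, join, relative } from "node:path";

// Extensions treated as real data files (assumption); everything else is flagged.
const DATA_EXTENSIONS = new Set([".yml", ".yaml", ".json"]);

// True when the file name has one of the expected data extensions (case-insensitive).
function isDataFile(fileName: string): boolean {
  return DATA_EXTENSIONS.has(extname(fileName).toLowerCase());
}

// Recursively walk `dir` and return the files that are not data files,
// as sorted paths relative to `root` (e.g. "avatar/cpen.webp" on POSIX).
function findNonDataFiles(dir: string, root: string = dir): string[] {
  const found: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findNonDataFiles(fullPath, root));
    } else if (entry.isFile() && !isDataFile(entry.name)) {
      found.push(relative(root, fullPath));
    }
  }
  return found.sort();
}
```

Run from the site root, `findNonDataFiles("source/_data")` would be expected to include `avatar/cpen.webp` for the layout described in this issue; those files are candidates to move elsewhere.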

uiolee commented 1 year ago

Steps to reproduce:

  1. Put image (or other non-data) files in source/_data/.
  2. Run hexo g, or run hexo g, then modify a post and run hexo g again.

[screenshot: db.json]

The contents of _data are copied into site.data of every post (and page).
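Why a single data file can end up stored hundreds of times: in memory, every post can reference the same `site.data` object, but JSON has no notion of shared references, so serializing the posts writes a full copy of that object once per post. The sketch below models this with plain objects; the `Post` shape, `attachSiteData` and the 50 KB payload are illustrative assumptions, not Hexo's actual schema or code.

```typescript
// Illustrative stand-in for a post document; not Hexo's real schema.
interface Post {
  _id: string;
  content: string;
  site?: { data: Record<string, unknown> };
}

// Attach the same shared data object to every post, mimicking `post.site = {data}`.
function attachSiteData(posts: Post[], data: Record<string, unknown>): Post[] {
  return posts.map((post) => ({ ...post, site: { data } }));
}

// Byte size of the posts once serialized, as a JSON database file would store them.
function serializedSize(posts: Post[]): number {
  return Buffer.byteLength(JSON.stringify(posts), "utf8");
}

// Build `count` empty posts for the experiment.
function makePosts(count: number): Post[] {
  return Array.from({ length: count }, (_, i) => ({ _id: `post-${i}`, content: "" }));
}

// Stand-in for a binary file loaded from _data: about 50 KB of text, held once in memory.
const siteData = { "avatar/cpen": "x".repeat(50_000) };

const withoutData = serializedSize(makePosts(100));
const withData = serializedSize(attachSiteData(makePosts(100), siteData));

// The difference is roughly 100 copies of the payload (about 5 MB in total).
console.log({ withoutData, withData, extraPerPost: (withData - withoutData) / 100 });
```

The growth is linear in the number of posts and pages, which matches the observation earlier in the thread that each new empty post makes db.json noticeably larger.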

This may be the relevant code: (https://github.com/hexojs/hexo/blob/7b588e78aae57e756e4d18bcd78e63d9dc7d34cd/lib/plugins/filter/before_generate/render_post.ts#L9)

[screenshot: db.json]

uiolee commented 1 year ago

I think images and other binary files don't belong in Data Folders; put them in Asset Folders instead.

Of course, we should probably process site.data in a more elegant way.
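One possible direction, sketched here under assumptions and not necessarily what the eventual fix does: hand the site data to the renderer as an argument instead of storing it on the document, and strip any render-only fields before a document is persisted. `PostDoc`, `TRANSIENT_KEYS`, `renderWithSiteData` and `toPersisted` are hypothetical names for illustration.

```typescript
// Hypothetical post shape for illustration; not Hexo's real schema.
interface PostDoc {
  _id: string;
  content: string;
  [key: string]: unknown;
}

// Fields assumed to exist only while rendering; they should never be written to disk.
const TRANSIENT_KEYS = new Set(["site"]);

// Render with the site data passed as an argument, so it never becomes part of the document.
function renderWithSiteData(
  post: PostDoc,
  data: Record<string, unknown>,
  render: (post: PostDoc, site: { data: Record<string, unknown> }) => string,
): PostDoc {
  return { ...post, content: render(post, { data }) };
}

// Serialize documents, dropping transient keys at the top level of each document.
function toPersisted(docs: PostDoc[]): string {
  return JSON.stringify(
    docs.map((doc) =>
      Object.fromEntries(Object.entries(doc).filter(([key]) => !TRANSIENT_KEYS.has(key))),
    ),
  );
}
```

Either half alone would stop the per-post copies: the first never attaches the data to the document, and the second acts as a safety net in case some other code path still does.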

stevenjoezhang commented 1 year ago

It was added in https://github.com/hexojs/hexo/issues/1969

uiolee commented 1 year ago

Fixed by https://github.com/hexojs/hexo/pull/5325