vuejs / vitepress

Vite & Vue powered static site generator.
https://vitepress.dev
MIT License

Why does the HTML obtained in buildEnd contain &ZeroWidthSpace;? #3364

Closed: rxliuli closed this issue 4 months ago

rxliuli commented 4 months ago

Describe the bug

I am trying to generate an RSS feed for a website built with VitePress, but I found that the HTML obtained from ContentData['html'] in buildEnd contains &ZeroWidthSpace;. I want to confirm whether this is a mistake or by design.

Reproduction

https://stackblitz.com/edit/vitepress-rss-generate?file=docs%2F.vitepress%2Fconfig.ts&file=docs%2F.vitepress%2Fdist%2Findex.html

Expected behavior

The HTML obtained in buildEnd does not contain &ZeroWidthSpace;, just like the final output HTML.

System Info

System:
    OS: Linux 5.0 undefined
    CPU: (8) x64 Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
    Memory: 0 Bytes / 0 Bytes
    Shell: 1.0 - /bin/jsh
  Binaries:
    Node: 18.18.0 - /usr/local/bin/node
    Yarn: 1.22.19 - /usr/local/bin/yarn
    npm: 9.4.2 - /usr/local/bin/npm
    pnpm: 8.10.5 - /usr/local/bin/pnpm
  npmPackages:
    vitepress: latest => 1.0.0-rc.32

Additional context

I also confirmed that the official Vue blog's RSS feed has this problem when displayed in Inoreader and Feedly.

brc-dd commented 4 months ago

That's an HTML entity. Use a parser library to convert it to Unicode characters (for example the decode method of https://www.npmjs.com/package/html-entities), and maybe chain the result with .replace(/[\u0000-\u001F\u007F-\u009F\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/g, "");.
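
A minimal sketch of the cleanup for this specific case, without a library. Note that the character class in the regex above does not include U+200B (zero-width space), so it has to be added explicitly if the goal is to drop the decoded &ZeroWidthSpace;. `stripZeroWidthSpace` is a hypothetical helper that only handles the zero-width space (named entity, numeric entity, or raw character); a full decoder such as html-entities is still needed for other entities, and the sample heading markup is illustrative.

```typescript
// Hypothetical helper: remove the zero-width space used as the header-anchor
// placeholder, in whichever form it appears in the HTML string.
function stripZeroWidthSpace(html: string): string {
  return html
    // Named entity, decimal entity (&#8203;) and hex entity (&#x200B;).
    .replace(/&ZeroWidthSpace;|&#0*8203;|&#[xX]0*200[bB];/g, "")
    // Characters that were already decoded to U+200B.
    .replace(/\u200B/g, "");
}

// Illustrative heading markup, not copied from VitePress output.
const sample =
  '<h2 id="intro">Intro <a class="header-anchor" href="#intro">&ZeroWidthSpace;</a></h2>';
console.log(stripZeroWidthSpace(sample));
// <h2 id="intro">Intro <a class="header-anchor" href="#intro"></a></h2>
```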

rxliuli commented 4 months ago

@brc-dd Of course I could just delete them anyway; I only wanted to make sure whether this is something VitePress expects or a bug.

brc-dd commented 4 months ago

It's expected behavior. We need something there to pass a11y tests.

rxliuli commented 4 months ago

@brc-dd By the way, when generating the RSS feed for posts that contain images, the image links in the HTML obtained in buildEnd are not the final links (the final output uses hashed file names such as cover.A4Q5uAxl.jpg).

Is there a solution to this problem? Or do I need to scan the dist directory after the files are actually written to get the final HTML?
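
VitePress runs buildEnd after the static build has finished, so the rendered files should already be in the output directory when it is called. A minimal sketch of the scan-dist approach using only Node's built-in fs and path modules; `listHtmlFiles` and `readBuiltPages` are hypothetical helpers, and inside a VitePress config you would presumably pass `siteConfig.outDir` from the buildEnd hook.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: recursively collect every .html file under `dir`.
function listHtmlFiles(dir: string): string[] {
  const files: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...listHtmlFiles(fullPath));
    } else if (entry.isFile() && entry.name.endsWith(".html")) {
      files.push(fullPath);
    }
  }
  return files;
}

// Map each built HTML file path to its on-disk contents, which already
// contain the final (hashed) asset URLs.
function readBuiltPages(outDir: string): Map<string, string> {
  const pages = new Map<string, string>();
  for (const file of listHtmlFiles(outDir)) {
    pages.set(file, readFileSync(file, "utf8"));
  }
  return pages;
}
```

The trade-off is that you re-read and re-parse whole page documents (including navigation and other layout), so you still need to extract the article body yourself.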

brc-dd commented 4 months ago

Can you elaborate?

rxliuli commented 4 months ago

Updated ⬆️ (see my edited comment above).

brc-dd commented 4 months ago

Ah weird. This should be the final link in buildEnd. I'll take a look.

brc-dd commented 4 months ago

Ah no, you're using createContentLoader, which doesn't return SSR'd HTML. Instead, collect the data in transformHtml into a list, then generate the feed from that list in buildEnd. It should be something like this: https://github.com/vuejs/vitepress/issues/520#issuecomment-1566062351 (the first argument of transformHtml is the rendered HTML).
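
A minimal sketch of that collect-then-build pattern, assuming transformHtml receives the rendered HTML as its first argument and an identifier for the page's output file as its second. `RenderedPage`, `renderedPages`, `collectPage` and `toRssItem` are hypothetical names, not VitePress APIs; the config wiring is only shown in comments so the helpers can run on their own. Keep in mind that the rendered HTML is the whole page document, so in practice you would still extract the article body (for example with an HTML parser).

```typescript
// Hypothetical record of one rendered page; not a VitePress type.
interface RenderedPage {
  id: string;   // second argument of transformHtml (identifies the page)
  html: string; // first argument of transformHtml: the SSR'd HTML
}

// Filled while pages are rendered, consumed later in buildEnd.
const renderedPages: RenderedPage[] = [];

function collectPage(code: string, id: string): void {
  renderedPages.push({ id, html: code });
}

// Wrap HTML in CDATA, splitting any "]]>" so it cannot end the section early.
function toCdata(html: string): string {
  return "<![CDATA[" + html.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

// Escape text placed in plain XML elements such as <title> and <link>.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build one RSS <item> from a title, a link and the page HTML.
function toRssItem(title: string, link: string, html: string): string {
  return [
    "<item>",
    `  <title>${escapeXml(title)}</title>`,
    `  <link>${escapeXml(link)}</link>`,
    `  <description>${toCdata(html)}</description>`,
    "</item>",
  ].join("\n");
}

// Wiring in .vitepress/config.ts (sketch, not executed here):
// export default defineConfig({
//   transformHtml(code, id) { collectPage(code, id) },
//   buildEnd() {
//     // map renderedPages to items with toRssItem and write the feed file
//   },
// })
```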

rxliuli commented 4 months ago

Thank you, I solved it. In the end, I split the pages into those with images and those without. For pages with images, I re-parsed the HTML with node-html-parser; otherwise, I used the html from ContentData directly (most pages have no images).
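
A rough sketch of that split, assuming you have the loader's HTML for a page plus, when available, the rendered HTML collected at build time. `hasImages` and `pickFeedHtml` are hypothetical names, and a simple regex check stands in for node-html-parser here.

```typescript
// Rough check for <img> tags; a real HTML parser (such as node-html-parser)
// is more robust, e.g. against "<img" appearing inside comments.
function hasImages(html: string): boolean {
  return /<img\b/i.test(html);
}

// Hypothetical selection rule: use the loader's HTML by default, but switch
// to the rendered HTML for pages with images, whose URLs change at build time.
function pickFeedHtml(
  contentHtml: string,
  renderedHtml: string | undefined,
): string {
  if (hasImages(contentHtml) && renderedHtml !== undefined) {
    return renderedHtml;
  }
  return contentHtml;
}
```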

brc-dd commented 4 months ago

Yeah that could work too. Or if you can, try to store images in the public directory. That way their path won't change.
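
Files in the public directory are copied to the root of the output directory as-is, so a reference like /cover.jpg stays valid after the build. Feed readers generally need absolute URLs, so a minimal sketch for resolving such root-relative paths might look like this; `toAbsoluteUrl` is a hypothetical helper and `siteUrl` is an assumed value (your deployed origin plus any base path).

```typescript
// Hypothetical helper: resolve a root-relative public asset path such as
// "/cover.jpg" against the site's deployed URL (assumed, including base).
function toAbsoluteUrl(assetPath: string, siteUrl: string): string {
  // A trailing "/" makes the last path segment count as a directory.
  const base = siteUrl.endsWith("/") ? siteUrl : siteUrl + "/";
  // Drop leading slashes so the path resolves under the base path.
  return new URL(assetPath.replace(/^\/+/, ""), base).href;
}

console.log(toAbsoluteUrl("/cover.jpg", "https://example.com/blog"));
// https://example.com/blog/cover.jpg
```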

rxliuli commented 4 months ago

Yes, I noticed that the official Vue blog does this. But in my case, several processes run on the local markdown source files and VitePress (building the website) is just one of them, so I need the markdown files to reference images as ordinary files.


By the way, I also submitted a PR to the Vue blog to fix the original problem in this issue: https://github.com/vuejs/blog/pull/21

brc-dd commented 4 months ago

Ah I don’t have access to the blog repo. Someone else will get back to you on that PR.