web-infra-dev / rspress

🦀💨 A fast Rspack-based static site generator.
https://rspress.dev
MIT License

feat: automatically import highlight languages #1050

Closed shulaoda closed 4 months ago

shulaoda commented 5 months ago

Summary

At first, I implemented automatic import at the loader level by writing an mdx-js plugin, so that it would work in both modes. However, this significantly increases the build size, because each static page imports the highlight packages it needs.

In the end, I chose to implement this feature at the plugin level, which imports a single shared virtual package and reduces the build size.

Both approaches have their own advantages and disadvantages; build size and performance cannot both be optimized at the same time.
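
To make the plugin-level idea concrete, here is a minimal sketch of how one shared virtual module could be generated from the set of detected languages. The helper name `generateLanguagesModule` and the `importPrefix` parameter are hypothetical and are not taken from this pull request; the real import path depends on the highlighter package in use.

```typescript
// Hypothetical helper (not from the PR): builds the source text of one
// shared virtual module that imports every required highlight language once.
function generateLanguagesModule(
  languages: string[],
  importPrefix: string, // assumption: where the highlighter's language files live
): string {
  // Deduplicate and sort so the generated module is stable across builds.
  const names = Array.from(new Set(languages)).sort();
  const imports = names.map(
    (name, i) => `import lang${i} from ${JSON.stringify(importPrefix + name)};`,
  );
  const entries = names.map(
    (name, i) => `  ${JSON.stringify(name)}: lang${i},`,
  );
  return [...imports, 'export default {', ...entries, '};', ''].join('\n');
}
```

Because every page shares this one module, each language is bundled once instead of once per static page, which is the build-size advantage described above.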

Related Issue

#1043

shulaoda commented 5 months ago

We need to wait for the release of mdx-rs.

SoonIter commented 5 months ago

We need to wait for the release of mdx-rs.

done in https://github.com/web-infra-dev/mdx-rs/pull/40, you can try it

:P By the way, the maintainers are on holiday, and this pull request might be merged a little bit later.

Timeless0911 commented 4 months ago

To avoid breaking changes to user configurations, the string type should still be supported in the configuration.

This can be explained in the documentation, and the code logic can simply skip array members of the string type.

shulaoda commented 4 months ago

To avoid breaking changes to user configurations, the string type should still be supported in the configuration.

This can be explained in the documentation, and the code logic can simply skip array members of the string type.

The string type in the configuration is still supported, but it will be ignored. Do you mean we should emphasize this point in the document?

Timeless0911 commented 4 months ago

The string type in the configuration is still supported, but it will be ignored. Do you mean we should emphasize this point in the document?

You can print a deprecation warning in the terminal when a user configures a string, to let them know that they no longer need to configure Prism languages manually and only need to configure aliases.
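
A minimal sketch of that suggestion, assuming the configuration is an array whose members are either a plain string or a two-element tuple. The function name `filterAliasPairs` and the warning text are hypothetical, not the code that was merged.

```typescript
type HighlightLanguage = string | [string, string];

// Hypothetical helper: keeps tuple entries, skips plain strings, and emits
// one deprecation warning that lists every skipped string.
function filterAliasPairs(
  config: HighlightLanguage[],
  warn: (message: string) => void = console.warn,
): [string, string][] {
  const pairs: [string, string][] = [];
  const ignored: string[] = [];
  for (const item of config) {
    if (typeof item === 'string') {
      ignored.push(item);
    } else {
      pairs.push(item);
    }
  }
  if (ignored.length > 0) {
    warn(
      `markdown.highlightLanguages: string entries (${ignored.join(', ')}) ` +
        'are deprecated and ignored; languages are detected automatically, ' +
        'so only aliases need to be configured.',
    );
  }
  return pairs;
}
```

Passing the `warn` callback as a parameter keeps the sketch easy to test without touching the real console.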

Timeless0911 commented 4 months ago

From the perspective of avoiding breaking changes for users, the semantics and default values of the original markdown.highlightLanguages should remain as unchanged as possible. We need to ensure that this configuration item does not produce breaking changes when mdx-js is used, when mdx-rs is used for only some files, or when users use @rspress/plugin-shiki.

Automatic language detection should be best-effort. What we need to do is scan the languages used in code blocks, combine them with markdown.highlightLanguages, and merge the result into a two-dimensional array of [name, alias] pairs, from which the corresponding runtimeModule is generated.

The ideal way to make this change is to keep the original implementation and add the automatic detection logic in packages/core/src/node/runtimeModule/prismLanguages.ts. Together with the user-configured markdown.highlightLanguages, it produces a final configuration, and the virtual module is generated from that final configuration. Users can still use markdown.highlightLanguages to append or redeclare languages that need to be imported (this is useful with mdx-js and when users enable mdx-rs for only some files), and they can also configure aliases.

In other words, among the automatically detected languages, one that Prism itself supports, such as markdown, is equivalent to appending 'markdown' to the user configuration array. An alias such as md can be merged with the user's configuration; if the user has not configured that alias, it is ignored.
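
The merge described above could be sketched roughly as follows. This is an illustration only: the function name `mergeHighlightLanguages` and the `supported` set are hypothetical, and the pair order follows the [name, alias] wording of this comment, which may differ from the order the real configuration uses.

```typescript
type HighlightLanguage = string | [string, string];

// Hypothetical sketch: combine detected languages with the user configuration
// into a deduplicated list of [name, alias] pairs.
function mergeHighlightLanguages(
  detected: string[],
  userConfig: HighlightLanguage[],
  supported: Set<string>, // assumption: names the highlighter supports natively
): [string, string][] {
  const aliasToName = new Map<string, string>();
  const merged = new Map<string, [string, string]>();
  const add = (name: string, alias: string): void => {
    merged.set(`${name}\u0000${alias}`, [name, alias]);
  };

  // User configuration comes first so its semantics stay unchanged.
  for (const item of userConfig) {
    if (typeof item === 'string') {
      add(item, item);
    } else {
      aliasToName.set(item[1], item[0]);
      add(item[0], item[1]);
    }
  }

  for (const lang of detected) {
    if (supported.has(lang)) {
      // A natively supported name behaves like appending it to the config.
      add(lang, lang);
    } else {
      // An alias is kept only when the user configured it; otherwise ignored.
      const name = aliasToName.get(lang);
      if (name !== undefined) add(name, lang);
    }
  }
  return Array.from(merged.values());
}
```

For example, with `[['typescript', 'ts'], 'rust']` as the user configuration, a detected `md` would be dropped unless the user had also configured an alias for it, while a detected `markdown` would be added directly.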

shulaoda commented 4 months ago

@Timeless0911

First, the distinction between using mdx-js and using mdx-rs for only some files is made only at the loader level. At the plugin level, as in 'virtual-site-data', mdx-rs is used for everything in order to scan quickly. The reason for handling this in 'virtual-site-data' is to avoid unnecessary duplicate scanning. Second, @rspress/plugin-shiki is a loader-level plugin for mdx-js that replaces elements in the mdx-js HTML tree. It does not use markdown.highlightLanguages, only DEFAULT_HIGHLIGHT_LANGUAGES. Perhaps we should not remove the default languages from it.

Timeless0911 commented 4 months ago

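Oh, I see what you have in mind. There are still a few small points about the current version:

  1. The extractPageData method currently takes a highlighterLangs parameter. Consider returning the languages produced by the mdx-rs compile step as part of the return value instead, the same way toc is handled.
  2. The default value that @rspress/plugin-shiki currently depends on can be replaced with a constant maintained directly in the shiki plugin: store the result of https://github.com/web-infra-dev/rspress/blob/5abf04c4832288456f586e8c264ff2703e86ca84/packages/plugin-shiki/src/shiki/pluginShiki.ts#L59-L61 as a constant.
  3. The highlight-related code in virtual-site-data is close to 50 lines and can be extracted into a separate function.
  4. Try to avoid changes such as renaming fs to fsExtra or swapping the order of if/else branches; they make the diff look a bit messy during code review.
  5. @SoonIter added some tests in #1056. You can rebase your code and then update and extend the test cases.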
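
Point 1 can be illustrated with a rough sketch in which each page returns its languages alongside its toc, and the caller merges them after Promise.all. Everything here is a stand-in: `compileStub` only imitates what the mdx-rs compile step might return, and the `PageData` shape is assumed rather than taken from the rspress source.

```typescript
interface PageData {
  routePath: string;
  toc: string[];
  languages: string[];
}

// Stand-in for the real compile step: collects level-2 headings and the
// languages named on opening code fences.
function compileStub(content: string): { toc: string[]; languages: string[] } {
  const toc: string[] = [];
  const languages: string[] = [];
  let inFence = false;
  for (const line of content.split('\n')) {
    const fence = /^`{3}([\w-]*)/.exec(line);
    if (fence) {
      if (!inFence && fence[1]) languages.push(fence[1]);
      inFence = !inFence;
    } else if (!inFence && line.startsWith('## ')) {
      toc.push(line.slice(3).trim());
    }
  }
  return { toc, languages };
}

// Languages travel in the return value, just like toc, instead of being
// pushed into a shared parameter.
async function extractPageData(
  pages: { routePath: string; content: string }[],
): Promise<PageData[]> {
  return Promise.all(
    pages.map(async (page) => ({
      routePath: page.routePath,
      ...compileStub(page.content),
    })),
  );
}

// The caller merges the per-page results once all pages have resolved.
function collectLanguages(pages: PageData[]): string[] {
  const all = new Set<string>();
  for (const page of pages) {
    for (const lang of page.languages) all.add(lang);
  }
  return Array.from(all).sort();
}
```

Returning the languages keeps extractPageData free of side effects on its arguments, which also makes it easier to unit test.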
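
shulaoda commented 4 months ago

  1. The extractPageData method currently takes a highlighterLangs parameter. Consider returning the languages produced by the mdx-rs compile step as part of the return value instead, the same way toc is handled.

I had to do it this way because the return value of extractPageData is the rawSiteData result of a Promise.all. I will try to follow your suggestion.

  4. Try to avoid changes such as renaming fs to fsExtra or swapping the order of if/else branches; they make the diff look a bit messy during code review.

While trying to implement the feature at the loader level, I read through a lot of the code and couldn't resist making some changes. I will revert the unnecessary ones as far as I can.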

shulaoda commented 4 months ago

@SoonIter Can you help me write some unit tests for RuntimeModuleID.PrismLanguages? 🥺

SoonIter commented 4 months ago

@SoonIter Can you help me write some unit tests for RuntimeModuleID.PrismLanguages? 🥺

I trust you. Together we can make @Timeless0911's salary disappear with every bug (

Timeless0911 commented 4 months ago

is this pr ready?

shulaoda commented 4 months ago

is this pr ready?

Yes, I think it's ready. If there are any problems, I will take responsibility to the end. 🏃‍♂️💨

Timeless0911 commented 4 months ago

!canary