langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Parsing markdown files can result in the loss of some code blocks #6573

Closed: ideagay closed this issue 6 days ago

ideagay commented 1 month ago

Self Checks

Dify version

0.6.9

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

a.md

  1. Import the a.md file attached above.
  2. Choose either automatic or custom segmentation.

You can then see the bug: some of the code in Paragraph 001 is missing, while Paragraph 002 is intact.

image

✔️ Expected Behavior

No code is lost.

❌ Actual Behavior

Some code blocks were lost.
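To make the report easier to verify, one option is to count fenced code blocks in the source file and in the segments the splitter produces. The sketch below is only an illustration: `countFencedBlocks` and `reportLostBlocks` are hypothetical helpers, not part of Dify, and they assume the segment texts have been copied out of the UI by hand.

```javascript
// Hypothetical helper, not part of Dify: counts fenced code blocks in a markdown string.
// Simplification: any line starting with ``` or ~~~ toggles the "inside a fence" state.
function countFencedBlocks(markdown) {
  let complete = 0;
  let open = false;
  for (const line of markdown.split('\n')) {
    if (/^\s*(```|~~~)/.test(line)) {
      if (open) complete += 1; // a closing fence completes one block
      open = !open;
    }
  }
  return { complete, unterminated: open };
}

// Compares the original file against the segments produced by the splitter.
function reportLostBlocks(original, segments) {
  const before = countFencedBlocks(original).complete;
  const after = segments.reduce(
    (sum, segment) => sum + countFencedBlocks(segment).complete,
    0
  );
  return { before, after, lost: before - after };
}
```

A non-zero `lost` value means at least one complete code block from the source does not appear intact in any segment.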

dosubot[bot] commented 1 month ago

@ideagay Hello! I am a bot that can help you solve problems, answer questions, and become a contributor. Welcome! I will help you while you wait for a human maintainer. 🤖

To prevent the loss of code blocks when parsing markdown files in Dify version 0.6.9, you can use the react-markdown library along with react-syntax-highlighter for rendering code blocks. These dependencies are already included in your package.json file.

Here is an example of how you can set this up:

  1. Install the necessary packages if they are not already installed:

    npm install react-markdown react-syntax-highlighter

  2. Use the following code to render markdown content with syntax highlighting for code blocks:

    import React from 'react';
    import ReactMarkdown from 'react-markdown';
    import { Prism as SyntaxHighlighter } from 'react-syntax-highlighter';
    import { dark } from 'react-syntax-highlighter/dist/esm/styles/prism';

    // Renders a markdown string, highlighting fenced code blocks that declare a language.
    const MarkdownRenderer = ({ content }) => {
      return (
        <ReactMarkdown
          children={content}
          components={{
            // Note: the `inline` prop is passed by react-markdown v8 and earlier; v9 removed it.
            code({ node, inline, className, children, ...props }) {
              // Fenced blocks with a language tag get a class such as "language-js".
              const match = /language-(\w+)/.exec(className || '');
              return !inline && match ? (
                <SyntaxHighlighter
                  style={dark}
                  language={match[1]}
                  PreTag="div"
                  {...props}
                >
                  {String(children).replace(/\n$/, '')}
                </SyntaxHighlighter>
              ) : (
                <code className={className} {...props}>
                  {children}
                </code>
              );
            }
          }}
        />
      );
    };

    export default MarkdownRenderer;

This setup ensures that code blocks in your markdown files are properly parsed and rendered with syntax highlighting, preventing any loss of code blocks [1].
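To show how the renderer above decides which language to highlight, here is a small standalone sketch of the same `/language-(\w+)/` match. `extractLanguage` is a hypothetical name introduced only for this illustration.

```javascript
// Same pattern the renderer uses to read the language from a class name.
const LANGUAGE_PATTERN = /language-(\w+)/;

// Hypothetical helper: returns the captured language, or null when there is none.
function extractLanguage(className) {
  const match = LANGUAGE_PATTERN.exec(className || '');
  return match ? match[1] : null;
}

console.log(extractLanguage('language-python')); // "python"
console.log(extractLanguage('language-c++'));    // "c", because \w does not match "+"
console.log(extractLanguage(undefined));         // null
```

When the match is `null`, the renderer falls through to the plain `<code>` branch, so a block without a language tag is still rendered, just without highlighting.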

Additionally, you can adjust specific configurations in the Dify platform to improve markdown parsing and prevent code block loss. These configurations are part of the segmentation and cleaning settings in custom mode:

Segmentation rules:

Preprocessing rules:

By configuring these settings, you can ensure that the markdown content, including code blocks, is properly segmented and cleaned, thus preventing code block loss [2].
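As an illustration of why segmentation settings matter for code blocks, the sketch below splits markdown at blank lines but never inside a fenced block. It is not Dify's implementation: `splitOutsideFences` and its `softLimit` parameter are assumptions made for this example, and the limit is a soft threshold, so a segment can grow past it in order to keep a code block whole.

```javascript
// Hypothetical fence-aware splitter, not Dify's actual segmentation logic.
// Starts a new segment at a blank line only when outside a code fence and
// the current segment has reached `softLimit` characters.
function splitOutsideFences(markdown, softLimit) {
  const segments = [];
  let current = [];
  let currentLength = 0;
  let insideFence = false;

  for (const line of markdown.split('\n')) {
    const isFence = /^\s*(```|~~~)/.test(line);
    const isBlank = line.trim() === '';

    if (!insideFence && isBlank && currentLength >= softLimit) {
      segments.push(current.join('\n')); // close the segment at this blank line
      current = [];
      currentLength = 0;
      continue;
    }

    current.push(line);
    currentLength += line.length + 1; // +1 for the newline
    if (isFence) insideFence = !insideFence;
  }

  if (current.length > 0) segments.push(current.join('\n'));
  return segments;
}
```

Blank lines inside a fence are kept with the surrounding code, so a block that contains empty lines stays in a single segment.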
