infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Bug]: XSS on the chunkview #611

Open onixldlc opened 5 months ago

onixldlc commented 5 months ago

Is there an existing issue for the same bug?

Branch name

main

Commit ID

4c14760

Other environment information

Running on an Ubuntu VM

Actual behavior

There is an XSS vulnerability in the Knowledge Base > Dataset > chunk view.

image

Expected behavior

Text inside a chunk should be escaped to mitigate XSS.
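A minimal sketch of what escaping chunk text before display could look like, in TypeScript. The helper name `escapeHtml` and the `HTML_ESCAPES` table are hypothetical and are not taken from the RAGFlow codebase; this only illustrates the general technique of replacing HTML-significant characters with entities.

```typescript
// Hypothetical helper, not RAGFlow's actual code: maps each
// HTML-significant character to its entity so the browser shows
// the chunk as text instead of parsing it as markup.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(text: string): string {
  // Replace every occurrence of the five characters above.
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Example payload of the kind discussed in this issue.
const payload = '<iframe src="javascript:alert(1)"></iframe>';
console.log(escapeHtml(payload));
// prints: &lt;iframe src=&quot;javascript:alert(1)&quot;&gt;&lt;/iframe&gt;
```

If the view is built with React, rendering the chunk as a plain text child already escapes it, and only raw-HTML insertion (for example `dangerouslySetInnerHTML`) bypasses that, so a helper like this would matter mainly on the raw-HTML path.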

Steps to reproduce

1. Download this .md file: https://raw.githubusercontent.com/daffainfo/AllAboutBugBounty/master/Web%20Cache%20Poisoning.md

2. Upload the .md file.

3. Run the embedding process until it finishes.

4. Open `Knowledge Base tab > Dataset > chunk`.

Additional information

I can only reproduce the XSS on the Knowledge Base > Dataset > chunk view page. I tried to get the AI to send that same chunk in the chat view, and there it is escaped correctly.

image image

As you can see in the text I have highlighted, it has been escaped on the chat page but not on the chunk view page.

Weirdly enough, this doesn't happen in the chunk from the actual XSS file.

image

I guess it has something to do with how the iframe tag is sanitized?

KevinHuSh commented 5 months ago

In chunk view, it just displays plain text without any text rendering. In chat view, the text can be rendered as Markdown. Also, in chat view the text has been processed by the LLM, so it is not necessarily the same as the content in the chunks.

onixldlc commented 5 months ago

I see, so there was no sanitization to begin with. Weirdly enough, most of the XSS payloads didn't run. Interesting.

Also, after testing a bit more, I got the XSS to show up in the chat view via the reference window, so I'm guessing the reference window and the chunk view are connected, or at least use the same modal?
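One possible explanation for why only some payloads fired, assuming the chunk text reaches the page as raw HTML through something like `innerHTML` (this is not confirmed anywhere in the thread): browsers do not execute `<script>` elements inserted that way, but they do run event-handler attributes and `javascript:` URLs in attributes such as an iframe's `src`. The sketch below is a rough heuristic that separates those two cases. The function name `mayRunWhenInsertedAsHtml` is hypothetical, and this is only an illustration, not a sanitizer.

```typescript
// Hypothetical heuristic, for illustration only. It flags markup that
// typically still executes when inserted as raw HTML, and is far from
// exhaustive: do not use it as a security filter.

// An opening tag carrying an inline event handler, e.g. <img onerror=...>.
const EVENT_HANDLER_ATTR = /<[a-z][^>]*\son[a-z]+\s*=/i;

// An opening tag whose src/href is a javascript: URL, e.g. <iframe src="javascript:...">.
const JAVASCRIPT_URL_ATTR = /<[a-z][^>]*\s(?:src|href)\s*=\s*["']?\s*javascript:/i;

function mayRunWhenInsertedAsHtml(payload: string): boolean {
  return EVENT_HANDLER_ATTR.test(payload) || JAVASCRIPT_URL_ATTR.test(payload);
}

// A bare <script> tag is not flagged: scripts inserted via innerHTML do not run.
console.log(mayRunWhenInsertedAsHtml("<script>alert(1)</script>"));
// Event handlers and javascript: URLs are flagged.
console.log(mayRunWhenInsertedAsHtml("<img src=x onerror=alert(1)>"));
console.log(mayRunWhenInsertedAsHtml('<iframe src="javascript:alert(1)"></iframe>'));
```

Under that assumption, a document made mostly of `<script>`-style payloads would look harmless in the chunk view, while a single iframe or event-handler payload would still execute.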

2024-04-30_17-49

And this was the cause of the XSS:

2024-04-30_07-59