infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Feature Request]: File manager & API #345

Open tvvignesh opened 2 months ago

tvvignesh commented 2 months ago

Is there an existing issue for the same feature request?

- [x] I have checked the existing issues.

Is your feature request related to a problem?

Hi. Thanks a lot for this project. While I have got RAGFlow working, I was wondering whether you have any documentation on the following:

- Connecting with our existing authentication system: how do we do a handshake if a user is already signed up and logged in to our app, so that we avoid two signups/logins?
- Embedding the assistant as a chatbot: how do we embed the chatbot in our web app?
- Restricting users to chat only: we don't want to give users the ability to create assistants or add to the knowledgebase, just to chat with the knowledgebase. This means we need to set a default knowledgebase and a default assistant, with the admin managing both. How do we do this?

Describe the feature you'd like

An easy way to embed my assistant as a chatbot in the website and enable users to chat through the knowledgebase while maintaining history and context individually.

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

tvvignesh commented 2 months ago

Also, what is the reason for having this? Since we are self-hosting RAGFlow, why is a 128-document limit required? Is it intended for some reason? I get this error when I go above 128 documents in my knowledgebase.

image

tvvignesh commented 2 months ago

Also, the toast for bulk operations appears multiple times in the UI, once for each item, e.g. when enabling multiple files, running multiple files, etc.

image

Also, what is the difference between "Enable" and "Run" in the UI? I understand that you do parsing of the docs when I press "Run" but what does "Enable" or "Disable" do?

tvvignesh commented 2 months ago

In addition, is there a way to have the knowledgebase organized by folder structure instead of having it flat to help us in managing it easily?

For example, we have docs organized like this and we would like to maintain the same structure in RAGFlow. As of now, we add a prefix to each file name and store everything in a flat structure. Having a similar structure would help.

image

image

To make it simpler for you, you could probably support adding tags during upload, filtering files by tag, and viewing the list of tags.
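
The tag idea above could be sketched roughly as follows. This is only an illustration of the request, not RAGFlow's data model: `Document`, `KnowledgeBase`, `upload`, `list_tags`, and `filter_by_tag` are hypothetical names, and the file names are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Document:
    name: str
    tags: Set[str] = field(default_factory=set)


class KnowledgeBase:
    """Hypothetical flat document store with tags (not RAGFlow's actual model)."""

    def __init__(self) -> None:
        self._docs: List[Document] = []

    def upload(self, name: str, tags: Optional[List[str]] = None) -> Document:
        # Tags are attached at upload time, as suggested in the comment above.
        doc = Document(name=name, tags=set(tags or []))
        self._docs.append(doc)
        return doc

    def list_tags(self) -> List[str]:
        # All distinct tags across the knowledgebase, sorted for display.
        return sorted({tag for doc in self._docs for tag in doc.tags})

    def filter_by_tag(self, tag: str) -> List[str]:
        # Names of the documents carrying the tag, in upload order.
        return [doc.name for doc in self._docs if tag in doc.tags]


kb = KnowledgeBase()
kb.upload("onboarding.pdf", tags=["hr", "policies"])
kb.upload("leave-policy.pdf", tags=["hr"])
kb.upload("api-guide.md", tags=["engineering"])

print(kb.list_tags())
print(kb.filter_by_tag("hr"))
```

Tags keep the storage flat while still allowing a folder-like view, since one tag per folder reproduces the existing structure.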

tvvignesh commented 2 months ago

In addition, when I press "Next" during a bulk upload, all the files are uploaded, but the UI stays on the same screen as if the files were not uploaded yet, giving no indication of success.

image

tvvignesh commented 2 months ago

Also, I see an "Upgrade" button in the Team section. Is this meant to be a paid feature of RAGFlow?

image

tvvignesh commented 2 months ago

When the file names are long, there is an overlap between Name and Chunk Number.

image

tvvignesh commented 2 months ago

When I try to register the second user, it says "No such file or directory" in the response and I am not able to register

image

And when I try to login, it says email is not registered.

UPDATE: It worked after I set up the PEM files. For some reason the private.pem and public.pem files were not cloned, and I kept getting these errors.

image

tvvignesh commented 2 months ago

If I create a new user, I am not able to see the assistants created by other users. In my case, since I don't want to allow other users to create assistants but just chat with them, how do I do it?

tvvignesh commented 2 months ago

Also, you are assigning tenant_id as user id during queries. So, even if we change the tenant id of a user at the database level, we need to change the code everywhere as well to make sure it works. Rather, I assume you should query the tenant of the current user and not use the current user id itself.

image

tvvignesh commented 2 months ago

Another major issue I noticed is that, according to your logic, the assistant is nothing but the dialog. But that will not work if we use the same assistant across multiple users, because the conversation history of other users will also get shared, since conversations are mapped to assistants (dialogs).

Rather, you should have 3 layers, if I am not wrong: Assistants > Dialogs > Conversations, with dialogs mapped to users.
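
A minimal sketch of the three-layer idea, assuming one dialog per (assistant, user) pair. All class and field names here are hypothetical and do not reflect RAGFlow's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Assistant:
    """Admin-managed configuration shared by all users (hypothetical)."""
    assistant_id: str
    knowledgebase_ids: List[str]


@dataclass
class Conversation:
    conversation_id: str
    messages: List[str] = field(default_factory=list)


@dataclass
class Dialog:
    """Binds one user to one assistant and owns that user's conversations."""
    assistant_id: str
    user_id: str
    conversations: List[Conversation] = field(default_factory=list)


class DialogStore:
    def __init__(self) -> None:
        self._dialogs: Dict[Tuple[str, str], Dialog] = {}

    def dialog_for(self, assistant: Assistant, user_id: str) -> Dialog:
        # One dialog per (assistant, user), created on first use.
        key = (assistant.assistant_id, user_id)
        if key not in self._dialogs:
            self._dialogs[key] = Dialog(assistant.assistant_id, user_id)
        return self._dialogs[key]

    def history(self, assistant: Assistant, user_id: str) -> List[str]:
        # Only messages from this user's own dialog are returned.
        dialog = self.dialog_for(assistant, user_id)
        return [m for c in dialog.conversations for m in c.messages]


assistant = Assistant("support-bot", ["kb-1"])
store = DialogStore()
store.dialog_for(assistant, "alice").conversations.append(
    Conversation("c1", ["How do I reset my password?"])
)
store.dialog_for(assistant, "bob").conversations.append(
    Conversation("c2", ["What is the refund policy?"])
)
```

Both users talk to the same assistant, but `store.history(assistant, "alice")` never includes Bob's messages, because history hangs off the per-user dialog and not off the assistant.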

KevinHuSh commented 2 months ago

Thanks for all these issues which are very important.

The three requests in the original post (authentication integration, chatbot embedding, and chat-only access) will be fulfilled by the 'Conversation API' feature we're about to release in the coming week. The system will produce multiple secret tokens for a user; other systems can then use these tokens to integrate with their own authentication.
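
Since the Conversation API is not released yet, the following is only a guess at the general pattern described here (several secret tokens per user, each resolvable back to that user), not RAGFlow's actual API. `TokenRegistry` and its methods are hypothetical names.

```python
import hashlib
import secrets
from typing import Dict, Optional


class TokenRegistry:
    """Hypothetical in-memory registry; only token hashes are stored."""

    def __init__(self) -> None:
        self._owner_by_hash: Dict[str, str] = {}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, user_id: str) -> str:
        # A user may hold several tokens, e.g. one per integrating system.
        token = secrets.token_urlsafe(32)
        self._owner_by_hash[self._digest(token)] = user_id
        return token

    def resolve(self, token: str) -> Optional[str]:
        # Returns the owning user id, or None for unknown or revoked tokens.
        return self._owner_by_hash.get(self._digest(token))

    def revoke(self, token: str) -> None:
        self._owner_by_hash.pop(self._digest(token), None)


registry = TokenRegistry()
web_token = registry.issue("admin")
mobile_token = registry.issue("admin")
```

An external application that has already authenticated its own user would send one of these tokens with each chat request, so the end user never needs a second signup or login.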

KevinHuSh commented 2 months ago

> Also, what is the reason for having this? Since we are self-hosting RAGFlow, why is a 128-document limit required? Is it intended for some reason? I get this error when I go above 128 documents in my knowledgebase.

image

The source code is the same as the code running on our demo site, where we added this limit to save computation resources. We're going to fix this by reading the limit from an environment variable.
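
One way such a fix might look, as a sketch only: the variable name `MAX_DOCS_PER_KB`, the helper names, and the convention that `0` (or a negative value) disables the limit are all assumptions, not RAGFlow's actual configuration.

```python
import os
from typing import Mapping, Optional

DEFAULT_MAX_DOCS = 128  # the limit mentioned in this thread


def max_docs_per_kb(env: Optional[Mapping[str, str]] = None) -> int:
    """Read the limit from MAX_DOCS_PER_KB (hypothetical name); 0 means no limit."""
    env = os.environ if env is None else env
    raw = env.get("MAX_DOCS_PER_KB", "")
    try:
        value = int(raw) if raw.strip() else DEFAULT_MAX_DOCS
    except ValueError:
        # Fall back to the default on an unparsable value.
        value = DEFAULT_MAX_DOCS
    return max(value, 0)


def can_add_document(current_count: int, env: Optional[Mapping[str, str]] = None) -> bool:
    limit = max_docs_per_kb(env)
    return limit == 0 or current_count < limit
```

A self-hosted deployment could then set `MAX_DOCS_PER_KB=0` to lift the limit, while the demo site keeps the default of 128.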

KevinHuSh commented 2 months ago

> Also, the toast for bulk operations appears multiple times in the UI, once for each item, e.g. when enabling multiple files, running multiple files, etc.

image

> Also, what is the difference between "Enable" and "Run" in the UI? I understand that you do parsing of the docs when I press "Run" but what does "Enable" or "Disable" do?

The multiple toasts will be optimized. As for "Enable"/"Disable": uploading and parsing are very time-consuming, so it would be very annoying to repeat them every time a user wants to test the chatbot with or without a specific file. That happens a lot when optimizing the quality of a chatbot.

KevinHuSh commented 2 months ago

> In addition, is there a way to have the knowledgebase organized by folder structure instead of having it flat to help us in managing it easily?
>
> For example, we have docs organized like this and we would like to maintain the same structure in RAGFlow. As of now, we add a prefix to each file name and store everything in a flat structure. Having a similar structure would help.

image

image

> To make it simpler for you, you could probably support adding tags during upload, filtering files by tag, and viewing the list of tags.

It's already on our roadmap.

KevinHuSh commented 2 months ago

> Also, I see an "Upgrade" button in the Team section. Is this meant to be a paid feature of RAGFlow?

image

Yes, and it's still under development.

KevinHuSh commented 2 months ago

> the PEM file

The PEM files are for HTTPS. We didn't open-source them.

KevinHuSh commented 2 months ago

> Also, you are assigning tenant_id as user id during queries. So, even if we change the tenant id of a user at the database level, we need to change the code everywhere as well to make sure it works. Rather, I assume you should query the tenant of the current user and not use the current user id itself.

image

Good point.
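
The suggestion acknowledged here, resolving the tenant from the current user and not reusing the user id as the tenant id, could be sketched like this. The tables `user_tenant` and `dialog` and all function names are hypothetical and much simpler than RAGFlow's real schema.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_tenant (user_id TEXT PRIMARY KEY, tenant_id TEXT NOT NULL);
    CREATE TABLE dialog (id TEXT PRIMARY KEY, tenant_id TEXT NOT NULL, name TEXT);
    -- 'alice' has been moved into the admin's tenant at the database level.
    INSERT INTO user_tenant VALUES ('admin', 'admin'), ('alice', 'admin');
    INSERT INTO dialog VALUES ('d1', 'admin', 'Support assistant');
    """
)


def tenant_of(user_id: str) -> str:
    row = conn.execute(
        "SELECT tenant_id FROM user_tenant WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no tenant for user {user_id!r}")
    return row[0]


def dialogs_by_user_id(user_id: str) -> List[str]:
    # The pattern described in the issue: the user id is used as the tenant id.
    rows = conn.execute("SELECT name FROM dialog WHERE tenant_id = ?", (user_id,))
    return [r[0] for r in rows]


def dialogs_for_user(user_id: str) -> List[str]:
    # Suggested pattern: look up the user's tenant first, then filter by it.
    rows = conn.execute(
        "SELECT name FROM dialog WHERE tenant_id = ?", (tenant_of(user_id),)
    )
    return [r[0] for r in rows]
```

With the lookup in place, changing a user's tenant in the database is enough for that user to see the tenant's assistants, with no further code changes.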